PROBABILISTIC DIVIDE-AND-CONQUER -- A NEW METHOD FOR EXACT SIMULATION -- AND LOWER BOUND EXPANSIONS FOR RANDOM BERNOULLI MATRICES VIA NOVEL INTEGER PARTITIONS

by Stephen Anthony DeSalvo

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

May 2012

Copyright 2012 Stephen Anthony DeSalvo

Dedication

This dissertation is dedicated to my family.

Epigraph

Whatever you do will be insignificant, but it is very important that you do it. -- Mahatma Gandhi

Silence is sometimes the best answer. -- Dalai Lama XIV

"Don't try to be a great man, just be a man, and let history make its own judgments." -- Zefram Cochrane

A monkey is typing . . .

Acknowledgements

First, I would like to thank the anonymous referees for the paper on random Bernoulli matrices, who found numerous typos and made some helpful suggestions for improving the presentation of the material.

I would like to thank Paul Newton for his guidance early on in my graduate studies. Also, Martin Lo of the Jet Propulsion Laboratory was a source of inspiration and support.

I learned many valuable lessons from Max Jodeit during my last year as an undergraduate at the University of Minnesota, and I benefitted tremendously from all the time he spent with me during office hours. Gary Rosen was also very supportive during my time at USC, as were many of the faculty in the department of mathematics. I also enjoyed the visits by Mario Martelli from the Claremont Graduate University, where I received not only valuable advice but also a chance to practice my Italian.

My fellow graduate students also gave me an invaluable perspective on math and the world, and I wish the best for them. I was fortunate to befriend a great many individuals during my time as a graduate student; I thank them for their support, and I hope I was a source of support for them as well.

I would like to thank my siblings Veronica, Paul, and Chris, and my parents, for their support over the years. My dad in particular was able to impart valuable advice from his days as a graduate student. My mom supported me in pretty much anything I decided to do, and made sure I had everything I needed while away from home. I thank my brother Chris for creating SpaceTime, which gave me a productive hobby of custom-designing and testing a high-level programming language.

My wife Rehana has been especially understanding this final year, and her family has welcomed me into their lives. I especially thank Nazneen, Yousuf, Sara, and Farah.

A heartfelt thanks to Dana and David Dornsife, whom I have never met, but who generously donated to the College of Letters, Arts, and Sciences; as a result there were many more dissertation fellowships awarded, myself being the recipient of one of them. This award freed me from teaching duties, which allowed me much more time to pursue my mathematical interests.

Finally, my sincerest thanks to my advisor, Richard Arratia, who very closely guided my development and chose problems that were fun and interesting to develop. I learned more about probabilistic intuition by simply listening to him speak his thoughts out loud than in any class or book, and I appreciate all the time he spent with me.

Table of Contents

Dedication
Epigraph
Acknowledgements
List of Tables
Abstract
Chapter 1: Introduction
Chapter 2: Integer Partitions
  2.1 History
  2.2 Introduction
  2.3 Generating function
  2.4 A series for p(n)
  2.5 Numerical evaluation of p(n)
Chapter 3: Random Bernoulli Matrices
  3.1 Introduction
  3.2 Lower bound expansions
  3.3 Templates, Bernoulli orthogonal complements, and novel partitions
  3.4 Polynomial coefficients arising from inclusion-exclusion
  3.5 Interaction of left and right null vectors
Chapter 4: Probabilistic Divide-And-Conquer
  4.1 Introduction
    4.1.1 Exact simulation
    4.1.2 Probabilistic divide-and-conquer, motivated by mix-and-match
  4.2 The basic lemma for exact sampling with divide-and-conquer
    4.2.1 Algorithmic implications of the basic lemma
    4.2.2 Use of acceptance/rejection sampling
    4.2.3 Caution: mix-and-match not yet enabled
  4.3 Simple matching enables mix-and-match
  4.4 Algorithms for simulating random partitions of n
    4.4.1 For baseline comparison: table methods
    4.4.2 Waiting to get lucky
    4.4.3 Divide-and-conquer, by small versus large
      4.4.3.1 Divide-and-conquer with mix-and-match
      4.4.3.2 Roaming x
      4.4.3.3 Deliberately missing data
    4.4.4 Divide-and-conquer with a trivial second half
    4.4.5 Self-similar iterative divide-and-conquer: p(z) = d(z)p(z^2)
      4.4.5.1 Exploiting a parity constraint
      4.4.5.2 The overall cost of the main problem and all its subproblems
      4.4.5.3 A variation based on p(z) = p_odd(z)p(z^2)
  4.5 For integer partitions: review, method of choice
    4.5.1 Method of choice for unrestricted partitions
    4.5.2 Complexity considerations
    4.5.3 Partitions with restrictions
    4.5.4 An eye for gathering statistics
  4.6 For comparison: opportunistic divide-and-conquer with mix-and-match
Chapter 5: Opportunistic Probabilistic Divide-And-Conquer
References

List of Tables

3.1 Novel partitions sorted by r_λ. Conjectured to be complete with respect to r_λ ≥ 38/256.
3.2 Novel partitions of length 8, conjectured to be the complete list.

Abstract

This thesis is divided into two areas of combinatorial probability: probabilistic divide-and-conquer, and random Bernoulli matrices via novel integer partitions.
Probabilistic divide-and-conquer is a new method of exact sampling that simulates from a set of objects by dividing each object into two disjoint parts and piecing them together. The study of random Bernoulli matrices is driven by the asymptotics of the probability that a random matrix, whose entries are independent, identically distributed Bernoulli random variables with parameter 1/2, is singular. Our approach is an inclusion-exclusion expansion for this probability, defining a necessary and sufficient class of integer partitions as an index set to characterize all of the singularities.

Chapter 1
Introduction

An expert is a person who has made all the mistakes which can be made in a narrow field. -- Niels Bohr

This thesis is primarily composed of two papers in combinatorial probability: one recently accepted to the Annals of Combinatorics on random Bernoulli matrices (RBM), which is posted on the arXiv at http://arxiv.org/abs/1105.2834, and the other submitted to Combinatorics, Probability and Computing on probabilistic divide-and-conquer (PDC), also posted on the arXiv at http://arxiv.org/abs/1110.3856.

Chapter 2 introduces some basic notation and results on integer partitions. Chapter 3 is based on the paper submitted to the Annals of Combinatorics on the probability that a random Bernoulli matrix is singular, with corrections added from the anonymous referees. Chapter 4 is taken almost verbatim from the paper submitted to the journal Combinatorics, Probability and Computing on probabilistic divide-and-conquer.

These two disparate subjects have a common thread in this document, viz., integer partitions. Integer partitions are used as an example of an object that yields to efficient sampling methods under probabilistic divide-and-conquer, and they are also used as an index set to describe the singularities in an inclusion-exclusion expansion of the probability that a random Bernoulli matrix is singular.

Chapter 2 introduces integer partitions and is a very brief account of some of the major results that I found useful during the development of Chapter 4. A very nicely written book at the undergraduate level is [20], which is a must-read for anyone beginning a serious study of integer partitions. Integer partitions are a fundamental combinatorial structure, and the information provided in Chapter 2 should be a sufficient starting point for understanding the other chapters.

Chapter 3 was developed shortly after a USC Mathematics Colloquium talk my advisor and I attended by Philip Matchett Wood on [10] in January 2011. We were drawn to the subject mainly because the index sets were integer partitions, and because no one had yet worked on the lower bound expansions in great detail.

Chapter 4 began with a simple question posed to the author by the author's advisor: "how would you simulate a random integer partition quickly, for as large an n as possible?" Building upon the work of Fristedt [17] and [7], an idea was born to break up the task into two smaller tasks, simulate each one separately, and piece them back together.

Chapter 2
Integer Partitions

An equation means nothing to me unless it expresses a thought of God. -- Srinivasa Ramanujan

I am interested in mathematics only as a creative art. -- G. H. Hardy

2.1 History

There is a long history associated with the study of integer partitions, with an account contained in the thesis of Dr Paul Garcia [19] on Percy Alexander MacMahon.
George Andrews' book [2] from 1976 gives an extensive mathematical summary of the major developments and is a natural starting point for a serious study. However, the history that has driven many of the developments that we will find pertinent has its origins in a paper written by Hardy and Ramanujan [22] on estimating the number of integer partitions of size n, denoted by p(n). This collaboration led to many other discoveries, including Ramanujan's congruence observations, but in particular it opened up the subject to analysis by introducing an integration technique now known as the circle method. In the words of Hardy and Ramanujan,

    The idea which dominates this paper is that of obtaining asymptotic formulae for p(n) by a detailed study of the integral (1.21). This idea is an extremely obvious one; it is the idea which has dominated nine-tenths of modern research in the analytic theory of numbers: and it may seem very strange that it should never have been applied to this particular problem before. Of this there are no doubt two explanations. The first is that the theory of partitions has received its most important developments, since its foundation by Euler, at the hands of a series of mathematicians whose interests have lain primarily in algebra. The second and more fundamental reason is to be found in the extreme complexity of the behavior of the generating function f(x) near a point of the unit circle.

2.2 Introduction

An integer partition is an unordered list of positive integers. An integer partition of size n is an unordered list of positive integers whose sum is n. For example, the seven integer partitions of size 5 are {5}, {4,1}, {3,2}, {3,1,1}, {2,2,1}, {2,1,1,1}, {1,1,1,1,1}. It is standard to write the elements in decreasing order (reverse lexicographic) and to drop the commas and brackets when there is no danger of ambiguity, simply listing the elements consecutively, as in 5, 41, 32, 311, 221, 2111, 11111. A further notational convenience is to introduce exponential notation for the sake of the latter two partitions, so that we may write 5^1, 4^1 1^1, 3^1 2^1, 3^1 1^2, 2^2 1^1, 2^1 1^3, 1^5, and if there is no confusion we may drop the superscript 1s as we please, so that our list may be more easily represented as 5, 41, 32, 31^2, 2^2 1, 21^3, 1^5.

The number of integer partitions of size n is denoted by p(n), where for example p(5) = 7. The first few values are (starting with p(0) := 1) 1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, where p(10) = 42. p(n) grows very rapidly, as is evidenced by p(100) = 190569292 ≈ 1.9 × 10^8, and p(1000) ≈ 2.4 × 10^31. In fact, if we define a_n ∼ b_n to mean that lim_{n→∞} a_n/b_n = 1, then it was first shown by Hardy and Ramanujan [22] that

  p(n) ∼ exp(c_1 √n) / (c_2 n),   (2.1)

where c_1 = π √(2/3) and c_2 = 4√3. The proof of this result is quite technical, but proving, for example, p(n) < exp(c_1 √n) and p(n) < exp(c_1 √n)/(c_3 √n) for some c_3 > 0, is easily established using standard elementary analysis; these cases are treated in [3].
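The values quoted above are easy to reproduce. The following is a minimal sketch (our illustration, not part of the thesis) that computes p(n) exactly via Euler's pentagonal number recurrence and compares p(100) against the Hardy-Ramanujan asymptotic (2.1).

```python
# Exact values of p(n) via Euler's pentagonal-number recurrence, checked
# against the Hardy-Ramanujan asymptotic (2.1). Illustration only.
from math import exp, pi, sqrt

def partition_numbers(N):
    """Return [p(0), ..., p(N)] using Euler's pentagonal number theorem."""
    p = [0] * (N + 1)
    p[0] = 1
    for n in range(1, N + 1):
        total, k = 0, 1
        while True:
            g1 = k * (3 * k - 1) // 2      # generalized pentagonal numbers
            g2 = k * (3 * k + 1) // 2
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - g1]
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p

p = partition_numbers(1000)
print(p[5], p[10], p[100])                 # 7, 42, 190569292
print(f"{p[1000]:.1e}")                    # 2.4e+31
hr = exp(pi * sqrt(2 * 100 / 3)) / (4 * 100 * sqrt(3))  # asymptotic at n = 100
print(hr / p[100])                         # about 1.046, so already quite close
```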
2.3 Generating function

Generating functions play a significant role in the study of integer partitions. A good introduction to generating functions is the book generatingfunctionology by Herbert Wilf [52]. The idea of a generating function is to encode information into a lattice structure. For example, suppose we wanted to encode the list of numbers a = {a_0, a_1, ..., a_n} into a single object in a way that makes them easily accessible. The ordinary generating function for the set a would be the polynomial g(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n, with g(0) = a_0, g'(0) = a_1, g''(0)/2 = a_2, ..., g^(n)(0)/n! = a_n.

The generating function for p(n) is given by f(x) = ∏_{i≥1} (1 - x^i)^{-1}. We can expand this function out as follows:

  f(x) = 1/(1-x) · 1/(1-x^2) · 1/(1-x^3) ⋯
       = (1 + x + x^2 + ⋯)(1 + x^2 + x^4 + ⋯)(1 + x^3 + x^6 + ⋯) ⋯
       = 1 + x + 2x^2 + 3x^3 + 5x^4 + 7x^5 + ⋯
       = ∑_{n≥0} p(n) x^n.

The coefficients p(n) represent unrestricted partitions, but we could also ask for the number of partitions of size n with distinct parts. For example, the partitions of 5 with distinct parts are 5, 41, 32. The generating function is given by f_d(x) = (1 + x)(1 + x^2)(1 + x^3) ⋯. A more general version places at most b_1 1s, b_2 2s, etc. For example, the partitions of 5 with at most 2 ones are 5, 41, 32, 311, 221. The generating function h(x) has the form

  h(x) = (1 + x + x^2 + ⋯ + x^{b_1})(1 + x^2 + x^4 + ⋯ + x^{2 b_2})(1 + x^3 + x^6 + ⋯ + x^{3 b_3}) ⋯ = ∏_{i≥1} (1 + x^i + x^{2i} + ⋯ + x^{i b_i}).

The generating function can be a source of identities; for instance,

  f_d(x) = ∏_i (1 + x^i) = ∏_i (1 + x^i) (1 - x^i)/(1 - x^i) = ∏_i (1 - x^{2i})/(1 - x^i) = ∏_{i odd} 1/(1 - x^i) = f_o(x),

where f_o(x) is the generating function for the number of partitions with only odd parts. This allows us to conclude that the number of partitions of n with distinct parts is equal to the number of partitions with only odd parts.

A good treatment of the use of generating functions in integer partitions is [2] and also [51].
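These product formulas can be checked mechanically by expanding them as truncated power series. The following small sketch (ours, not the thesis's) recovers the coefficients p(n) and confirms the distinct-parts/odd-parts identity up to x^30.

```python
# Expand the partition generating functions as truncated power series and
# compare coefficients. Illustration only.
def mult(a, b, N):
    """Multiply two coefficient lists modulo x^(N+1)."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(0, N + 1 - i):
                c[i + j] += ai * b[j]
    return c

def product_series(factor, N):
    """Expand prod_{i>=1} factor(i) as a coefficient list up to x^N."""
    series = [1] + [0] * N
    for i in range(1, N + 1):
        series = mult(series, factor(i), N)
    return series

N = 30
geom = lambda i: [1 if j % i == 0 else 0 for j in range(N + 1)]   # 1/(1-x^i)
print(product_series(geom, N)[:11])     # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]

dist = lambda i: [1 if j in (0, i) else 0 for j in range(N + 1)]  # 1 + x^i
odd  = lambda i: geom(i) if i % 2 == 1 else [1] + [0] * N         # odd parts only
print(product_series(dist, N) == product_series(odd, N))          # True
```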
2.4 A series for p(n)

The celebrated result of Hardy and Ramanujan is the asymptotic formula p(n) ∼ exp(c_1 √n)/(c_2 n), but their work showed quite a bit more than just that. Their final result states that for any fixed n, there exists an a > 0 such that

  p(n) = ∑_{q=1}^{a√n} A_q φ_q + O(n^{-1/4}),   (2.2)

where A_q and φ_q are explicit, computable expressions. That is, once n is fixed, only O(√n) summands need to be considered before one is within 0.5 of the true answer. We do not obtain the value of a from their theory, however; so while it is of great theoretical insight and yields the asymptotic formula from Equation (2.1), from a computational point of view it does not allow one to compute this value with absolute certainty.

In 1937, however, Rademacher [45] published an infinite series for p(n) that converges, that is,

  p(n) = ∑_{q=1}^{∞} A_q ψ_q,

for similarly defined functions A_q and ψ_q. With a convergent series representation, this opens up the possibility of estimating the error involved in truncating the series, whereas the original series in Equation (2.2) is not convergent.

2.5 Numerical evaluation of p(n)

Shortly after the work of Rademacher, Lehmer [31, 33] worked on coming up with explicit upper bounds on a truncated series, bounding the remainder in a way that guaranteed the estimate was within 0.5 of the true answer. In particular, his Theorem 13 from [31] states that

Theorem 1 (Lehmer). If only 2 n^{1/2}/3 terms of the Hardy-Ramanujan series be taken, the resulting sum will differ from p(n) by less than 1/2, provided n > 600.

The cut-off value of 600 was chosen because at the time (ca. 1930s) there was a table of values of p(n) for n = 0, 1, 2, ..., 600. He goes on to write that the 2/3 can be improved upon, for example to 1/2 if n > 3600, and this is generalized to

Theorem 2 (Lehmer). Let β > 1 and let c_β = (2/3)^{1/2} π / β = 2.565.../β. Then p(n) is the nearest integer to the sum of the first n^{1/2}/β terms of the Hardy-Ramanujan series provided

  n > 2^7 π^{-1/2} c_β^6 ( 2 sinh(c_β)/c_β^3 - 1/3 + 1/(6 β^3) )^2 = O(e^{3 c_β} β^{11}).

A related question is: for β > 1, how large does n need to be so that the first n^{1/2}/β terms of the Hardy-Ramanujan series are within a specified tolerance of the true value? In the age of computers and floating point accuracy, it would be grossly inefficient to attempt to compute a floating point result for p(1000) ≈ 2 × 10^31, accurate to say 16 digits, by computing all 32 digits accurately. The reason why this is a worthwhile endeavor is that the asymptotic expansions are heavily weighted in the first few terms. To again quote [22],

    Taking n = 100, we found that the first six terms of our formula gave
      190568944.783 + 348.872 - 2.598 + .685 + .318 - .064 = 190569291.996,
    while p(100) = 190569292; so that the error after six terms is only .004. We then proceeded to calculate p(200), and found
      3,972,998,993,185.896 + 36,282.978 - 87.555 + 5.147 + 1.424 + 0.071 + 0.000 + 0.043 = 3,972,999,029,388.004,
    and Major MacMahon's subsequent calculations showed that p(200) is, in fact, 3,972,999,029,388.

This demonstrates the principle of the Hardy-Ramanujan series: the first term already gives striking accuracy, with the remaining terms contributing towards pinning down the smaller digits.

Chapter 3
Random Bernoulli Matrices

There is no philosophy which is not founded upon knowledge of the phenomena, but to get any profit from this knowledge it is absolutely necessary to be a mathematician. -- Daniel Bernoulli

A mathematician is a device for turning coffee into theorems. -- Paul Erdős

3.1 Introduction

To introduce our problem, we quote verbatim(1) the opening 3 paragraphs of a paper by Kahn, Komlós, and Szemerédi [24]:

(1) Apart from correcting a typographical error, and using our own display equation numbering and reference numbering.

    1.1. The problem. For M_n a random n × n ±1-matrix ("random" meaning with respect to the uniform distribution), set
      P_n = Pr(M_n is singular).
    The question considered in this paper is an old and rather notorious one: What is the asymptotic behavior of P_n? It seems often to have been conjectured that
      P_n = (1 + o(1)) n^2 / 2^{n-1},   (3.1)
    that is, that P_n is essentially the probability that M_n contains two rows or two columns which are equal up to a sign. This conjecture is perhaps best regarded as folklore. It is more or less stated in [30] and is mentioned explicitly, as a standing conjecture, in [40], but has surely been recognized as the probable truth for considerably longer. (It has also been conjectured ([37]) that P_n / (n^2 2^{-n}) → 1.) Of course the guess in (3.1) may be sharpened, e.g., to
      P_n ≈ 2 \binom{n}{2} 2^{-(n-1)} + 2^4 \binom{n}{4} (3/8)^n,   (3.2)
    the right-hand side being essentially the probability of having a minimal row or column dependency of length 4.

Our paraphrase: (3.1) says that, for a Bernoulli matrix to be singular, the most likely way is having a left or right null vector of the template 11, and (3.2) says that the second most likely way is having a left or right null vector of the template 1111.

When one continues the expansion (3.2) to higher order, two features emerge. First, the patterns, corresponding to 11, 1111, ..., which we call templates, have a rich structure. The third most likely template is 1^6 = 111111, with exponential decay (5/16)^n, and the fourth most likely template is 1^8, with exponential decay (35/128)^n. The real pattern, which is not simply 1^{2m}, emerges starting with the fifth template, 21111.
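These decay rates are the probabilities that a random ±1 signing of the template's parts sums to zero (formally defined as r_λ in (3.5) below), and for small templates they can be checked by brute force. A quick sketch, our illustration:

```python
# Brute-force check of the exponential decay rates quoted above: enumerate
# all 2^k sign patterns of the template's parts. Illustration only.
from fractions import Fraction
from itertools import product

def r(parts):
    k = len(parts)
    zeros = sum(1 for signs in product((1, -1), repeat=k)
                if sum(s * x for s, x in zip(signs, parts)) == 0)
    return Fraction(zeros, 2 ** k)

for parts in [(1, 1), (1, 1, 1, 1), (1,) * 6, (1,) * 8, (2, 1, 1, 1, 1)]:
    print(parts, r(parts))
# 11 -> 1/2, 1111 -> 3/8, 1^6 -> 5/16, 1^8 -> 35/128, 21111 -> 1/4
```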
In the higher order continuation of (3.2), the second feature also first appears with the fifth term: the distinction between 2\binom{n}{2} 2^{-(n-1)}, which is the expected number of occurrences of a right or left null vector of template 11, and the probability of one or more such occurrences. This is because the exponential decay rate for 21111, which is (1/4)^n, is small enough to force consideration of the probability that two null vectors from the template 11 appear.

In Section 3.3 we define what we call novel integer partitions, which include the templates from the previous paragraph, and we prove that this set is both necessary (Theorem 4) and sufficient (Theorem 2) for detecting singularities. The natural extensions of (3.2) are our Conjectures 1 and 2, immediately below.

Conjecture 1. Let S denote the event that the n by n random Bernoulli matrix M = M_n is singular, with P_n = P(S) = P_n(S). Then for every ε > 0,

  P(S \ D_11) = o((3/8 + ε)^n),   (3.3)
  P(S \ (D_11 ∪ D_1111)) = o((5/16 + ε)^n),
  P(S \ (D_11 ∪ D_1111 ∪ D_{1^6})) = o((35/128 + ε)^n),
  P(S \ (D_11 ∪ D_1111 ∪ D_{1^6} ∪ D_{1^8})) = o((1/4 + ε)^n),
  P(S \ (D_11 ∪ D_1111 ∪ D_{1^6} ∪ D_{1^8} ∪ D_21111)) = o((63/256 + ε)^n),
  P(S \ E_6) = o((15/64 + ε)^n),
  P(S \ E_7) = o((231/1024 + ε)^n),
  P(S \ E_8) = o((7/32 + ε)^n),

and so on, where E_6 = D_11 ∪ D_1111 ∪ D_{1^6} ∪ D_{1^8} ∪ D_21111 ∪ D_{1^{10}}, E_7 = E_6 ∪ D_{2 1^6}, and E_8 = E_7 ∪ D_{1^{12}}.

Conjecture 2. There is a list of novel partitions, which in order of exponential rate (3.5) is λ(1) = 11, λ(2) = 1111, ..., λ(5) = 21111, ..., λ(8) = 1^{12}, ..., such that for every r > 0, there exists K > 0 with

  P( S \ ∪_{i=1}^{K} D_{λ(i)} ) = o(r^n).

It was first shown that P_n decays exponentially in [24], with an upper bound of .999^n. This was later improved by Tao and Vu [47] to (.958 + o(1))^n and again [48] to (3/4 + o(1))^n. (See also [49].) Recently Bourgain, Vu, and Wood [10] provided a further improvement to (1/√2 + o(1))^n, which is currently the most accurate bound.

Section 3.2 presents an explicit lower bound expansion of P_n, whose exponential decay rates are based on the novel integer partitions of Section 3.3. In Section 3.4 we derive the polynomial coefficients of our lower bound expansion. In Section 3.5 we give some bounds on the interaction of potential left and right null vectors, hoping to supply a tool for use in bounding P(S \ D_11).

3.2 Lower bound expansions

The expansion in (3.2) can be continued by considering the event D_1111, that M has a left or right null vector of the form e_i ± e_j ± e_k ± e_ℓ, with D_{1^6}, D_{1^8}, D_{2 1^4}, D_{1^{10}}, D_{2 1^6}, and D_{1^{12}} defined similarly. Letting E_8 = D_11 ∪ D_{1^4} ∪ D_{1^6} ∪ D_{1^8} ∪ D_{2 1^4} ∪ D_{1^{10}} ∪ D_{2 1^6} ∪ D_{1^{12}}, our expansion can be stated as

Theorem 1. For each n,

  P_n ≥ P(D_11) ≥ 4\binom{n}{2} (1/2)^n - (12\binom{n}{2}^2 - 4\binom{n}{2}) (1/4)^n.

For each n, the event E = E_8 is a subset of the event that M is singular, hence trivially P_n ≥ P_n(E), and for all ε > 0,

  P_n(E) = Q_1(n)(1/2)^n + Q_2(n)(3/8)^n + Q_3(n)(5/16)^n + Q_4(n)(35/128)^n + Q_5(n)(1/4)^n + Q_6(n)(63/256)^n + Q_7(n)(15/64)^n + Q_8(n)(231/1024)^n + o((7/32 + ε)^n),

where

  Q_1(n) = 2^2 \binom{n}{2},    Q_2(n) = 2^4 \binom{n}{4},    Q_3(n) = 2^6 \binom{n}{6},    Q_4(n) = 2^8 \binom{n}{8},
  Q_6(n) = 2^{10} \binom{n}{10},    Q_7(n) = 2^7 \binom{7}{1} \binom{n}{7},    Q_8(n) = 2^{12} \binom{n}{12},
  Q_5(n) = 2^5 \binom{5}{1} \binom{n}{5} - 4 ( 2\binom{\binom{n}{2}}{2} + 8\binom{n}{4} + 5\binom{n}{3} ).

Proof. This result follows easily from the combination of Lemmas 3-6, given in Sections 3.3 and 3.4.

3.3 Templates, Bernoulli orthogonal complements, and novel partitions

In the explicit expansions of Conjecture 1 and Theorem 1 there are exponentially decaying factors such as (1/2)^n, (3/8)^n, (1/4)^n, corresponding to the integer partitions 11, 1111, and 21111.
When the expansion is carried out to high order, an obvious necessary condition for a partition λ = (λ_1, λ_2, ..., λ_k) to appear is that it be fairly divisible, in the sense that for some combination of signs, 0 = λ_1 ± λ_2 ± ⋯ ± λ_k. However, this is not sufficient; some fairly divisible partitions, such as 211 and 321, will never appear. We call the partitions that eventually appear novel. The definitions below will let us characterize these novel partitions and, to a limited extent, compute them explicitly.

Definition 1. Integer partition, as a template for vectors. For a given partition λ with k parts, let V_λ ⊂ N × Z^{k-1} denote the set of all vectors formed by reordering the parts of λ, together with all combinations of plus and minus signs, with the requirement that the first coordinate always has a plus.(2) If λ has c(i) parts of size i, so that len(λ) := c(1) + c(2) + ⋯ = k, then

  |V_λ| = 2^{k-1} k! / (c(1)! c(2)! ⋯).

(2) We write N := {1, 2, ...} for the set of strictly positive integers.

Notation: coordinate injection, from R^k to R^n. We often want to pad our vectors of length k with zeros, to get a vector of length n. We say that a k by n matrix C, with all entries 0 or 1, is a coordinate injection matrix if every row has exactly one 1, no column has more than one 1, and C_{ij} = C_{i'j'} = 1 with i < i' implies j < j'. (This last requirement is imposed since our V_λ already accounts for all rearrangements of the parts.) There are \binom{n}{k} such matrices. We speak of vectors of length n, of the form vC for some v ∈ V_λ, as having template λ.

Definition 2. Templates, used in n dimensions. We write V_λ^(n) for the subset of Z^n of length n vectors with template λ. Note, vectors in V_λ^(n) may have first coordinate zero, but the first non-zero coordinate must be strictly positive. The number of vectors of length n, having template λ, is

  |V_λ^(n)| = \binom{n}{k} |V_λ| = 2^{k-1} (n)_k / (c(1)! c(2)! ⋯),   (3.4)

where we write (n)_k for n falling k.

For an integer partition λ with parts λ_1, λ_2, ..., λ_k, and (ε_1, ..., ε_k) a vector of independent Bernoulli random variables taking the values ±1 with probability 1/2 each, let X_λ = ε_1 λ_1 + ⋯ + ε_k λ_k denote the weighted sum, and define

  r_λ := P(X_λ = 0).   (3.5)

We can then compute, for example, r_11 = 1/2, r_{1^{2m}} = \binom{2m}{m}/2^{2m}, r_21111 = 1/4.

Definition 3. Bernoulli orthogonal complement. For a vector v ∈ Z^k,

  v^{⊥B} = { x ∈ {-1, 1}^k : v · x = 0 }.

This definition can also be applied when v = λ = (λ_1, ..., λ_k), with λ_1 ≥ ⋯ ≥ λ_k > 0, is an integer partition with k parts, in which case the probability r_λ defined by (3.5) is given by

  r_λ = |v^{⊥B}| / 2^k.   (3.6)

Remark 1. Clearly x ∈ v^{⊥B} iff -x ∈ v^{⊥B}, that is, -(v^{⊥B}) = v^{⊥B}. For a partition λ, all v in V_λ have the same size |v^{⊥B}| for their Bernoulli orthogonal complement. Indeed, the various sets v^{⊥B} for v ∈ V_λ are related by permuting the k coordinates and applying, for some fixed I ⊂ {2, 3, ..., k}, sign flips to all the coordinates indexed by I. Hence, if λ^{⊥B} ≠ ∅, then {-1, 1}^k = ∪_{v∈V_λ} v^{⊥B}.

Definition 4. The matrix A^(λ) for λ^{⊥B}. For an integer partition λ of length k, with 2p = |λ^{⊥B}| > 0, the matrix A^(λ) for the Bernoulli orthogonal complement of λ is the k by p matrix whose columns are those elements of λ^{⊥B} whose first coordinate is +1, taken in lexicographic order, with +1 preceding -1.

Example 1. Displaying the Bernoulli orthogonal complement. When λ = 1111, we have

  1111^{⊥B} = { (+1,+1,-1,-1), (+1,-1,+1,-1), (+1,-1,-1,+1), (-1,-1,+1,+1), (-1,+1,-1,+1), (-1,+1,+1,-1) }
            = { ++--, +-+-, +--+, --++, -+-+, -++- },

where the second representation omits the parentheses and commas for each k-tuple, and also shows only the signs.
Say that λ has length k, and 2p = |λ^{⊥B}|. Showing only those elements of λ^{⊥B} that begin with +, and transposing, we have a k by p display, to be thought of as an economical representation of the set λ^{⊥B}; we use this display in Example 4. Treating the same k by p array as a matrix, we have A^(λ), as defined in Definition 4. For instance,

  A^(1111) =
    + + +
    + - -
    - + -
    - - +

Definition 5. Equivalence of templates. For partitions λ, μ with the same number of parts, we say λ ↔ μ iff there exist v ∈ V_λ, w ∈ V_μ such that v^{⊥B} = w^{⊥B}. Clearly, this ↔ is an equivalence relation on integer partitions. (Note, λ ↔ μ iff there exists w ∈ V_μ such that λ^{⊥B} = w^{⊥B}; that is, we need only apply rearrangement and sign flips to one of λ, μ.)

Example 2. Equivalence is more than just multiples. Trivially, scalar multiples of any partition are all equivalent to each other. But equivalence involves more. Let λ = 321, μ = 211. Then 321 ↔ 211 since

  λ^{⊥B} = μ^{⊥B} = { +--, -++ },

with no need to apply rearrangements or sign flips. Rearrangement and sign flips may change the Bernoulli complement. For instance,

  V_μ = { (2,1,1), (1,2,1), (1,1,2), (2,1,-1), (1,2,-1), (1,1,-2), (2,-1,1), (1,-2,1), (1,-1,2), (2,-1,-1), (1,-2,-1), (1,-1,-2) },

and with v = (1,-2,-1) ∈ V_μ, we have

  v^{⊥B} = { ++-, --+ } ≠ μ^{⊥B}.

Example 3. Rearrangements are needed in the definition of equivalence.(3) The partitions λ = 9 7 4 4 3 1 and μ = 7 5 5 4 4 3 are such that λ^{⊥B} ≠ μ^{⊥B}, but for v = (9, 7, 3, 4, 4, 1) ∈ V_λ, we have v^{⊥B} = μ^{⊥B}, hence λ ↔ μ.

(3) This example was found by considering partitions of the form (a+x_1, b+x_1, b, b, a-x_1, b-x_1) and (a+x_2, a-x_2, b+x_2, b, b, b-x_2), where a ≥ b and x_1, x_2 are chosen so that the two b's must cancel, but are in a different monotonic order in each partition. Here we have taken a = 6, b = 4, x_1 = 3, x_2 = 1.

Definition 6. Reduction of templates. For any partitions λ, μ with λ having m parts and μ having k parts, m ≥ k > 0, we say that λ → μ (read λ reduces to, or implies, μ) iff either

  k = m and there exists v ∈ V_μ with λ^{⊥B} ⊂ v^{⊥B},   (3.7)

or else

  there exist I ⊂ {1, ..., m} and v ∈ V_μ with Proj_I λ^{⊥B} ⊂ v^{⊥B}.   (3.8)

Clearly, the relation → is transitive. Our use of the subset symbol includes equality. We note that (λ → μ and μ → λ) iff λ ↔ μ, so that Definitions 5 and 6 are compatible.

Remark 2. Definition 6 is set up so that it is obvious that if λ → μ, and w ∈ V_λ^(n), and M is an n by n Bernoulli matrix with wM = 0, then there exists v ∈ V_μ^(n) with vM = 0.

Definition 7. Strict reduction. We define a relation of strict reduction, λ ↝ μ (read λ strictly reduces to μ), iff
Proposition 5 uses this in order to prove that 21111 is the only novel partition of size 5. There is a natural description of this principle in terms of coin weighing problems (see for example [1]). You have k coins of various positive integer weights. Not implying 11 means that if an adversary selects any two coins and places them on the same or opposite sides of a balance scale, you can place all of the remaining coins on the scale so that it balances. We now come to the denition that eectively governs explicit expansions such as those in Conjecture 1 and Theorem 1. Denition 8. Novel partitions. We call an integer partition a novel partition if and only if there does not exist any other partition 0 with 6 ! 0 , and among all partitions equivalent to , in the sense of Denition 5, is lexicographically rst. Theorem 2 (Suciency of the set of novel partitions). The set of all novel partitions is sucient, acting as possible left null vectors, to detect singularity for Bernoulli matrices M. That is, if such a matrix is singular, say of size n by n, then there exists a novel partition with len()n, and v2V (n) with vM = 0. 22 Proof. If M is singular, then there is a nonzero vector w2 Z n with wM = 0. Taking absolute values of the coordinates, deleting zeros if they occur, and listing in nonincreasing order yields an integer partition , and w2V (n) . If is novel, we are done. If is not novel, then it must reduce to a novel partition , and then Remark 2 applies. Theorem 3. Intrinsic characterization of novel partitions. An integer partition with k parts is novel i the matrix A () , specied in Denition 4, has rank k 1, and gcd() = 1. Proof. Let A A () , with rows r 1 ;:::;r k . To prove the only if direction, suppose rank A < k 1. Then there exists j < k, and integers c 1 ;:::;c j 6= 0, 1 ;:::; j distinct elements off1;:::;kg, such that c 1 r 1 +:::c j r j = 0: Letting v = (c 1 ;:::;c j ), we have v2V , len() =j <k, and 6 !, so that is not novel. If gcd() > 1, then = 1 gcd() , ! , is earlier in lexicographic order, so is not novel. In the other direction, suppose rankA =k1, gcd() = 1, but assume is not novel. Then either 1. 6 !, len()< len(), but then there existsv2V (k) , withk1 or fewer nonzero components such that vA = 0, which implies rank A<k 1; 2. 6 !, len() = len(), ?B ( ?B . Then A () and A () both have rank k 1, withA () = 0 andA () = 0. By the inclusion, we also have A () = 0. But then if we consider the vectorv = 1 1 of lengthk with rst coordinate 0,v has at most k 1 nonzero entries, and vA () = 0. Since we assumed A () has rank k 1, we conclude v = 0. 23 Corollary 1. An integer partition is either 1. novel (or a multiple of a novel), 2. implies a novel partition of strictly smaller length, or 3. is not fairly divisible, i.e., ?B =;. The only part that is not trivial is (2). We already showed that partitions like 332211 can strictly reduce to a partition of the same length, but without Theorem 3 it is not a priori obvious that there will always be a strict reduction in the length of the partition. Theorem 4 (Minimality of the set of novel partitions). The family of novel partitions is minimal, in the sense that if any single one of the s in that set is removed, then the family, acting as possible left null vectors, does not detect all singularities. Proof. For any v 2 V , let 2p =jv ?B j; we can write v ?B =fx 1 ;x 2 ;:::;x 2p g, where x i 2f1; 1g k ,x i 6=x j ,i6=j. 
As in Example 1, let AA () denote thek byp matrix of rank k 1 with columns given by x 1 ;x 2 ;:::;x p , the vectors with rst entry positive. 1. If p < k, then add kp columns which are duplicates of column p, and call this square matrix M. 2. Else, ifp>k, since the rank ofA isk1, the rstk1 rows, denotedr 1 ;r 2 ;:::;r k1 , form an independent set in R p . The existence of an independent set of p vectors inR p , whose entries consist of plus and minus 1, is guaranteed by the existence of nonsingular Bernoulli matrices of all sizes; denote such a set asfs 1 ;:::;s p g. By the 24 basis extension theorem, re-indexing the s i as needed, there is an independent set of the formfr 1 ;r 2 ;:::;r k1 ;s k ;s k+1 ;:::;s p g. Replacing s k with r k , the rows r 1 ;r 2 ;:::;r k1 ;r k ;s k+1 ;:::;s p form a p by p Bernoulli matrix M with rank p 1. 3. Else, if p =k, then set M =A. M is thus a singular square matrix with entries either plus or minus 1. Denote the dimension of M as n by n; it has rank n 1, and if w2 V (n) for some novel and wM = 0, then w ?B v ?B , which implies w =v. Remark 3. In Theorem 4, it was necessary to specify testing for null vectors on one particular side, since in the case n = 4 any instance of singularity caused by 1111 in one direction carries with it a 11 in the other direction. In this case one could say that 1111 is not really necessarily to detect singularity. We have just dened and characterized novel partitions, which form the foundation for the expansion in Theorem 1. The next set of theorems bounds the exponential decay from each term. Proposition 1 (Erd} os, Littlewood, Oord [16]). Letx 1 ;x 2 ;::: be real numbers,jx i j 1, and 1 ; 2 ;::: be +1 or1. Then the number of sums of the form P k i=1 x i i which fall into an arbitrary open interval I of length 2 does not exceed k bk=2c . TakingI = (1; 1), an immediate consequence is that for any integer partition with k parts, 2 k r =j ?B j k b k 2 c ; (3.9) 25 and in casek = 2m is even, the novel partition = 1 2m achieves equality with this upper bound. A related theorem of Erd} os [16] expands Proposition 1 by widening the target interval. Proposition 2 (Erd} os [16]). Letr be any integer, the x i real,jx i j 1. Then the number of sums P k i=1 i x i which fall into the interior of any interval of length 2r is not greater than the sum of the r greatest binomial coecients belonging to k. This proposition was proved by showing that the size of the union of r disjoint an- tichains inf1; 1g k is at most the sum of the r largest binomial coecients for k; see [49], Proposition 7.7, and [9], Section 3, Exercise 7. As a corollary of this, we get Theorem 5. Suppose is an integer partition with k parts, not all equal. Then 2 k r = j ?B j is at most the sum of the largest four binomial coecients of k 2. Hence, for k 2 and even, = 1 k hasj ?B j>j ?B j for any partition with k parts, not all equal, and for k 5 and odd, = 2 1 k1 hasj ?B jj ?B j, for any partition with k parts. Proof. Fixi;j such that i 6= j . Partition the set ?B into four (possibly empty) subsets A = fx2 ?B :x i = 1;x j = 1g; B = fx2 ?B :x i =1;x j =1g; C = fx2 ?B :x i = 1;x j =1g; D = fx2 ?B :x i =1;x j = 1g: 26 These are disjoint antichains, since they specify four distinct target values for the sums P 0 x ` ` , where the sum is over the k 2 indices other than i;j, and each ` is strictly positive. Projecting out the two coordinates indexed by i andj, we get 4 disjoint antichains inf1; +1g k2 . Conjecture 3. Runners-up in Erd} os-Littlewood-Oord. 
For all partitions with exactly k parts, with greatest common divisor 1: if k ≥ 4 is even, the second largest probability r_λ is achieved, uniquely, by 2^2 1^{k-2}; while if k ≥ 7 is odd, the largest probability is achieved by 2 1^{k-1} (already proved, as part of Proposition 5), and the second largest is achieved, uniquely, by 2^3 1^{k-3}.

We note that for k ≥ 5 odd, it is trivial to check that 2 1^{k-1} strictly beats 2^3 1^{k-3}, and with k = 5, |22211^{⊥B}| = |32111^{⊥B}| = 6.

Proposition 3. There are no novel partitions of size three.

Proof. By Theorem 3, any novel partition λ with three parts must have rank A^(λ) = 2. Since 111 is not a valid template, the parts in λ are not all equal, and so by Theorem 5, |λ^{⊥B}| ≤ 2, which means that A^(λ) has at most 1 column, and hence rank at most 1.

Proposition 4. The only novel partition of size 4 is 1111.

Proof. By Theorem 3, any novel partition with four parts must have rank A^(λ) = 3. By Theorem 5, any novel partition with four parts, not all equal, has |λ^{⊥B}| ≤ 4, which means that A^(λ) has at most 2 columns, and hence rank at most 2. If all parts of the partition are equal, then the requirement that gcd(λ) equal 1 forces λ = 1111. This is indeed novel, with A^(1111) given in Example 1.

Proposition 5. The only novel partition of size 5 is 21111.

Proof. Without loss of generality, assume λ = (a, b, c, d, e), where a ≥ b ≥ c ≥ d ≥ e > 0. As described in Example 6, in order to avoid implying 11, every pair of parts in λ, when added or subtracted, must be a signed combination of the others, e.g.,

  a + b = ±c ± d ± e,
  b + c = ±a ± d ± e.

Let us look at the first equation. If any of the signs are negative, then monotonicity is necessarily broken. Thus, any novel partition of length five must have a + b = c + d + e. Similarly, we can look at b + c = ±a ± d ± e, and by a monotonicity argument we conclude that the only viable form is b + c = a ± d ± e. We will look at each of these four cases separately.

1. a + b = c + d + e, b + c = a + d + e can be refined (by adding or subtracting one equation from the other) to b = d + e and a = c. By monotonicity this means that a = b = c, and hence our partition would be of the form (d+e, d+e, d+e, d, e). However, we must have a solution to d - e = ±(d+e) ± (d+e) ± (d+e), which would imply that e = 0.

2. a + b = c + d + e, b + c = a + d - e can be refined similarly to b = d and a = c + e, which yields partitions of the form (c+e, c, c, c, e). We must have a solution to c + 2e = ±c ± c ± c, which, to avoid implying e ≤ 0, implies e = c, and our template reduces to a multiple of 21111.

3. a + b = c + d + e, b + c = a - d + e can be refined to b = e, a = c + d, hence a multiple of 21111.

4. a + b = c + d + e, b + c = a - d - e forces b = 0.
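Theorem 3 turns novelty, up to the lexicographic tie-breaking of Definition 8, into a finite computation: build A^(λ) from the Bernoulli orthogonal complement and test rank A^(λ) = k - 1 together with gcd(λ) = 1. The following sketch reproduces the small cases above; the helper names are our own (the thesis's searches, described in Proposition 6, were done in Mathematica).

```python
# Finite novelty test from Theorem 3: rank(A^(lambda)) = k-1 and gcd = 1.
# Illustration only, not the thesis's code.
from fractions import Fraction
from itertools import product
from math import gcd

def perpB_columns(parts):
    """Columns of A^(lambda): elements of lambda^perpB with first coordinate +1."""
    k = len(parts)
    return [signs for signs in product((1, -1), repeat=k)
            if signs[0] == 1 and sum(s * x for s, x in zip(signs, parts)) == 0]

def rank(columns):
    """Rank over the rationals, by Gaussian elimination on the column vectors."""
    rows = [list(map(Fraction, col)) for col in columns]
    r = 0
    for j in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][j] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][j] != 0:
                f = rows[i][j] / rows[r][j]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def passes_theorem_3(parts):
    cols = perpB_columns(parts)
    g = 0
    for x in parts:
        g = gcd(g, x)
    return bool(cols) and rank(cols) == len(parts) - 1 and g == 1

for parts in [(2,1,1,1,1), (2,1,1), (3,2,1), (1,1,1,1), (3,2,2,1,1,1)]:
    print(parts, passes_theorem_3(parts))
# (2,1,1,1,1) True; (2,1,1) False; (3,2,1) False; (1,1,1,1) True; (3,2,2,1,1,1) True
```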
λ            len(λ)   r_λ                 256 r_λ
1^2          2        1/2                 128
1^4          4        3/8                 96
1^6          6        5/16                80
1^8          8        35/128              70
2 1^4        5        1/4                 64
1^10         10       63/256              63
2 1^6        7        15/64               60
1^12         12       231/1024            57.75
2^2 1^4      6        7/32                56
2 1^8        9        7/32                56
1^14         14       429/2048            53.625
2 1^10       11       105/512             52.5
2^2 1^6      8        13/64               52
1^16         16       6435/32768          50.2734
2 1^12       13       99/512              49.5
2^2 1^8      10       49/256              49
2^3 1^4      7        3/16                48
1^18         18       12155/65536         47.4805
2 1^14       15       3003/16384          46.9219
2^2 1^10     12       93/512              46.5
2^3 1^6      9        23/128              46
1^20         20       46189/262144        45.1064
2 1^16       17       715/4096            44.6875
2^2 1^12     14       1419/8192           44.3438
3 2 1^5      7        11/64               44
2^4 1^4      8        11/64               44
2^3 1^8      11       11/64               44
1^22         22       88179/524288        43.0562
2 1^18       19       21879/131072        42.7324
2^2 1^14     16       2717/16384          42.4531
2^3 1^10     13       675/4096            42.1875
3 1^7        8        21/128              42
3^2 1^6      8        21/128              42
3 2 1^7      9        21/128              42
3 1^9        10       21/128              42
2^4 1^6      10       21/128              42
1^24         24       676039/4194304      41.2621
2 1^20       21       20995/131072        41.0059
2^2 1^16     18       10439/65536         40.7773
3 2 1^9      11       81/512              40.5
2^4 1^8      12       323/2048            40.375
3 1^13       14       1287/8192           40.2188
3 1^5        6        5/32                40
3 2^2 1^3    6        5/32                40
3 2^3 1^3    7        5/32                40
3 2^2 1^5    8        5/32                40
2^5 1^4      9        5/32                40
1^26         26       1300075/8388608     39.6751
2 1^22       23       323323/2097152      39.4681
2^2 1^18     20       20111/131072        39.2793
3 2 1^11     13       627/4096            39.1875
3 1^15       16       5005/32768          39.1016
3 2^2 1^7    10       39/256              39
3^2 1^8      10       39/256              39
2^4 1^10     14       623/4096            38.9375
2^5 1^6      11       155/1024            38.75
1^28         28       5014575/33554432    38.2582
2 1^24       25       156009/1048576      38.0881
3 2^3 1^5    9        19/128              38

Table 3.1: Novel partitions sorted by r_λ. Conjectured to be complete with respect to r_λ ≥ 38/256.

Proposition 6. The only novel partitions of length 6 are 111111, 221111, 311111, 322111. The only novel partitions of length 7 are

  2111111, 2221111, 3211111, 3222111, 3321111, 3322211, 4111111, 4221111, 4322111, 4331111, 4332211, 5222111, 5332111, 5432211.

Proof. The same technique that was used in Proposition 5 can be continued for novel partitions of length 6, 7, etc., eliminating cases that imply 11. Mathematica [35] code was written to list all cases and reduce them. For the sake of economy in running time, we only considered the requirement that all four of λ_1 ± λ_2 and λ_{k-1} ± λ_k be expressible as a plus-minus combination of the other k - 2 parts.
When the reduction yields a space of dimension greater than one, the result may be viewed as what we call a meta-template, e.g., (a+b, a+b, b, b, a, a). The list of meta-templates includes all novel partitions and possibly others that are not novel. For k = 6, the following candidates were returned:

  111111, 221111, 311111, 322111, 332211, 433211, 533221.

We showed in Example 4 that 332211 ↝ 221111, and the ranks of A^(433211) and A^(533221) are 4, whereas the others have rank 5. For k = 7, a list of 14 templates was found, all of which turned out to be novel. Also, for k = 7, 12 meta-templates were found; by hand inspection, 11 were easily shown to violate the monotonicity requirement that λ_1 ≥ λ_2 ≥ ⋯. The remaining meta-template, (a, b, a-b, d, e, h, h-d-e), is seen, by hand, to imply 11, since after the initial refinement to the form above one can apply the same technique again to the smallest two parts and reduce each case to either a monotonicity or positivity violation.

Lemma 3. In order of decreasing r_λ, the first eight novel partitions are 11, 1111, 1^6, 1^8, 21111, 1^{10}, 2 1^6, 1^{12}. For novel partitions other than these eight, writing k for the number of parts,

  k = 6, 7:  r_λ ≤ 60/256;
  k = 8, 9:  r_λ ≤ 56/256;
  k = 10, 11:  r_λ ≤ 52.5/256;
  λ not 1^{14}, 1^{16}, and k ≥ 12:  r_λ ≤ 49.5/256.

Hence, aside from the first eight novel partitions, all other novel partitions have r_λ ≤ 56/256. Observe that λ = 1^{14} has r_λ = 53.625/256 and λ = 1^{16} has r_λ = 50.2734375/256.

Proof. This follows immediately from Theorem 5.

λ          |λ^{⊥B}|   λ          |λ^{⊥B}|   λ          |λ^{⊥B}|
11111111   70         54433221   22         65522211   16
22111111   52         53332211   22         65533211   16
22221111   44         65322211   22         65544332   16
33111111   42         64322111   22         76433221   16
31111111   42         64432111   22         76533211   16
32211111   40         63222111   22         76543221   16
33221111   36         55422211   20         76544321   16
32222111   36         55433211   20         75543211   16
33222211   34         65332111   20         87433221   16
43211111   32         65432211   20         86433211   16
42111111   32         64331111   20         86543211   16
42221111   32         64332211   20         85432211   16
33311111   30         63321111   20         85542211   16
33322111   30         63322211   20         84332211   16
44221111   30         44333111   18         55443331   14
43222111   30         55333111   18         54411111   14
43321111   30         55443221   18         53332222   14
43322211   28         54433111   18         51111111   14
33322221   26         54433322   18         76433111   14
44322111   26         65332221   18         76522211   14
44332211   26         65422111   18         76544211   14
43332111   26         65433111   18         76554331   14
43332221   26         65433221   18         75443322   14
54222111   26         65443211   18         75522111   14
53221111   26         65443321   18         74333222   14
53322111   26         65543221   18         74431111   14
53331111   26         64421111   18         73331111   14
52222111   26         64433211   18         72222111   14
54321111   24         62221111   18         87533211   14
54322211   24         76332221   18         87543221   14
54332111   24         76432211   18         87654321   14
53222211   24         75332211   18         86533111   14
53311111   24         75432111   18         85532111   14
52211111   24         75433211   18         83332111   14
44311111   22         75442211   18         98543221   14
44333221   22         75533111   18         97543211   14
55322111   22         74322211   18         97644211   14
54332221   22         74422111   18         96542211   14
54333211   22         74432211   18         95532211   14
54422111   22         73322111   18         94432211   14
54432211   22         73332211   18

Table 3.2: Novel partitions of length 8, conjectured to be the complete list.

Conjecture 4. In order of decreasing r_λ, the novel partitions with r_λ ≥ 38/256 are precisely those given in Table 3.1.

Example 7. The shortest novel arithmetic progression. The partition λ = (8, 7, 6, 5, 4, 3, 2, 1) is novel. It has |λ^{⊥B}| = 14, so A^(λ) is an 8 by 7 matrix of rank 7. Examination of the 21 = 1 + 0 + 1 + 1 + 4 + 14 novel partitions of lengths 2, 3, 4, 5, 6, 7 in Propositions 4-6 shows that this is the shortest novel partition which is also an arithmetic progression.

Conjecture 5.
There are exactly 122 novel partitions of length k = 8. The list of 122 is given in Table 3.2.

Our evidence in favor of this conjecture is that these 122, and no others, were found by a random survey, using Mathematica, of 420 million singular n by n matrices M, for n = 8. Of course, this is not a proof. For an exhaustive search, to guarantee that all novel partitions of length 8 have been found, one might observe that, with respect to the integer partitions underlying potential right and left null vectors, M can be taken to have first row and first column all +1, so that it would suffice to examine 2^49 matrices M.

Remark 4. The Mathematica command NullSpace, applied to a singular n by n Bernoulli matrix M, returns a list of length n vectors that forms a basis for the null space of M. Aside from the sign requirement in the first nonzero entry, these vectors have always been of the form v ∈ V_λ^(n) for some novel partition λ. One would like to prove a result about this, but since the basis returned by a generic null space algorithm is not unique, and hence implementation dependent, we will not pursue this idea further.

3.4 Polynomial coefficients arising from inclusion-exclusion

For events {A_α}_{α∈I}, for a finite index set I, and A = ∪_{α∈I} A_α, the inclusion-exclusion formula states that

  P(A) = ∑_{α∈I} P(A_α) - ∑_{{α,β}⊂I, α≠β} P(A_α ∩ A_β) + ∑_{{α,β,γ}⊂I} P(A_α ∩ A_β ∩ A_γ) - ⋯ + (-1)^{|I|-1} P(∩_{α∈I} A_α).   (3.10)

With W = ∑_{α∈I} 1(A_α), a sum of indicators of the events, the formula above may be expressed as

  P(A) = EW - E\binom{W}{2} + E\binom{W}{3} - ⋯ + (-1)^{|I|-1} E\binom{W}{|I|}.

The Bonferroni inequalities state that for events {A_α}_{α∈I}, for a finite index set I, and A = ∪_{α∈I} A_α,

  P(A) ≤ ∑_{α∈I} P(A_α),
  P(A) ≥ ∑_{α∈I} P(A_α) - ∑_{{α,β}⊂I, α≠β} P(A_α ∩ A_β),   (3.11)
  ⋮

Equation (3.11) is a lower bound, with the dots representing the higher order bounds. A variation of (3.11), with B = ∪_{α∈I'} A_α,

  P(A \ B) ≥ ∑_{α∈I} P(A_α) - ∑_{{α,β}⊂I, α≠β} P(A_α ∩ A_β) - ∑_{α∈I, β∈I'} P(A_α ∩ A_β),   (3.12)

is proved similarly.

For each integer partition λ = (λ_1, λ_2, ..., λ_k), let D_λ denote the event that there exists some set of indices i_1, i_2, ..., i_k, all distinct, such that λ_1 e_{i_1} ± λ_2 e_{i_2} ± ⋯ ± λ_k e_{i_k} is a null vector. Similarly, let R_λ and L_λ denote the sub-events of D_λ in which the null vector is a right or left null vector, respectively. For example, D_11 is the event that for some i ≠ j, e_i + e_j or e_i - e_j is a left or right null vector.

We take I = \binom{[n]}{2} × {-, +} × {L, R}, so that α ∈ I specifies a set of two distinct indices along with sign and direction bits. The event A_α corresponds to the occurrence of a null vector of the form α. For example, α = ({2, 5}, -, R) ∈ I, and A_α is the event that e_2 - e_5 is a right null vector.

Proposition 7. For W = ∑_{α∈I} 1(A_α) with I and the A_α as above, so that D_11 = {W > 0},

  EW = 4\binom{n}{2} (1/2)^n,
  E\binom{W}{2} = (12\binom{n}{2}^2 - 4\binom{n}{2}) (1/4)^n,
  E\binom{W}{3} = 2^2 \binom{n}{3} (1/4)^n + O(n^6 2^{-3n}) = 4\binom{n}{3} (1/4)^n + O(n^6 2^{-3n}).

Proof. Let t = 2\binom{n}{2} 2^{-n}. Clearly,

  EW = ∑_{α∈I} P(A_α) = |I| 2^{-n} = 4\binom{n}{2} 2^{-n} = 2t.

Let I_R ⊂ I denote the subset of I with last coordinate R, with I_L defined similarly. Then

  E\binom{W}{2} = 2 ∑_{α,β∈I_R} E I_α I_β + ∑_{α∈I_R, β∈I_L} E I_α I_β =: G_1 + G_2.

The first sum covers two null vectors on the same side, the second sum the case when they are on opposite sides. Focusing just on the sums, we have

  G_1 = t^2 - 2 · 2\binom{n}{2} · 2^{-2n} = t^2 - t 2^{1-n},   (3.13)
  G_2 = (2\binom{n}{2} 2^{-n}) (2\binom{n}{2} 2^{-n+2}) (1/2) = 2t^2.   (3.14)

The first equation (3.13) is found by considering all pairs on one side, and then taking away all pairs that share two rows along with any plus or minus combination.
The second equation (3.14) is found by conditioning on one side first, say α ∈ I_R, and then the other side picks up a boost from the conditioning on the values of four elements.

Finally, we have

  6 E\binom{W}{3} = ∑_{(α,β,γ)} 1(α, β, γ distinct) E I_α I_β I_γ =: F_1 + F_2.

Here F_1 denotes the sum over all events where α, β, γ appear on the same side, and F_2 denotes the sum over events where two appear on one side, and one on the other. By choosing a side (left or right), we have, for some t_2, t_3, t_4 functions of n,

  F_1 / 2 = t^3 - t_2 - t_3 - t_4 + 2^2 · 3 \binom{n}{3} (1/2)^{2n}.

The t^3 considers all triplets, and the t_i considers the triplets that are supported on i rows, i = 2, 3, 4, that need to be excepted. We have

  t_2 = \binom{n}{2} 2^3 2^{-3n},

which chooses any two rows and all sign combinations. When three rows are supported, there are precisely 2^3 \binom{3}{1} 3 combinations, but events of the form {e_i ± e_j, e_i ± e_k, e_j ± e_k are null vectors} are sometimes valid. When they are valid, they have a probability of (1/2)^{2n}; hence, excepting all events involving three rows, we have

  t_3 = \binom{n}{3} 2^3 \binom{3}{1} 3 · 2^{-3n},

and the term at the end adds back in the valid combinations supporting three rows. These events are of the form {e_i ± e_j and e_i ± e_k are null vectors}, which imply that one of e_j + e_k or e_j - e_k is a null vector as well. Finally, when four rows are supported, the exceptional cases are those in which two of α, β, γ share two rows, and one does not share any; thus

  t_4 = \binom{3}{1} \binom{n}{2} \binom{n-2}{2} 2^3 2^{-3n}.

For F_2, there are two choices for which side the solo index appears, and then three choices for which of α, β, γ is this solo index. We have

  F_2 / 6 = 4 t^3 - \binom{n}{2} 2^2 \binom{n}{2} 2^2 2^{2-3n}.

The factor of four comes from E I_α I_β I_γ = 4 E I_α E I_β E I_γ, which is a boost from conditioning on an opposite side. The exceptional cases are those where the support of the non-solo pair lie on the same two rows, including all sign combinations. The solo index can be anything, and gets a conditional boost from being on the other side.

Proposition 8. Recall that R_λ (and L_λ, respectively) denotes the event that there is a right (left) null vector of template λ. We have

  P(R_11 \ L_11) ≥ 2\binom{n}{2} (1/2)^n - (12\binom{n}{2}^2 - 4\binom{n}{2}) (1/4)^n,
  P(R_11 \ L_11) = P(R_11) - 8\binom{n}{2}^2 (1/4)^n + O(n^6 2^{-3n}).

Proof. The expansion follows along the same reasoning as Proposition 7, using (3.12); in particular, with t = 2\binom{n}{2} 2^{-n} as before, we have P(R_11 \ L_11) ≥ t - G_1 - G_2. A similar analysis can be undertaken for D_1111; we omit the details.

Lemma 4.

  P(D_11) ≥ 4\binom{n}{2} (1/2)^n - (12\binom{n}{2}^2 - 4\binom{n}{2}) (1/4)^n,
  P(D_11) = 4\binom{n}{2} (1/2)^n - (12\binom{n}{2}^2 - 4\binom{n}{2} - 4\binom{n}{3}) (1/4)^n + O(n^6 2^{-3n}),
  P(D_1111) = 2^4 \binom{n}{4} (3/8)^n + O(n^5 (3/16)^n),
  P(D_{1^6}) = 2^6 \binom{n}{6} (5/16)^n + O(n^7 (5/32)^n),
  P(D_{1^8}) = 2^8 \binom{n}{8} (35/128)^n + O(n^9 (35/256)^n),
  P(D_21111) = 2^5 \binom{n}{5} \binom{5}{1} (1/4)^n + O(n^6 (1/8)^n),
  P(D_{1^{10}}) = 2^{10} \binom{n}{10} (63/256)^n + O(n^{11} (63/512)^n),
  P(D_{2 1^6}) = 2^7 \binom{n}{7} \binom{7}{1} (60/256)^n + O(n^8 (60/512)^n),
  P(D_{1^{12}}) = 2^{12} \binom{n}{12} (231/1024)^n + O(n^{13} (231/2048)^n).

Proof. The first two equations follow from Proposition 7. The rest are proved similarly, but we omit the details.

Next we move to the probabilities P(D_λ ∩ D_μ) for various choices of λ ≠ μ. Observe that the events involved can be highly positively correlated; for example, with λ = 11, μ = 1111, the ratio P(D_λ ∩ D_μ)/(P(D_λ) P(D_μ)) grows exponentially fast, as (4/3)^n.

Proposition 9. For two distinct novel partitions λ, μ, having j and k parts, respectively,

  P(D_λ ∩ D_μ) ≤ O( n^{k+j} (max(r_λ, r_μ)/2)^n ).

The implicit constant in the big O varies with the choice of λ, μ.

Proof. Consider the event R_λ ∩ R_μ = ∪ {vM = wM = 0}, where the union is over v ∈ V_λ^(n), w ∈ V_μ^(n).
The crucial ingredient is to show, with the notation of (3.5), that

  P(vX = wX = 0) ≤ max(r_λ, r_μ)/2.   (3.15)

Without loss of generality, assume that the nonzero components of v are indexed by J, so |J| = j, and the nonzero components of w are indexed by K, with |K| = k. With I = J ∪ K having size m = |I|, the event of interest is based on m independent fair coins ε_i, i ∈ I, and can be expressed as

  { ∑_{i∈J} v_i ε_i = 0 } ∩ { ∑_{i∈K} w_i ε_i = 0 }.   (3.16)

Case 1: J ≠ K. Without loss of generality, interchanging the λ and μ if needed, I = {1, 2, ..., m} and m ∈ K \ J. Condition on the values of the first m - 1 coins, with a configuration that satisfies ∑_{j∈J} v_j ε_j = 0. These configurations belong to the event vX = 0, and hence have probabilities summing to at most r_λ. Each configuration, together with the requirement wX = 0, dictates the value needed for ε_m, which occurs with conditional probability 1/2. The possible exchange of λ, μ at the start means that we have shown (3.15).

Case 2: J = K. Without loss of generality, rearranging the coordinates, and taking scalar multiples if needed, we can have J = K = {1, 2, ..., k} and a := v_k = w_k ≠ 0. The event in (3.16) simplifies to

  { -a ε_k = ∑_{i=1}^{k-1} v_i ε_i = ∑_{i=1}^{k-1} w_i ε_i }.

From this we conclude

  r_λ = P(vX = 0) = P( ∑_{i=1}^{k-1} v_i ε_i ∈ {-a, a} ) ≥ 2 P( ∑_{i=1}^{k-1} v_i ε_i = ∑_{i=1}^{k-1} w_i ε_i ∈ {-a, a} );

the inequality arises since the second sum, with weights w_i, might not even be in {-a, a}, and the factor of 2 arises since, when it is in the set, it dictates the choice of sign.

For the case where the potential null vectors are used on opposite sides, e.g., L_λ ∩ R_μ, we have

  P(L_λ ∩ R_μ) = O( P(L_λ) P(R_μ) ),

for the simple reason that conditioning on events of the form Mw = 0 with w ∈ V_μ^(n) only affects k of the columns, giving the bound above, with (1/r_μ)^k as the implicit constant for the big O.

Proposition 9 involves a bound that can have exponential decay as large as (1/4)^n. For the sake of proving Theorem 1, with error term involving (7/32)^n, we need a stronger bound, as given below.

Lemma 5. For all novel partitions λ, μ, having j and k parts, respectively, with λ ≠ μ, and neither partition equal to the partition 11, we have

  P(D_11 ∩ D_1111) = 2^3 · 3 \binom{n}{4} (1/4)^n + O(n^5 (3/16)^n),   (3.17)
  P(D_11 ∩ D_λ) = O(n^{j+2} (3/16)^n),   (3.18)
  P(D_λ ∩ D_μ) = O(n^{k+j} (3/16)^n).   (3.19)

Proof. Equation (3.17) can be computed directly using inclusion-exclusion, whereas Equations (3.18) and (3.19) use Proposition 9 with μ = 1111, since it is the most likely partition after 11.

Finally, we note a trivial lemma to simplify the coefficient for 4^{-n} in the expansion of Theorem 1.

Lemma 6.

  (1/2) \binom{n}{2} ( \binom{n}{2} - 1 ) = 3\binom{n}{4} + 3\binom{n}{3}.   (3.20)

Proof. Either simplify algebraically, or note that the left hand side is the number of ways to choose two distinct unordered pairs of n objects. On the right hand side, the first term counts the selections in which all four indices are distinct, where the two pairs can be placed in 3 distinct configurations; the second term counts the pairs that share a common index, of which there are 3 choices for the repeated index.
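Before moving on, the leading coefficient can be sanity-checked numerically: for moderate n, the empirical singularity probability of a random ±1 matrix should be of the order of EW = 4\binom{n}{2}(1/2)^n from Proposition 7, with the negative (1/4)^n corrections of Theorem 1 pulling the true value below that. A Monte Carlo sketch, our illustration:

```python
# Monte Carlo sanity check of the leading term of the expansion.
# Illustration only; exact singularity is tested via a rational determinant.
import random
from fractions import Fraction
from math import comb

def det_exact(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    det = Fraction(1)
    for j in range(n):
        piv = next((i for i in range(j, n) if A[i][j] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != j:
            A[j], A[piv] = A[piv], A[j]
            det = -det
        det *= A[j][j]
        for i in range(j + 1, n):
            f = A[i][j] / A[j][j]
            A[i] = [a - f * b for a, b in zip(A[i], A[j])]
    return det

def estimate_Pn(n, trials, seed=0):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if det_exact([[rng.choice((-1, 1)) for _ in range(n)]
                             for _ in range(n)]) == 0)
    return hits / trials

n, trials = 10, 2000
print(estimate_Pn(n, trials))          # empirical P_n, roughly 0.13 for n = 10
print(4 * comb(n, 2) * 0.5 ** n)       # leading term, about 0.176 (an overestimate)
```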
3.5 Interaction of left and right null vectors

Proposition 8 gives a lower bound on $P(R_{11}\setminus L_{11}) = P(L_{11}\setminus R_{11})$ which has, as a corollary,
\[ P(S\setminus L_{11}) \geq P(R_{11}\setminus L_{11}) \sim P(R_{11}) = P(L_{11}), \]
and omitting the middle terms, and writing $a_n \gtrsim c_n$ to mean that there exists $b_n$ with $a_n \geq b_n$ and $b_n \sim c_n$, we have
\[ P(S\setminus L_{11}) \gtrsim P(L_{11}). \tag{3.21} \]
Expressing (3.21) in terms of left null vectors, with the outer union on the left taken over all novel partitions of length less than or equal to $n$, other than 11,
\[ P\left( \bigcup_{\lambda \neq 11}\ \bigcup_{v\in V_\lambda^{(n)}} \{vM = 0\} \right) \gtrsim P\left( \bigcup_{v\in V_{11}^{(n)}} \{vM = 0\} \right). \]
Writing this with $L_\lambda$ for the event that $M$ has a left null vector of template $\lambda$, the above display can be rewritten as
\[ \sum_{\lambda\neq 11} P(L_\lambda) \gtrsim P(L_{11}). \]

We believe that to prove sharp upper bounds on $P_n$, say as given by (3.1) or (3.3), it will be necessary to consider the effect of conditioning on $D_{11}^c$. Propositions 10 and 11 might be a first step in this direction.

Proposition 10. Suppose that $\lambda$ is a novel partition of length $k$, with $k = n$. Let $2p = |v^{\perp B}|$. Recall that $R_{11}$ is the event that our $n$ by $n$ matrix $M$ has a right null vector of the form $e_i \pm e_j$. For every $v \in V_\lambda$,
\[ \frac{P(vM = 0 \mid R_{11}^c)}{P(vM = 0)} = \frac{(p)_n}{p^n}. \]

Proof. The hypothesis $k = n$ is essential: if $x$ denotes a column of $M$, then, thanks to $k = n$, we know that $x \in v^{\perp B}$. There are $p$ choices for the "direction" $\{x, -x\}$ with $x \in v^{\perp B}$, and different columns of $M$ must choose different directions, as otherwise the event $R_{11}$ would occur. By taking the ratio of the conditional probability to the unconditional probability, the factors of 2, for choosing between $x$ and $-x$ for each column, cancel.

Proposition 11. Suppose that $\lambda$ is a novel partition of length $k$, with $k = n-1$. Let $2p = |w^{\perp B}|$. Recall that $R_{1111}$ is the event that our $n$ by $n$ matrix $M$ has a right null vector of the form $e_{j_1} \pm e_{j_2} \pm e_{j_3} \pm e_{j_4}$. For every $v \in V_\lambda^{(n)}$, as specified by Definition 2,
\[ \frac{P\big(vM = 0 \mid (R_{11}\cup R_{1111})^c\big)}{P(vM = 0)} = \frac{(p)_n}{p^n} + \frac{1}{2p}\binom{n}{2}\frac{(p)_{n-1}}{p^{n-1}}. \]

Proof. The hypothesis $k = n-1$ is essential. Without loss of generality, assume that $v_n = 0$, so $v = (w, 0)$ with $w \in V$. Let $x = (y, s)$ denote a column of $M$, where $y$ gives the first $n-1$ coordinates, and $s \in \{-1, +1\}$. Then, thanks to $k = n-1$ and $v_n = 0$, we know that $y \in w^{\perp B}$. There are $p$ choices for the "direction" $\{y, -y\}$, restricting to the first $n-1$ coordinates, with $y \in w^{\perp B}$, and different columns of $M$ must choose different directions, apart from possibly one pair of columns, where the columns in a pair may share the underlying $n-1$ direction, but have opposite choices of $s$ for their $n$th coordinate. (If three columns share the underlying $n-1$ direction, the event $R_{11}$ would occur; if two pairs of columns share, then the event $R_{1111}$ would occur.)

Chapter 4
Probabilistic Divide-And-Conquer

I, at any rate, am convinced that He does not throw dice.
-- Albert Einstein

God not only plays dice, He also sometimes throws the dice where they cannot be seen.
-- Stephen Hawking

4.1 Introduction.

4.1.1 Exact simulation

Exact simulation methods provide samples from a set of objects according to some given probability distribution. For many combinatorial problems, the given distribution is the uniform choice over all possibilities of a given size.

An important technique from von Neumann 1951, [50], which we review in Section 4.2.2, is acceptance/rejection sampling, where one samples from an easy-to-simulate distribution related to the desired distribution, and then rejects some samples.
The precise recipe is to accept samples with probability proportional to the ratio of the desired probability to the probability under the easy-to-simulate distribution. The result is that, upon acceptance, the sample is an exact sample from the desired distribution.

Another important technique is Markov chain coupling from the past (CFTP), where one keeps track of coalescing Markov chains starting from some time in the past, and constructs a long enough past that all chains have coalesced by time zero [44]. This is now a very lively subject, with over 160 papers described at http://dimacs.rutgers.edu/dbwilson/exact.html/

Divide-and-conquer is a basic strategy for algorithms. Notable examples include the Cooley-Tukey fast Fourier transform, attributable to Gauss [23], and Karatsuba's fast multiplication algorithm [25], which surprised Kolmogorov [26]. We note that these and other cases treated in textbooks on algorithms are deterministic. Randomized quicksort, see for example [12] Section 7.3, is the prototype of divide-and-conquer using randomness, but such algorithms can be thought of as a variation on the deterministic algorithm, applied to a permutation of the input data.

In this chapter, we propose a new method for exact sampling: probabilistic divide-and-conquer; here, the subdivision of the original problem is inherently random. We prove, in a couple of general settings, that this method does achieve exact samples. Then we illustrate the use of this general method, with the target being to sample, for a given $n$, uniformly from the $p_n$ partitions of the integer $n$. The starting point is Fristedt's construction, [17], with a random integer $T$ of size around $n$, such that, for a random partition of $T$, the counts of parts of various sizes are mutually independent; we review this in Section 4.4.2. The problem then is how to simulate efficiently, since the event $T = n$ is a rare event, as in [11]. In order of ontogeny, these methods express a partition as $\lambda = (A, B)$, where

1. $A$ is the (list of) large parts, say $\sqrt{n}$ up to $n$, and $B$ is the small parts, size 1 up to $\sqrt{n}$. Mix-and-match may offer an additional speedup.

2. $B$ is the number of parts of size 1, and $A$ is everything else. Hence the $B$ side of the simulation is trivial, with no calls to a random number generator. Nevertheless, there is a large speedup, by a factor asymptotic to $\sqrt{n}/c$, where $c = \pi/\sqrt{6}$.

3. In $p(z) = d(z)\,p(z^2)$, with the classical $p(z) = \sum_{n\geq 0} p_n z^n = \prod_{i\geq 1}(1-z^i)^{-1}$, and $d(z) = \prod_{i\geq 1}(1+z^i)$ to enumerate partitions with distinct parts, $A$ corresponds to $d(z)$. This method iterates beautifully, reducing the target, $n$, by a factor of approximately 4 per iteration, with an acceptance/rejection cost of only roughly $2\sqrt{2}$, improved to $\sqrt{2}$ in Section 4.4.5.1. We have run this on a personal computer, with $n$ as large as $2^{49}$, and relative to the basic algorithm "waiting-to-get-lucky", analyzed in Section 4.4.2, this version of divide-and-conquer achieves roughly a billion-fold speedup. (Footnote 4: The RandomPartition function in Mathematica [35] appears to hit the wall at around $n = 2^{20}$.)

4.1.2 Probabilistic divide-and-conquer, motivated by mix-and-match

For us, the random objects $S$ whose simulation might benefit from a divide-and-conquer approach are those that can be described as $S = (A, B)$, where there is some ability to simulate $A$ and $B$ separately. Specifically, we require that $A \in \mathcal{A}$, $B \in \mathcal{B}$, and that there is a function $h: \mathcal{A}\times\mathcal{B} \to \{0,1\}$, so that for $a \in \mathcal{A}$, $b \in \mathcal{B}$, $h(a,b)$ is the indicator that "$a$ and $b$ fit together to form an acceptable $s$."
Furthermore, we require that $A$ and $B$ be independent, and that the desired $S$ be equal in distribution to $((A,B) \mid h(A,B) = 1)$. This description, of independent objects conditional on some event, may seem restrictive, but [7, 15] shows that very broad classes (combinatorial assemblies, multisets, and selections) fit into this setup.

Now imagine that one wants an honest sample of size $m$, that is, $S_1, S_2, \dots, S_m$, to be independent, with the original $S$ along with $S_1, S_2, \dots$ identically distributed. The pedestrian approach is to propose sample values $A_1, A_2, \dots$, and $B_1, B_2, \dots$, and to consider the indicators of aligned matching, that is, $h(A_1,B_1), h(A_2,B_2), \dots$. One naturally has waiting times $T_1, T_2, \dots$, with $T_1 := \min\{i \geq 1 : h(A_i,B_i) = 1\}$, and for $k > 1$, $T_k := \min\{i > T_{k-1} : h(A_i,B_i) = 1\}$. And of course, the $\ell$th sample found is
\[ S_\ell := (A_{T_\ell}, B_{T_\ell}). \]

In the case where $k = T_m$ is large, the scheme just described seems wasteful: we proposed $k$ values of $A \in \mathcal{A}$, and $k$ values of $B \in \mathcal{B}$, and hence might have $k^2$ opportunities for a match. That is, rather than just look for the aligned potential matches, scored by the $h(A_i,B_i)$ for $1 \leq i \leq k$, we might have considered the $h(A_i,B_j)$ for $1 \leq i, j \leq k$, with $k^2$ index pairs $(i,j)$. Indeed, this situation arises naturally in biological sequence matching [8, 6], where for two independent sequences of iid letters, in many but not all cases for the two marginal distributions, unrestricted rather than aligned matching effectively squares the number of ways to look for a match, and hence approximately doubles the length of the longest match found.

Of course, the difficulty with the general program "search all $k^2$ index pairs $ij$" is that conflicting matches might be found. Suppose, for example, that exactly two matches are found, with $A_i$ matching both $B_j$ and $B_{j'}$, with $j \neq j'$. It is easy to see that taking both $AB$ pairs ruins the iid nature of a sample. Also, though perhaps not as obvious, other strategies, such as suppressing both $AB$ pairs, or taking only the pair indexed by the lexicographically first of $ij, ij'$, or taking only one pair, based on an additional coin toss, introduce bias relative to the desired distribution for $S$.

There is a way to allow mix-and-match, and still get an honest sample (Footnote 5: under a condition on the structure of $h$, described by Lemma 8); details will be given in Section 4.3. The first step is to use rejection sampling to produce a list of $A$s, distributed as a sample from the distribution of $A$s biased by the chance that they would match, if a single $B$ were proposed. For us, it only became apparent later that this use of rejection, even in the absence of mix-and-match, can be useful; Theorem 11 describes a surprising example.

4.2 The basic lemma for exact sampling with divide-and-conquer

We assume throughout that
\[ A \in \mathcal{A},\ B \in \mathcal{B} \text{ have given distributions;} \tag{4.1} \]
\[ A, B \text{ are independent;} \tag{4.2} \]
\[ h : \mathcal{A}\times\mathcal{B} \to \{0,1\} \tag{4.3} \]
satisfies $p := E\,h(A,B) \in (0,1]$, where, of course, we also assume that $h$ is measurable, and
\[ S \in \mathcal{A}\times\mathcal{B} \text{ has distribution } \mathcal{L}(S) = \mathcal{L}\big((A,B) \mid h(A,B) = 1\big), \tag{4.4} \]
i.e., the law of $S$ is the law of the independent pair $(A,B)$ conditional on having $h(A,B) = 1$. Note that a restatement of (4.4), exploiting the hypothesis (4.3), is that for measurable sets $R \subset \mathcal{A}\times\mathcal{B}$,
\[ P(S \in R) = \frac{P\big((A,B) \in R \text{ and } h(A,B) = 1\big)}{p}, \]

Footnote 6: The requirement that $p > 0$ is not needed for divide-and-conquer to be useful, but rather is a choice we make for the sake of simpler exposition. In cases where $p = 0$, the conditional distribution, apparently specified by (4.4), needs further specification; this is known as Borel's paradox.
or equivalently, for bounded measurable functions $g$ from $\mathcal{A}\times\mathcal{B}$ to the real numbers,
\[ E\,g(S) = E\big(g(A,B)\,h(A,B)\big)/p. \]

Since we have assumed $p > 0$, this is elementary conditioning. This allows the distributions of $A$ and $B$ to be arbitrary: discrete, absolutely continuous, or otherwise. The following lemma is a straightforward application of Bayes' formula.

Lemma 7. Suppose $X$ is the random element of $\mathcal{A}$ with distribution
\[ \mathcal{L}(X) = \mathcal{L}\big(A \mid h(A,B) = 1\big), \tag{4.5} \]
and $Y$ is the random element of $\mathcal{B}$ with conditional distribution
\[ \mathcal{L}(Y \mid X = a) = \mathcal{L}\big(B \mid h(a,B) = 1\big). \tag{4.6} \]
Then $(X,Y) =_d S$, i.e. the pair $(X,Y)$ has the same distribution as $S$, given by (4.4).

Proof. A restatement of (4.5) is
\[ P(X \in da) = \frac{P(A \in da)\, E\,h(a,B)}{p}, \tag{4.7} \]
and relation (4.6) is equivalent to
\[ \mathcal{L}\big((X,Y) \mid X = a\big) = \mathcal{L}\big((a,B) \mid h(a,B) = 1\big). \tag{4.8} \]
Hence, for any bounded measurable $g : \mathcal{A}\times\mathcal{B} \to \mathbb{R}$,
\[ E\,g(X,Y) = E\big(E(g(X,Y) \mid X)\big) = \int_{\mathcal{A}} P(X \in da)\, E\big(g(X,Y) \mid X = a\big) \]
\[ = \int_{\mathcal{A}} \frac{P(A \in da)\, E\,h(a,B)}{p}\cdot \frac{E\big(g(a,B)h(a,B)\big)}{E\,h(a,B)} = \frac{1}{p}\int_{\mathcal{A}} P(A \in da)\, E\big(g(a,B)h(a,B)\big) \]
\[ = E\big(g(A,B) \mid h(A,B) = 1\big) = E\,g(S). \]
We used (4.8) for the middle line in the display above; on the set $\mathcal{A}_0 := \{a \in \mathcal{A} : E\,h(a,B) = 0\}$, which contributes 0 to the integral, we took the usual liberty of dividing by 0, rather than writing out separate expressions for the integrals over $\mathcal{A}_0$ and $\mathcal{A}\setminus\mathcal{A}_0$.

4.2.1 Algorithmic implications of the basic lemma

Assume that one wants a sample of fixed size $m$ from the distribution of $S$. That is, one wants to carry out a simulation that provides $S_1, S_2, \dots$, with $S_1, S_2, \dots, S_m$ being mutually independent, with each equal in distribution to $S$. According to Lemma 7, this can be done by providing $m$ independent pairs $(X_i, Y_i)$, $i = 1$ to $m$, each equal in distribution to $S$. A reasonable choice of how to carry this out, not yet using mix-and-match, involves the following:

Outline of an algorithm to gather a sample of size $m$.

Stage 1. Sample $X_1, X_2, \dots, X_m$ from the distribution of $X$, i.e., from $\mathcal{L}(X) = \mathcal{L}(A \mid h(A,B) = 1)$.

Stage 2. Conditional on the result of stage 1 producing $(X_1, \dots, X_m) = (a_1, \dots, a_m)$, find $Y_1, Y_2, \dots, Y_m$, mutually independent, with distributions $\mathcal{L}(Y_i) = \mathcal{L}(B \mid h(a_i,B) = 1)$.

Note that in general, conditional on the result of stage 1, the $Y_i$ in stage 2 are not identically distributed. Furthermore, the trials undertaken to find these $Y_i$ need not be independent of each other.

4.2.2 Use of acceptance/rejection sampling

Assume that we know how to simulate $A$; this is under the distribution in (4.1), where $A, B$ are independent. But we need instead to sample from an alternate distribution, denoted above as that of $X \in \mathcal{A}$. The rejection method recipe, for using (4.7), may be viewed as having 4 steps, as follows.

1. Find a threshold function $t : \mathcal{A} \to [0,1]$, with $t(a)$ proportional to $E\,h(a,B)/p$, i.e., $t$ of the form $t(a) = C\,E\,h(a,B)/p$ for some positive constant $C$.

2. Propose iid samples $A_1, A_2, \dots$

3. Independently generate uniform (0,1) random variables $U_1, U_2, \dots$

4. If $U_i \leq t(A_i)$, then accept $A_i$ as the $X$ value; otherwise reject it.
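In Matlab, the four steps combine into the following minimal sketch (here proposeA and tfun are hypothetical function handles: proposeA draws from the unconditioned law of $A$, and tfun(a) evaluates the threshold $C\,E\,h(a,B)/p$ of step 1):

function a = sampleX(proposeA, tfun)
% Draw one X with law L(A | h(A,B)=1) by von Neumann rejection.
while true
    a = proposeA();         % step 2: propose A_i from the law of A
    if rand() <= tfun(a)    % steps 3-4: accept with probability t(a)
        return              % on acceptance, a is distributed as X
    end
end
end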
The cost of using acceptance/rejection. Naturally, one wants to take the constant $C$ for the threshold function $t$ in step (1) to be as large as possible. This is subject to the constraint $t(a) \leq 1$ for all $a$, i.e., $C\,E\,h(a,B)/p \leq 1$. The expected fraction of proposed samples $A_i$ to be accepted will be the average of $t(a)$ with respect to the distribution of $A$, i.e.,
\[ p_{\mathrm{acc}} := P(U \leq t(A)) = E\,t(A) = E\left(C\,\frac{h(A,B)}{p}\right) = C, \]
and the expected number of proposals needed to get each acceptance is the reciprocal of this, so we define
\[ \text{Acceptance cost} := \frac{1}{p_{\mathrm{acc}}} = \frac{1}{C}. \tag{4.9} \]
Assuming that we can find an $a^*$ where $E\,h(a,B)$ achieves its maximum value, this simplifies to
\[ \text{Acceptance cost} = \frac{1}{C} = \frac{E\,h(a^*,B)}{p}. \tag{4.10} \]
For comparison, if we were not using divide-and-conquer, but instead proposing pairs $(A_i,B_i)$ and hoping to get lucky, i.e., hoping that $h(A_i,B_i) = 1$, with success probability $p$ and expected number of proposals to get one success equal to $1/p$, then, ignoring the cost of proposing the $B_i$, the speedup involved in (4.10) is a factor of $1/E\,h(a^*,B)$.

Is the threshold function $t$ computable? In step (4), for each proposed value $a = A_i$, we need to be able to compute $t(a)$; this can be either a minor cost, a major cost, or an absolute impediment, making probabilistic divide-and-conquer infeasible. All of these variations occur, in the context of integer partitions, and will be discussed in Sections 4.4.3-4.4.5, and again in Section 4.5.2.

4.2.3 Caution: mix-and-match not yet enabled

In Stage 2 of the 2-stage algorithm described at the beginning of Section 4.2.1, there is a subtle issue involved in the phrase "independently find $Y_1, Y_2, \dots, Y_m$". Suppose, for example, that one plans to propose iid copies of $B$, say $B_1, B_2, \dots$, waiting for samples that will match one of the $a_1, a_2, \dots, a_m$ obtained in the first stage. A correct way to carry out stage 2 is to consider stopping times $1 \leq \sigma_1 < \sigma_2 < \cdots < \sigma_m$, with
\[ \sigma_1 := \min\{j \geq 1 : h(a_1, B_j) = 1\}, \tag{4.11} \]
and then recursively, for $\ell = 2$ to $m$, $\sigma_\ell := \min\{j > \sigma_{\ell-1} : h(a_\ell, B_j) = 1\}$. Here, stage 2 is completed after proposing $\sigma_m$ copies of $B$, and the $m$ samples of $S$ are $(a_\ell, B_{\sigma_\ell})$ for $\ell = 1$ to $m$.

In general, it would be incorrect to try to speed up stage 2 by taking stopping times $1 \leq \tau_1 < \tau_2 < \cdots < \tau_m$, with
\[ \tau_1 := \min\{j \geq 1 : h(a_i, B_j) = 1 \text{ for some } i \in \{1, 2, \dots, m\}\}, \tag{4.12} \]
and then $\tau_2 := \min\{j > \tau_1 : h(a_i, B_j) = 1 \text{ for one of the } m-1 \text{ values } i \text{ not already matched}\}$, and so on. [In case the $B$ at time $\tau_1$ matches $a_i$ for more than one index $i$, we might choose, for example, to declare the smallest available $i$ to serve as the index matched, for the sake of specifying which $m-1$ indices are available in the definition of $\tau_2$. Other choices as to which $i$, even those involving auxiliary randomization, will have the same effect.] Say that $I(\ell)$ is the index matched at time $\tau_\ell$. With the random permutation $\pi$ defined to be the inverse of the permutation $I(\cdot)$, the sample consists of $(a_\ell, B_{\tau_{\pi(\ell)}})$ for $\ell = 1$ to $m$. Not only may there be dependence in the sequence $B_{\tau_{\pi(1)}}, B_{\tau_{\pi(2)}}, \dots, B_{\tau_{\pi(m)}}$, it may also be the case (Footnote 7: We thank Sheldon Ross for first observing this.) that $B_{\tau_{\pi(\ell)}}$ fails to have the desired marginal distribution, i.e. that of $B$ conditional on $h(a_\ell, B) = 1$. (Perhaps this $B$ was matched to $a_\ell$ when a higher priority index $i$ was available; the fact of not satisfying $h(a_i, B) = 1$ biases the distribution.)

4.3 Simple matching enables mix-and-match

Lemma 7 was basic, and so is the following lemma, but the pair serves nicely to clarify the logical structure of what is needed to enable probabilistic divide-and-conquer, versus what is needed to further enable mix-and-match.

Lemma 8.
Given $h : \mathcal{A}\times\mathcal{B} \to \{0,1\}$, the following two conditions are equivalent:

Condition 1: $\forall a, a' \in \mathcal{A}$, $b, b' \in \mathcal{B}$,
\[ 1 = h(a,b) = h(a',b) = h(a,b') \text{ implies } h(a',b') = 1; \tag{4.13} \]

Condition 2: $\exists\,\mathcal{C}$, and functions $c_A : \mathcal{A} \to \mathcal{C}$, $c_B : \mathcal{B} \to \mathcal{C}$, so that $\forall (a,b) \in \mathcal{A}\times\mathcal{B}$,
\[ h(a,b) = 1\big(c_A(a) = c_B(b)\big). \tag{4.14} \]

We think of $\mathcal{C}$ as a set of colors, so that condition (4.14) says that $a$ and $b$ match if and only if they have the same color.

Proof. That (4.14) implies (4.13) is trivial. In the other direction, it is easy to check that (4.13) implies that the relation $\sim_A$ on $\mathcal{A}$, given by $a \sim_A a'$ iff $\exists b \in \mathcal{B}$, $1 = h(a,b) = h(a',b)$, is an equivalence relation. Likewise for the relation $\sim_B$ on $\mathcal{B}$, given by $b \sim_B b'$ iff $\exists a \in \mathcal{A}$, $1 = h(a,b) = h(a,b')$. For the set of colors, $\mathcal{C}$, we might take either the set of equivalence classes of $\mathcal{A}$ modulo $\sim_A$, or the set of equivalence classes of $\mathcal{B}$ modulo $\sim_B$, and then (4.13) also provides a bijection between these two sets of equivalence classes, to induce (4.14).

Remark 1. The proof of Lemma 8, with the equivalence classes of $\mathcal{A}/\!\sim_A$ and $\mathcal{B}/\!\sim_B$, shows that the pair of coloring functions satisfying (4.14) is essentially unique. Specifically, unique apart from relabeling and padding, i.e., an arbitrary permutation on the names of the colors used, and enlarging the range, $\mathcal{C}$, to an arbitrary superset of the image.

Remark 2. The statement of Lemma 8 shows that coloring is not essentially an issue of sufficient statistics. After all, hypothesis (4.13) only concerns the logical structure of the matching function $h$ appearing in (4.3), and doesn't involve the distributions on $\mathcal{A}$ and $\mathcal{B}$ appearing in (4.1).

Remark 3. When (4.14) holds, we can write the event that $A$ matches $B$ as a union indexed by the color involved:
\[ \{h(A,B) = 1\} = \bigcup_{k\in\mathcal{C}} \{c_A(A) = k,\ c_B(B) = k\}, \]
so that $p = \sum_{k\in\mathcal{C}} P(c_A(A) = k,\ c_B(B) = k)$, and we see that at most a countable set of colors $k$ contribute a strictly positive amount to $p$. As a notational convenience, we take $\mathbb{N} \subset \mathcal{C}$, and use positive integers $k$ for the names of colors that have
\[ P(c_A(A) = k,\ c_B(B) = k) > 0. \tag{4.15} \]
[The fact that $A, B$ are independent, hence $P(c_A(A) = k, c_B(B) = k) = P(c_A(A) = k)\,P(c_B(B) = k)$, is irrelevant to the main idea behind (4.15). However, a technique we refer to as "roaming $x$" in Section 4.4.3.2 is an example of how to take advantage of the independence of $A$ and $B$.]

The intent of the following lemma is to show that if $h$ satisfies (4.14), then mix-and-match strategies can be used in stage 2 of the broad outline of Section 4.2.1.

Lemma 9. Assume that $h$ satisfies (4.14). Consider a procedure which proposes a sequence $D_1, D_2, \dots$ of elements of $\mathcal{B}$ with the following properties:

There is a sequence of $\sigma$-algebras $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$. [We think of $\mathcal{F}_0$ as carrying the information from stage 1 of an algorithm along the lines described in Section 4.2.1, carrying information such as which demands $a_1, a_2, \dots$ must be met, or reduced information, such as the colors $c_A(a_1), \dots, c_A(a_m)$.]

For every $n \geq 1$ and $k$ satisfying (4.15), conditional on $\mathcal{F}_{n-1}$ together with $c_B(D_n) = k$, the distribution of $D_n$ is equal to $\mathcal{L}(B \mid c_B(B) = k)$.

For every $k$ satisfying (4.15), with probability 1,
\[ \text{infinitely many } n \text{ have } c_B(D_n) = k. \tag{4.16} \]

Define stopping times $\tau^{(k)}_i$, the "time $n$ of the $i$-th instance of $c_B(D_n) = k$", by $\tau^{(k)}_0 = 0$ and, for $i \geq 1$, $\tau^{(k)}_i = \inf\{n > \tau^{(k)}_{i-1} : c_B(D_n) = k\}$. We write $D(n) \equiv D_n$, to avoid multilevel subscripting, and define $B^{(k)}_i := D(\tau^{(k)}_i)$ for $i = 1, 2, \dots$.
Then, for each $k$, the $B^{(k)}_1, B^{(k)}_2, \dots$ are independent, with the distribution $\mathcal{L}(B \mid c_B(B) = k)$, and as $k$ varies, these sequences are mutually independent.

Proof. The proof is a routine exercise; it suffices to check the independence claim for an arbitrary finite number of choices of $k$, restricting to the $B^{(k)}_i$ for $1 \leq i \leq i_0$ with an arbitrary finite $i_0$, and this can be done, along with checking for the specified marginal distributions, by summing over all possible values for the random times $\tau^{(k)}_i$. Writing out the full argument would be notationally messy, and not interesting.

4.4 Algorithms for simulating random partitions of n

In this section we focus on the generation of partitions under the uniform distribution. Other distributions on partitions, such as the Plancherel measure, while important in many applications, for example [14, 18], are not addressed in the current paper, but it is highly plausible that these techniques can be adapted to those measures. We note that the phrase "generating partitions" is usually taken, as in [28], to mean the systematic listing of all partitions, perhaps subject to additional constraints; this is very different from simulating a random instance.

The computational analysis that follows uses an informal adaptation of uniform costing; see Section 4.5.2. Some elements of the analysis, specifically asymptotics for the acceptance rate, will be given rigorously, for example in Theorems 10, 11, and 12.

4.4.1 For baseline comparison: table methods

A natural simulation method is to find the largest part first, then the second largest part, and so on, according to the distribution of the largest part. (Footnote 8: For the largest part, we are dealing with unrestricted partitions of $n$, but for subsequent parts, the problem involves the largest part of a partition of an integer $m$ into parts of size at most $k$.) The main cost associated with this method is the storage of all the distributions needed. Some details follow.

Let $p(k,n)$ denote the number of partitions of $n$ with each part of size $\leq k$, so that $p(n,n) = p_n$. These can be quickly calculated from the recurrence
\[ p(k,n) = p(k-1,n) + p(k,n-k), \tag{4.17} \]
where the right hand side counts the number of partitions without any $k$'s plus the number of partitions with at least one $k$.

Let $X_i$ denote the $i$-th largest part of a randomly generated partition, so $\lambda = (X_1, X_2, \dots)$. We have
\[ P(X_1 \leq j) = p(j,n)/p_n, \qquad P(X_2 \leq s \mid X_1 = j) = p(s, n-j)/p(j, n-j), \]
and in general, for $i \geq 2$,
\[ P(X_i \leq j_i \mid X_1 = j_1, \dots, X_{i-1} = j_{i-1}) = p\Big(j_i,\ n - \sum_{\ell < i} j_\ell\Big)\Big/\, p\Big(j_{i-1},\ n - \sum_{\ell < i} j_\ell\Big). \]

Rather than computing each quantity on the right hand side as it appears, an $n$ by $n$ table, where the $j$-th column consists of the numbers $p(1,j), p(2,j), \dots, p(n,j)$, can be computed and stored; hence generating partitions is extremely fast once this table has been created. For one uniformly distributed random number, we can look up the value of the largest part of a random partition; this implies order of $\sqrt{n}\log n$ lookups. An easy variation also finds the multiplicity of the largest part, implying order of $\sqrt{n}$ lookups. The memory requirements are on the order of $n^2$ for the entire table, a severe constraint, but once the table has been constructed, the generation of random partitions is rapid.

Footnote 9: Since the largest part of a random partition is extremely unlikely to exceed $O(\sqrt{n}\log n)$, one can get away with using a table of size $O(n^{3/2}\log n)$, which will only rarely need to be augmented. Specifically, writing $\lambda_1$ for the largest part of a partition, with $c = \pi/\sqrt{6}$, for fixed $A > 0$, with $i_0 = i_0(n,A) := \sqrt{n}\,\log(A\sqrt{n}/c)/c$, we have $P_n(\lambda_1 > i_0) \to 1 - \exp(-1/A)$. For example, take $A = 1000$, so that the $i_0$ by $n$ table will need to be augmented with probability approximately .001. With $M$ words of memory available for the table, we solve $i_0(n,A)\,n = M$; for example, with $M = 2^{28}$ and $A = 1000$ we have $n = 15000 \doteq 2^{17}$ and $i_0 = 1700$, and increasing $M$ to $2^{37}$ gets us up to $n = 9\cdot 10^6 \doteq 2^{23}$, $i_0 = 15000$. Instead of actually augmenting the table, one could treat the roughly one out of every $A$ computationally difficult cases as deliberately missing data, as in Section 4.4.3.3.
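A minimal Matlab sketch of the table method follows (the value n = 50 is illustrative; for large n the entries p(k,m) outgrow the exact integer range of doubles, one aspect of the storage constraint just described):

% Build the table P(k+1,m+1) = p(k,m) via the recurrence (4.17).
n = 50;
P = zeros(n+1, n+1);
P(:, 1) = 1;                           % p(k,0) = 1 for all k
for k = 1:n
    for m = 1:n
        P(k+1, m+1) = P(k, m+1);       % partitions with no part of size k
        if m >= k
            P(k+1, m+1) = P(k+1, m+1) + P(k+1, m-k+1);  % at least one part k
        end
    end
end
% Draw one uniform partition by successive largest parts.
m = n; cap = n; parts = [];
while m > 0
    u = rand * P(cap+1, m+1);          % one uniform per part found
    j = 1;
    while P(j+1, m+1) < u              % smallest j with p(j,m) >= u
        j = j + 1;
    end
    parts(end+1) = j;                  % largest remaining part is j
    m = m - j; cap = j;                % recurse on m-j with parts <= j
end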
But we believe that this algorithm is less useful than thep(k;n) table method | sampling from the distribution on (d;j) implicit in np n = P d;j1 dp ndj requires computation of partial sums; if all the partial sums for P d;j1;djm dp mdj with m n are stored, the total storage requirement is of 9 Since the largest part of a random partition is extremely unlikely to exceed O( p n logn), one can get away using a table of size O(n 3=2 logn), which will only rarely need to be augmented. Specically, writing 0 1 for the largest part of a partition, with c = = p 6, for xed A > 0, with i0 = i0(n;A) := p n log(A p n=c)=c,Pn( 0 1 i0)! 1 exp(1=A). For example, take A = 1000, so that the i0 by n table will need to be augmented with probability approximately :001. With M words of memory available for the table, we solvei0(n;A)n =M; for example, withM = 2 28 andA = 1000 we haven = 15000 : = 2 17 and i0 = 1700, and increasing M to 2 37 gets us up to n = 9 10 6 : = 2 23 , i0 = 15000. Instead of actually augmenting the table, one could treat the roughly one out of every A computationally dicult cases as deliberately missing data, as in Section 4.4.3.3. 64 ordern 2 logn, and if they aren't stored, computing the values needed, on the y, becomes a bottleneck. 4.4.2 Waiting to get lucky Hardy and Ramanujan [22] proved the asymptotic formula, as n!1, p n exp(2c p n)=(4 p 3n); where c == p 6 : = 1:282550: (4.18) Fristedt [17] observed that, for any choice x2 (0; 1), if Z i Z i (x) has the geometric distribution given by P(Z i =k)P x (Z i =k) = (1x i ) (x i ) k ; k = 0; 1; 2;:::; (4.19) with Z 1 ;Z 2 ;::: mutually independent, and T is dened by T =Z 1 + 2Z 2 + ; (4.20) then, conditional on the event (T = n), the partition having Z i parts of size i, for i = 1; 2;:::, is uniformly distributed over the p n possible partitions of n. 10 This extremely useful observation is easily seen to be true, since for any nonnegative integers (c(1);c(2);:::) withc(1) + 2c(2) + =n, specifying a partition of the integer n, P(Z i =c i ;i = 1; 2;:::) = Y P(Z i =c(i)) = Y (1x i )(x i ) c(i) 10 We writePx orP, orZi orZi(x) interchangeably, depending on whether the choice ofx2 (0; 1) needs to be emphasized, or left understood. 65 = x c(1)+2c(2)+ Y (1x i ) =x n Y (1x i ); (4.21) which does not vary with the partition of n. The event (T =n) is the disjoint union, over all partitions ofn, of the events whose probabilities are given in (4.21), showing that P x (T =n) =p n x n Y (1x i ): (4.22) If we are interested in random partitions ofn, an especially eective choice forx, used by [17, 42], is x(n) = exp(c= p n); where c == p 6: (4.23) Under this choice, we have, as n!1, 1 n E x(n) T! 1; 1 n 3=2 Var x(n) T! 2 c ; (4.24) this is essentially a pair of Riemann sums, see [7], page 106. If we write (x) for the standard deviation of T , then the second part of (4.24) may be paraphrased as, with x =x(n); as n!1, (x) p 2=c n 3=4 : (4.25) The local central limit heuristic would thus suggest asymptotics forP x (T =n), and these simplify, using (4.23) and (4.24), as follows: P x (T =n) 1 p 2 (x) 1 2 4 p 6n 3=4 : (4.26) 66 The Hardy-Ramanujan asymptotics (4.18) and the exact formula (4.22) combine to show that (4.26) does hold. Theorem 10. Analysis of Waiting-to-get-lucky. Consider the following algorithm to generate a random partition of n, chosen uni- formly from the p n exp(2c p n)=(4 p 3n) possibilities. Use the distributions in (4.19), with parameter x given by (4.23). 1. Propose a sample, Z 1 ;Z 2 ;:::;Z n ; compute T n :=Z 1 + 2Z 2 + +nZ n . 2. 
Theorem 10 (Analysis of waiting-to-get-lucky). Consider the following algorithm to generate a random partition of $n$, chosen uniformly from the $p_n \sim \exp(2c\sqrt{n})/(4\sqrt{3}\,n)$ possibilities. Use the distributions in (4.19), with parameter $x$ given by (4.23).

1. Propose a sample, $Z_1, Z_2, \dots, Z_n$; compute $T_n := Z_1 + 2Z_2 + \cdots + nZ_n$.

2. In case $(T_n = n)$, we have got lucky. Report the partition with $Z_i$ parts of size $i$, for $i = 1$ to $n$, and stop. Otherwise, repeat from the beginning.

This algorithm does produce one sample from the desired distribution, and the expected number of proposals until we get lucky is asymptotic to $2\sqrt[4]{6}\; n^{3/4}$.

Proof. It is easily seen that $P_x(Z_i = 0 \text{ for all } i > n) \to 1$. [In detail, $P(\text{not } (Z_i = 0 \text{ for all } i > n)) \leq \sum_{i>n} P(Z_i \neq 0) = \sum_{i>n} x^i < x^n/(1-x) \sim x^n/(c/\sqrt{n}) = \exp(-c\sqrt{n})\,\sqrt{n}/c \to 0$.] Hence the asymptotics (4.26), given for the infinite sum $T$, also serve for the finite sum $T_n$, in which the number of summands, along with the parameter $x = x(n)$, varies with $n$.

Remark 4. We are not claiming that the running time of the algorithm grows like $n^{3/4}$, but only that the number of proposals needed to get one acceptable sample grows like $n^{3/4}$. The time to propose a sample also grows with $n$. Assigning cost 1 to each call to the random number generator, with all other operations being free, the cost to propose one sample grows like $\sqrt{n}$ rather than $n$; details in Section 4.5.2. Combining with Theorem 10, the cost of the waiting-to-get-lucky algorithm grows like $n^{5/4}$. A simple Matlab program to carry out waiting-to-get-lucky is presented in Section 4.5.1.

4.4.3 Divide-and-conquer, by small versus large

The waiting-to-get-lucky strategy is limited primarily by the probability that the target is hit, which diminishes like $n^{-3/4}$. Already at $n = 10^8$, the probability is one in a million. Instead of trying to hit a hole in one, we allow approach shots.

Recall that to sample partitions uniformly, based on (4.19)-(4.26), our goal is to sample from the distribution of $(Z_1, Z_2, \dots, Z_n)$ conditional on $T = n$. Using $x = x(n)$ from (4.23), and with $b \in \{1, 2, \dots, n-1\}$, we let
\[ A = (Z_{b+1}, Z_{b+2}, \dots, Z_n), \qquad B = (Z_1, \dots, Z_b). \tag{4.27} \]
Motivated by the standard paradigm for deterministic divide-and-conquer, that the two tasks should be roughly equal in difficulty, we will take $b = O(\sqrt{n})$, having observed that the expected largest part of a random partition grows like $\sqrt{n}\log n$. With
\[ T_A = \sum_{i=b+1}^n i\,Z_i, \qquad T_B = \sum_{i=1}^b i\,Z_i, \tag{4.28} \]
we want to sample from $(A,B)$ conditional on $T_A + T_B = n$. We have $A, B$ independent, and we use $h(A,B) = 1(T_A + T_B = n)$. The divide-and-conquer strategy, according to Lemma 7, is to sample $X$ from the distribution of $A$ biased by the probability that an independently chosen $B$ will make a match, and then, having observed $A = a$, to sample $Y$ from the distribution of $B$ conditional on $h(a,B) = 1$.

In order to simulate $X$, we will use rejection sampling, as reviewed in Section 4.2.2. To find the optimal rejection probabilities, we want the largest $C$ such that
\[ C\,\frac{\max_j P(T_B = n-j)}{P(T_n = n)} \leq 1, \quad \text{or equivalently,} \quad C = \frac{P(T_n = n)}{\max_j P(T_B = j)}. \tag{4.29} \]
The values $P(T_B = k)$ for $k = 0, 1, \dots, n$ can be simultaneously computed, using the recursion (4.17); what we really have is a variant of the $n$ by $n$ table method of Section 4.4.1, in which the table is $b$ by $n$, so the computation time is $bn$; furthermore, one only needs to store the current and previous row (or, with overwriting, only the current row), so the storage is $n$. Once we have the last row of the $b$ by $n$ table, we can easily find $C$ and indeed the entire threshold function $t$; a sketch follows below. The time to compute the table, $bn$, would have been the much larger $(n-b)n$ had we interchanged the roles of $A$ and $B$, i.e., taken $B := (Z_{b+1}, Z_{b+2}, \dots, Z_n)$ instead of $B := (Z_1, \dots, Z_b)$.
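For instance, the distribution of $T_B$, and from it the entire threshold function, can be computed as follows (a minimal sketch; the choice b = floor(sqrt(n)) is illustrative, and q(k+1) holds $P(T_B = k)$). Note that $p = P(T_n = n)$ cancels in $t(a) = C\,E\,h(a,B)/p$, so the threshold needs only the q array:

n = 1000; b = floor(sqrt(n)); x = exp(-pi/sqrt(6*n));
q = [1, zeros(1, n)];                        % distribution of the empty sum
for i = 1:b                                   % fold in the term i*Z_i
    qi = zeros(1, n+1);
    for k = 0:floor(n/i)
        w = (1 - x^i) * x^(i*k);              % P(Z_i = k), from (4.19)
        qi(i*k+1 : n+1) = qi(i*k+1 : n+1) + w * q(1 : n+1-i*k);
    end
    q = qi;                                   % q(j+1) = P(sum_{l<=i} l*Z_l = j)
end
tfun = @(TA) q(n - TA + 1) / max(q);          % threshold for a proposal with T_A <= n
% (proposals with T_A > n are rejected outright)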
In order to simulate $Y = (B \mid Z_1 + 2Z_2 + \cdots + bZ_b = n-k)$, in situations where $b$ is too large to store a $b$ by $n$ table, we resort to waiting-to-get-lucky. (Footnote 11: Of course, instead of trying to get lucky all at once, one might apply divide-and-conquer to this $B$ task. But we do not pursue this particular iteration, in light of a better iteration scheme, presented in Section 4.4.5.) Our goal for the next section is to explore mix-and-match using integer partitions as an example, regardless of whether there exists a better competing algorithm for integer partitions.

4.4.3.1 Divide-and-conquer with mix-and-match

When a sample of size $m > 1$ is desired, Lemmas 7-9 can be used to generate unbiased samples. Observe that the matching function with $h(A,B) = 1(T_A + T_B = n)$, for $(A,B) \in \mathcal{A}\times\mathcal{B}$, satisfies condition 2 of Lemma 8, with $c_A(A) = T_A$ and $c_B(B) = n - T_B$. Hence mix-and-match is enabled.

The first phase of the algorithm, phase A, generates samples $A_1, A_2, \dots, A_m$ from $X$, according to Lemma 7. This creates a multiset of $m$ colors, $\{c_1, \dots, c_m\}$, where $c_i = c_A(A_i)$. We think of these as $m$ demands that must be met by phase B of the algorithm. One strategy for phase B is to generate an iid sequence of samples of $B$; initially, for each sample, we compute its color $c = c_B(B)$ and check whether $c$ is in the multiset of demands $\{c_1, \dots, c_m\}$; when we find a match, we pair $B$ with one of the $A_i$ of the matching color, to produce our first sample, which we set as $S_i = (A_i, B)$. Then we reduce the multiset of demands by one element, and iterate, until all $m$ demands have been met. Lemma 9 implies that the resulting list $(S_1, S_2, \dots, S_m)$ is an iid sample of $m$ values of $S$, as desired.

Remark 5. Note that in the above, if the first match found, $(A_i, B)$, is labelled as $S_1$, and the second matching pair is labelled $S_2$, and so on, then the resulting list $(S_1, S_2, \dots, S_m)$ is not necessarily an iid sample of $S$; the colors $c$ with $P(c = c_B(B))$ large would tend to show up earlier in the list.

4.4.3.2 Roaming x

Consider again a sample of size $m = 1$. Having accepted $X = (Z_{b+1}, \dots, Z_n)$ with color $k = T_A$, in the notation of (4.28), we now need $Y$, which is $B = (Z_1, \dots, Z_b)$ conditional on having color $k$, which simplifies to having $n-k = T_B := \sum_{i=1}^b i\,Z_i$. One obvious strategy is to sample $B$ repeatedly, until getting lucky. The distribution of $B$ is specified by (4.19) and (4.23), with a choice of parameter, $x = x(n)$, not taking into account the values of $b$ and $n-k$. A computation similar to (4.21) shows that the distribution of $(Z_1(y), \dots, Z_b(y))$ conditional on $\sum_{i=1}^b i\,Z_i(y) = n-k$ is the same, for all choices $y \in (0,1)$. As observed in [7], the $y$ which maximizes $P_y(T_B = n-k)$, i.e., gives us the best chance of getting lucky, is the solution of $n-k = \sum_{i=1}^b E\,i\,Z_i(y)$; a numerical sketch is given below. Thus, in the case $m = 1$, the optimal choice of $y$ is easily prescribed. However, for large $m$ we also recommend using mix-and-match, which brings into play a complicated coupon collector's situation. With a multiset of demands $\{c_1, \dots, c_m\}$ from Section 4.4.3.1, the algorithm designer has many choices of global strategy; it is not obvious whether or not a greedy strategy (picking $y$ for the next proposed $B$ to maximize the chance that $B$ satisfies at least one of the demands) is optimal.
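Solving for the optimal $y$ is a one-dimensional root-finding problem, since $E_y T_B$ is increasing in $y$; a minimal sketch (the values of b and the target $n-k$ are illustrative):

% Roaming x: choose y with E_y[T_B] = sum_{i=1}^b i*y^i/(1-y^i) = n-k.
b = 31; target = 700;                               % illustrative values
ETB = @(y) sum((1:b) .* y.^(1:b) ./ (1 - y.^(1:b)));
y = fzero(@(y) ETB(y) - target, [1e-6, 1 - 1e-9])   % bracketing root-find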
4.4.3.3 Deliberately missing data

For motivation: in the B phase of mix-and-match, one situation would have all $m$ demands $c_1, c_2, \dots, c_m$ distinct, and in addition, $P(c_B(B) = c_i)$ relatively constant as $i$ varies. In this situation, we have the classic coupon collector's problem, with time $m\log m$ to collect all $m$ coupons. There is a harmonic slowdown: for example, the first match is found 100 times as quickly as the match after ninety-nine percent of the demands have been met; the last $v$ matches are expected to take about $(1 + 1/2 + \cdots + 1/v)/\log m$ as long as the first $m-v$ matches combined. In the situation with the $P(c_B(B) = c_i)$ relatively constant, but with large multiplicities in the multiset $\{c_1, \dots, c_m\}$, there is an even more dramatic slowdown near the end, based on the size of the set underlying the multiset of remaining demands. Note, however, that the values $P(c_B(B) = c_i)$ might not be relatively constant, even if we adjust the sampling distributions of $B$ to fit the particular colors remaining to be found, as described in Section 4.4.3.2.

Suppose we stop before completing the B phase, with some $v$ of the demands remaining to be met. There is usable information in the partially completed sample, which has the $m-v$ partitions found so far. This list of $m-v$ is not a sample of size $m-v$, since "sample" requires iid from the original target distribution. But, had we run the B phase to completion, we would have had an honest sample of size $m$, so there is information in knowing all but $v$ of the $m$ values. Think of this sample of size $m$ as the sample, with some $v$ of its elements being unknown. For estimates based on the sample proportion, an error of size at most $v/m$ is introduced by the unknown elements. Since the standard deviation, due to sample variability, decays roughly like $1/\sqrt{m}$, it makes sense to allow $v$ comparable to $\sqrt{m}$.

For example, Pittel [43] proves that, as $n \to \infty$, given two partitions of $n$, choosing $(\lambda, \mu)$ uniformly over the $(p_n)^2$ possible pairs, the probability $\alpha_n$ that $\lambda$ dominates $\mu$ satisfies $\alpha_n \to 0$. Also, he proves that, choosing a single partition $\lambda$ uniformly over the $p_n$ possibilities, the probability $\beta_n$ that $\lambda$ dominates its dual $\lambda'$ satisfies $\beta_n \to 0$. It is natural to guess that $\lim \alpha_n/\beta_n$ exists, with a value in $[0, \infty]$; one can use simulations to suggest whether $0$, $1$, or $\infty$ is the most plausible value. The harder task is to simulate for $\alpha_n$.

If one has an honest sample of $2m$ partitions of $n$, with $v$ missing items due to terminating the B phase early, then one would have an honest sample of $m-V$ pairs $(\lambda, \mu)$, with random variable $V \leq v$. (Footnote 12: The pairing on $\{1, 2, \dots, 2m\}$ must be assigned before observing which $v$ items are missing. Pairing up the missing partitions, in order to get $\lceil v/2\rceil$ missing pairs, is not valid; see Remark 5.) If we had $v = 0$, and $H$ of the pairs have $\lambda$ dominates $\mu$, then the point estimate for $p := \alpha_n$ is $H/m$, and the standard $(1-\alpha)\cdot 100\%$ confidence interval is
\[ \left[ \frac{H}{m} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{m}},\ \ \frac{H}{m} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{m}} \right]. \]
With $V$ missing pairs, consider the count $K$ of how many of the $m-V$ completed pairs had $\lambda$ dominates $\mu$. We can do a worst-case analysis by assuming, on one side, that all $V$ of the missing pairs have domination, and on the other side, that none of the $V$ missing pairs have domination; more succinctly, $H \in [K, K+V]$. Hence the confidence interval
\[ \left[ \frac{K}{m} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{m}},\ \ \frac{K+V}{m} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{m}} \right] \]
is at least a $(1-\alpha)\cdot 100\%$ confidence interval, for the procedure with deliberately missing data.

Footnote 13: We feel compelled to speculate that many users of "confidence intervals" don't really care about the confidence, nor the width of the interval, but really rely on the center of the interval, which in the standard case is an unbiased estimator. Our interval has center $(K + V/2)/m$, and this value is not an unbiased estimator; indeed, anything based on $K$, including the natural $K/(m-V)$, is biased in an unknowable way.

4.4.4 Divide-and-conquer with a trivial second half

Ignoring the paradigm that divide-and-conquer should balance its tasks, in (4.27) a very useful choice is $b = 1$. Loosely speaking, it reduces the cost of waiting-to-get-lucky from order $n^{3/4}$ to order $n^{1/4}$.
The analysis of the speedup, relative to waiting-to-get-lucky, is easy.

Theorem 11. The speedup of the $b = 1$ procedure above, relative to the waiting-to-get-lucky algorithm described in Theorem 10, is asymptotically $\sqrt{n}/c$, with $c = \pi/\sqrt{6}$. Equivalently, the acceptance cost is asymptotically $2\pi n^{1/4}/6^{1/4}$.

Proof. Recall that $x = e^{-c/\sqrt{n}}$. From (4.10) and (4.29), the acceptance cost $1/C$ is given by $C = P(T = n)/\max_k P(Z_1 = k) = P(T = n)/P(Z_1 = 0) = P(T = n)/(1-x)$. The comparison algorithm, waiting-to-get-lucky, has acceptance cost $1/P(T = n)$. The ratio simplifies to $1/(1-x) \sim \sqrt{n}/c$.

To review, stage 1 is to simulate $(Z_2, Z_3, \dots, Z_n)$, and accept it with probability proportional to the chance that $Z_1 = n - (2Z_2 + \cdots + nZ_n)$; the speedup comes from the brilliant idea in [50]. In contrast, waiting-to-get-lucky can be viewed as simulating $(Z_2, Z_3, \dots, Z_n)$ and then simulating $Z_1$ to see whether or not $Z_1 = n - (2Z_2 + \cdots + nZ_n)$.

4.4.5 Self-similar iterative divide-and-conquer: p(z) = d(z)p(z^2)

The methods of Sections 4.4.2-4.4.4 have acceptance costs that go to infinity with $n$. We now demonstrate an iterative divide-and-conquer that has an asymptotically constant acceptance cost. A well-known result in partition theory is
\[ p(z) = \prod_i (1-z^i)^{-1} = \left(\prod_i (1+z^i)\right)\left(\prod_i \frac{1}{1-z^{2i}}\right) = d(z)\,p(z^2), \tag{4.30} \]
where $d(z) = \prod_i (1+z^i)$ is the generating function for the number of partitions with distinct parts, and $p(z^2)$ is the generating function for the number of partitions where each part has an even multiplicity. This can of course be iterated to, for example, $p(z) = d(z)\,d(z^2)\,p(z^4)$, etc., and this forms the basis for a recursive algorithm.

Recall from (4.19) in Section 4.4.2 that each $Z_i \equiv Z_i(x)$ is geometrically distributed, with $P(Z_i \geq k) = x^{ik}$. The parity bit of $Z_i$, defined by $\varepsilon_i = 1(Z_i \text{ is odd})$, is a Bernoulli random variable $\varepsilon_i \equiv \varepsilon_i(x)$, with
\[ P(\varepsilon_i(x) = 1) = \frac{x^i}{1+x^i}, \qquad P(\varepsilon_i(x) = 0) = \frac{1}{1+x^i}. \tag{4.31} \]
Furthermore, $(Z_i(x) - \varepsilon_i)/2$ is geometrically distributed, as $Z_i(x^2)$, again in the notation (4.19), and $(Z_i(x) - \varepsilon_i)/2$ is independent of $\varepsilon_i$. What we really use is the converse: with $\varepsilon_i(x)$ as above, independent of $Z_i(x^2)$, the $Z_i(x)$ constructed as
\[ Z_i(x) := \varepsilon_i(x) + 2Z_i(x^2), \quad i = 1, 2, \dots \]
indeed has the desired geometric distribution.

Theorem 12. The asymptotic acceptance cost for one step of the iterative divide-and-conquer algorithm, using $A = (\varepsilon_1(x), \varepsilon_2(x), \dots)$ and $B = \big((Z_1(x)-\varepsilon_1)/2,\ (Z_2(x)-\varepsilon_2)/2,\ \dots\big)$, is $\sqrt{8}$.

Proof. The acceptance cost $1/C$ can be computed via (4.29) and (4.26), with
\[ C = \frac{P_x(T = n)}{\max_k P_{x^2}\!\left(T = \frac{n-k}{2}\right)} = \frac{P_x(T = n)}{\max_k P_{x^2}(T = k)} \sim \frac{1/(\sqrt{2\pi}\,\sigma(x))}{1/(\sqrt{2\pi}\,\sigma(x^2))} \sim \frac{n^{-3/4}}{(n/4)^{-3/4}} = 4^{-3/4} = 8^{-1/2}. \]

Here is an informal discussion of the full algorithm. First, propose $A$ until getting acceptance; then, since the B task is to find a uniformly chosen partition of a smaller integer, iterate to finish up. In effect, the iterative algorithm is to determine the $(Z_1, Z_2, \dots, Z_n)$ conditional on $Z_1 + 2Z_2 + \cdots = n$ by finding the binary expansions: first the 1s bits of all the $Z_i$'s, then the 2s bits, then the 4s bits, and so on.
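The splitting $Z_i(x) =_d \varepsilon_i(x) + 2Z_i(x^2)$ is easy to check empirically; a minimal sketch (the values x = 0.8 and i = 2 are arbitrary illustrations, not part of the algorithm):

x = 0.8; i = 2; N = 1e5;
geo = @(q) floor(log(rand(N,1)) / log(q));         % P(Z >= k) = q^k; here q = x^i
Z  = geo(x^i);                                      % Z_i(x) sampled directly
Zs = (rand(N,1) < x^i/(1+x^i)) + 2*geo((x^2)^i);    % eps_i(x) + 2*Z_i(x^2)
[mean(Z), mean(Zs); var(Z), var(Zs)]                % columns should roughly agree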
With a little more detail: to start, with $A = (\varepsilon_1(x), \varepsilon_2(x), \dots)$ and $T_A = \sum_i i\,\varepsilon_i$, we have $E\,T_A = \sum_{i=1}^n i\,x^i/(1+x^i) \sim n/2$, and it can be shown that, even after conditioning on acceptance, the distribution of $T_A$ is concentrated around $n/2$. Since $B = \big((Z_1(x)-\varepsilon_1)/2, (Z_2(x)-\varepsilon_2)/2, \dots\big)$ is equal in distribution to $(Z_1(x^2), Z_2(x^2), \dots)$, with target $n' = (n - T_A)/2$, we see that the B phase is to find a partition of the integer $n'$, uniform over the $p_{n'}$ possibilities. In carrying out the B task we simply use $x(n')$ as the parameter, but the choice $(x(n))^2$ would also work.

4.4.5.1 Exploiting a parity constraint

Theorem 12 states that the asymptotic acceptance cost for proposals of $A = (\varepsilon_1(x), \varepsilon_2(x), \dots)$ is $2\sqrt{2}$, and this already takes into account an obvious lower bound of 2, since the parity of $T_A = \varepsilon_1 + 2\varepsilon_2 + \cdots + n\varepsilon_n$ is nearly equally distributed over {odd, even}, and rejection is guaranteed if $T_A$ does not have the same parity as $n$. An additional speedup is attainable by moving $\varepsilon_1$ from the A side to the B side: instead of simulating $\varepsilon_1$, there now will be a trivial task, just as there was $Z_1$ in the "$b = 1$" procedure of Section 4.4.4. That is, we switch to $A = (\varepsilon_2(x), \varepsilon_3(x), \dots)$ and $B = \big(\varepsilon_1(x),\ (Z_1(x)-\varepsilon_1)/2,\ (Z_2(x)-\varepsilon_2)/2,\ \dots\big)$; the parity of the new $T_A$ dictates, deterministically, the value of the first component of $B$, under the conditioning on $h(a,B) = 1$. The rejection probabilities for a proposed $A$ are like those in Theorem 12, but with an additional factor of $1/(1+x)$ or $x/(1+x)$, depending on the parity of $n + 2\varepsilon_2 + \cdots + n\varepsilon_n$. Since $x = x(n) \to 1$ as $n \to \infty$, these two factors both tend to 1/2, so the constant $C$ as determined by (4.29) becomes, asymptotically, twice as large.

Theorem 13. The asymptotic acceptance cost for one step of the iterative divide-and-conquer algorithm using $A = (\varepsilon_2(x), \varepsilon_3(x), \dots)$ and $B = \big(\varepsilon_1(x), (Z_1(x)-\varepsilon_1)/2, (Z_2(x)-\varepsilon_2)/2, \dots\big)$ is $\sqrt{2}$.

Proof. The acceptance cost $1/C$ can be computed as in the proof of Theorem 12, with the only change being that, in the display for computing $C$, the expression under the $\max_k$, which was $P\big(T(x^2) = \frac{n-k}{2}\big)$, changes to
\[ P\left(2 \mid n-k+\varepsilon_1(x) \text{ and } T(x^2) = \frac{n-k-\varepsilon_1}{2}\right) = P\big(2 \mid n-k+\varepsilon_1(x)\big)\; P\left(T(x^2) = \left\lfloor\frac{n-k}{2}\right\rfloor\right) \sim \frac{1}{2}\, P\left(T(x^2) = \left\lfloor\frac{n-k}{2}\right\rfloor\right). \]

4.4.5.2 The overall cost of the main problem and all its subproblems

Informally, for the algorithm in the previous section, the main problem has size $n$ and acceptance cost $\sqrt{2}$, applied to a proposal cost asymptotic to $c_0\sqrt{n}$, for a net cost $\sqrt{2}\,c_0\sqrt{n}$. The first subproblem has random size, concentrated around $n/4$, and hence half the cost of the main problem. The sum of a geometric series with ratio 1/2 is twice the first term, so the net cost of the main problem and all subproblems combined is $2\sqrt{2}\,c_0\sqrt{n}$.

In framing a theorem to describe this, we try to allow for a variety of costing schemes. We believe that the first sentence in the hypotheses of Theorem 14 is valid, with $\beta = 1/2$, for the scheme of Remark 4. The second sentence, about costs of tasks other than proposals, is trivially true for the scheme of Remark 4, but may indeed be false in costing schemes which assign a cost to memory allocation, and communication.

Theorem 14. Assume that the cost $C(n)$ to propose the $A = (\varepsilon_2(x), \varepsilon_3(x), \dots)$ in the first step of the algorithm of Section 4.4.5.1 is given by a deterministic function with $C(n) \sim c_0 n^\beta$ for some positive constant $c_0$ and constant $\beta \geq 1/2$, or even more generally, $C(n) = n^\beta$ times a slowly varying function of $n$.
Assume that the cost of all steps of the algorithm, other than making proposals (Footnote 14: such as the arithmetic to compute acceptance/rejection thresholds, the generation of the random numbers used in those acceptance/rejection steps, and merging the accepted proposals into a single partition), is relatively negligible, i.e., $o(C(n))$. Then, the asymptotic cost of the entire algorithm is
\[ \frac{1}{1 - (1/4)^\beta}\,\sqrt{2}\,C(n) \leq 2\sqrt{2}\,C(n). \]

Proof. The key place to be careful is in the distinction between the distribution of a proposed $A = (\varepsilon_2(x), \varepsilon_3(x), \dots)$, and the distribution after rejection/acceptance. For proposals, in which the $\varepsilon_i$ are mutually independent, with $T_A := \sum_{i=2}^n i\,\varepsilon_i(x)$, with $x = x(n)$ from (4.23), calculation gives $E\,T_A \sim n/2$ and $\mathrm{Var}(T_A) \sim (1/c)\,n^{3/2}$. Chebyshev's inequality for being at least $k$ standard deviations away from the mean, to be used with $k = k(n) = o(n^{1/4})$ and $k \to \infty$, gives $P(|T_A - E\,T_A| > k\,\mathrm{SD}(T_A)) \leq 1/k^2$. Now consider the good event $G$ that a proposed $A$ is accepted; conditional on $G$, the $\varepsilon_i$ are no longer mutually independent. But the upper bound from Chebyshev is robust, with $P(|T_A - E\,T_A| > k\,\mathrm{SD}(T_A) \mid G) \leq 1/(k^2\,P(G))$. Since $P(G)$ is bounded away from zero, by Theorem 13, we still have an upper bound which tends to zero, and this shows that $(n - T_A)/2$, divided by $n$, converges in probability to 1/4.

Write $N_i \equiv N_i(n)$ for the random size of the subproblem at stage $i$, starting from $N_0(n) = n$. The previous paragraph showed that for $i = 0$, $N_{i+1}(n)/N_i(n) \to 1/4$, where the convergence is convergence in probability, and the result extends automatically to each fixed $i = 0, 1, 2, \dots$. We have deterministically that $N_{i+1}/N_i \leq 1/2$, so in particular $N_i > 0$ implies $N_{i+1} < N_i$. Set $C(0) = 0$, redefining this value if needed, so that the cost of all proposals is exactly the random
\[ S(n) := \sum_{i\geq 0} C(N_i(n)). \]
It is then routine analysis to use the hypothesis that $C(n)$ is regularly varying to conclude that $S(n)/C(n) \to 1/(1 - 4^{-\beta})$, where again, the convergence is convergence in probability. The deterministic bound $N_{i+1}(n)/N_i(n) \leq 1/2$ implies that the random variables $S(n)/C(n)$ are bounded, so it also follows that $E\,S(n)/C(n) \to 1/(1 - 4^{-\beta})$.

4.4.5.3 A variation based on p(z) = p_odd(z)p(z^2)

With
\[ p_{\mathrm{odd}}(z) := \prod_{i\ \mathrm{odd}} (1-z^i)^{-1}, \]
Euler's identity $d(z) = p_{\mathrm{odd}}(z)$ suggests a variation on the algorithm of Section 4.4.5. It is arguable whether the original algorithm, based on $p(z) = d(z)p(z^2)$, and the variant, based on $p(z) = p_{\mathrm{odd}}(z)p(z^2)$, are genuinely different.

Arguing that the variant algorithm is different: the initial proposal is $A = (Z_1(x), Z_3(x), Z_5(x), \dots)$. Upon acceptance, we have determined $(C_1(\lambda), C_3(\lambda), C_5(\lambda), \dots)$, where $\lambda$ is the partition of $n$ that the full iterative algorithm will determine, and $C_i(\lambda)$ is the number of parts of size $i$ in $\lambda$. The B task will find $(C_2(\lambda), C_4(\lambda), C_6(\lambda), \dots)$ by iterating the divide-and-conquer idea, so that the second time through the A procedure determines $(C_2(\lambda), C_6(\lambda), C_{10}(\lambda), \dots)$, and the third time through the A procedure determines $(C_4(\lambda), C_{12}(\lambda), C_{20}(\lambda), \dots)$, and so on.

Arguing that the variant algorithm is essentially the same: just as in Euler's bijective proof that $p_{\mathrm{odd}}(z) = d(z)$, the original algorithm had a proposal $A = (\varepsilon_1(x), \varepsilon_2(x), \dots)$, which can be used to construct the proposal $(Z_1(x), Z_3(x), Z_5(x), \dots)$ for the variant algorithm. That is, one can check that starting with independent $\varepsilon(i,x) \equiv \varepsilon_i(x)$ given by (4.31), for $j = 1, 3, 5, \dots$,
\[ Z_j := \sum_{m\geq 0} \varepsilon(j\,2^m, x)\, 2^m \]
indeed has the distribution of $Z_j(x)$ specified by (4.19), with $Z_1, Z_3, \dots$ independent.
And conversely, one can check that starting with the independent geometrically distributed Z 1 (x);Z 3 (x);:::, taking base 2 expansions yields mutually independent 1 (x); 2 (x);::: with the Bernoulli distributions specied by (4.31). Hence one could program the two algorithms so that they are coupled: starting with the same seed, they would produce the same sequence of colorsT A for the initial proposal, and the same count of rejections before the acceptance for the rst time through the A procedure, with same T A for that rst acceptance, and so on, including the same number of iterations before nishing. Under this coupling, the original algorithm produces a partition of n, the variant algorithm produces a partition of n | and we have implicitly dened the deterministic bijection f with =f(). Back to arguing that the algorithms are dierent: we believe that the coupling de- scribed in the preceding paragraph supplies rigorous proofs for the analogs of Theorems 12 and 13. For Theorem 14 however, one should also consider the computational cost of Euler's bijection, for various costing schemes, and we propose the following analog, for the variant based on p(z) =p odd (z)p(z 2 ), combined with the trick of moving 1 (x) from the A side to the B side, as in Section 4.4.5.1: Theorem 15. Assume that the costD(n) to propose (Z 1 (x);Z 2 (x);:::;Z n (x)), withx = x(n), satisesD(n) =n times a slowly varying function of n. Assume also that the cost D A (n) to propose A = (Z 1 (x) 1 (x);Z 3 (x);Z 5 (x);:::) satises D A (n)D(n)=2. 81 Then, the asymptotic cost of the entire algorithm is 1 1 1=4 p 2D(n)=2 p 2D(n): Proof. Essentially the same as the proof of Theorem 14. It is plausible that the cost functionC(n) from Theorem 14 and the the cost function D(n) from Theorem 15 are related byC(n)D(n); note that this depends on the choice of costing scheme, essentially asking whether or not the algorithmic cost of carrying out Euler's odd-distinct bijection is negligible. Another argument that the two algorithms are dierent arises from the completely articial example at the end of Section 4.5.4. 4.5 For integer partitions: review, method of choice In Section 4.4, ve methods were presented for the simulation of partitions of the integer n. Now we review the costs of running these algorithms, taking into account the size of n, the number m of samples to be delivered, and so on. We also consider alternate simulation tasks involving integer partitions with restrictions on the parts to see how the methods adapt. 4.5.1 Method of choice for unrestricted partitions If one is interested in generating just a few partitions of a moderately sized n, then the waiting-to-get-lucky method, with a dumb \timen" method of proposal for (Z 1 ;:::;Z n ), 82 is very easy to program. The overall runtime is order of n 7=4 | a factor of n to make a proposal, and a factor of n 3=4 for waiting to get lucky. 15 For example, in Matlab R [36] n=100; logx=-sqrt(6*n)/pi; s=0; while s~=n, Z=floor(log(rand(n,1))./(1:n)'.*logx); s=(1:n)*Z; end runs on a common desktop computer 16 at around 600 partitions of n = 100 per second; withn = 1000 the same runs at about 20 partitions per second, and at n = 10; 000 takes about 2 seconds per partition. The table method is by far the fastest method, if one is interested in generating many samples, and the table of size n 2 oating point numbers ts in random access memory. 
For example, forn = 10; 000 the same computer as above takes 5 seconds to generate the table | a one time cost, and then nds 40 partitions per second. At n = 15; 000, the same computer takes 28 seconds to generate the table, and then nds 25 partitions per second. But at n = 19; 000, the computer freezes, as too much memory was requested. The divide-and-conquer methods of Sections 4.4.3 and 4.4.4, using the small versus large division of (4.27), oer a large speedup over waiting-to-get-lucky, but only for case 15 A smarter \time p n " method of proposal for (Z1;:::;Zn) is described in Secton 4.5.2. It is harder to program, but gets the overall runtime down to order of n 5=4 . 16 Macintosh iMac, 3.06 Ghz, 4 GB RAM 83 b = 1, with its trivial second half, can we analyze the speedup | the p n=c factor in Theorem 11. The divide-and-conquer method based on p(z) =d(z)p(z 2 ) is unbeatable for large n. Regardless of the manner of costing, be it only counting random bits used, or uniform costing, or logarithmic (bitop) costing, the cost to nd a random partition of n must be asymptotically at least as large as the time to propose (Z 1 ;:::;Z n ) for a random partition of a random number around n. The entire divide-and-conquer algorithm of Theorem 15, compared with just proposing (Z 1 ;:::;Z n ), has asymptotically an extra cost factor of p 2. So the claim of unbeatable at the start of this paragraph really means: asymptotically unbeatable by anything more than a factor of p 2. 4.5.2 Complexity considerations At the end of Section 4.2.2 we note that in the general view of probabilistic divide-and- conquer algorithms, a key consideration is computability of the acceptance threshold t(a). The case of integer partitions, using any of the divide-and-conquer algorithms of Section 4.4.5, is perhaps exceptionally easy, in that computing the acceptance threshold is essentially the same as evaluating p m , an extremely well-studied task. For m > 10 4 a single term of the Hardy-Ramanujan asymptotic series suces to evaluate p m with relative error less than 10 16 ; see Lehmer [32, 34]. 17 This single term is hr 1 (n) := exp(y) 4 p 3(n 1 24 ) 1 1 y ; where y = 2c r n 1 24 ; 17 We thank David Moews for bringing these papers to our attention. 84 and numerical tabulation 18 , together with Lehmer's guarantee, shows that jlnp n lnhr 1 (n)j< 10 16 for all n 810: Is oating point accuracy sucient, in the context of computing an acceptance thresh- old t(a)? There is a very concrete answer, based on [29]. First, as in Lemma 7.14 in [4], given p2 (0; 1), a p-coin can be tossed using a random number of fair coin tosses; the expected number is exactly 2, unless p is a kth level dyadic rational, i.e., p =i=2 k with oddi, in which case the expected number is 22 1k . The proof is by consideration of say B;B 1 ;B 2 ;::: iid withP(B = 0) =P(B = 1) = 1=2; afterr tosses we have determined the rstr bits of the binary expansion of a random numberU which is uniformly distributed in (0; 1), and the usual simulation recipe is that ap-coin is the indicator 1(U <p). Unless b2 r pc =b2 r Uc, the rst r fair coin tosses will have determined the value of the p-coin. Exchanging the roles of U and p, we see that number of bits of precision read o of p is, on average, 2, and exceeds r with probability 2 r . If a oating point number delivers 50 bits of precision, the chance of needing more precision is 2 50 , per evaluation of an indicator of the form 1(U < p). 
Our divide-and-conquer doesn't require very many acceptance/rejection decisions; for example, with $n = 2^{60}$, there are about 30 iterations of the algorithm in Theorem 13, each involving on average about $\sqrt{2}$ acceptance/rejection decisions, according to Theorem 12. So one might program the algorithm to deliver exact results: most of the time determining acceptance thresholds $p = t(a)$ in floating point arithmetic, but keeping track of whether more bits of $p$ are needed. On the unlikely event, of probability around $30\sqrt{2}/2^{50} < 4\cdot 10^{-14}$, that more precision is needed, the program demands a more accurate calculation of $t(a)$. This would be far more efficient than using extended integer arithmetic to calculate values of $p_n$ exactly.

Another place to consider the use of floating point arithmetic is in proposing the vector $(Z_1(x), \dots, Z_n(x))$. If one call to the random number generator suffices to find the next arrival in a rate 1 Poisson process, we have an algorithm using $O(\sqrt{n})$ calls, which can propose the entire vector $(Z_1, Z_2, \dots)$, using $x = x(n)$ from (4.23). The proposal algorithm is based on a compound Poisson representation of geometric distributions, and is similar to a coupling used in [5], Section 3.4.1. The supporting calculation here is that, with $x = x(n) = \exp(-c/\sqrt{n})$ and $c = \pi/\sqrt{6}$,
\[ s(n) := \sum_{i,j\geq 1} \frac{x^{ij}}{j} = \sum_j \frac{1}{j}\,\frac{x^j}{1-x^j} \leq \frac{\pi^2}{6}\,\frac{x}{1-x} \sim c\sqrt{n}. \]
The algorithm constructs the rate 1 Poisson process on $(0, s(n))$, with the full interval partitioned into subintervals of lengths $x^{ij}/j$. With $Y_{i,j}$ equal to the number of arrivals in the subinterval of length $x^{ij}/j$, the $Y_{i,j}$ are mutually independent, Poisson distributed, and $Z_i := \sum_j j\,Y_{i,j}$ constructs the desired mutually independent geometrically distributed $Z_1(x), Z_2(x), \dots$.

Once again, suppose we want to guarantee exact simulation of a proposal $(Z_1(x), \dots, Z_n(x))$. In the compound Poisson process with $s(n)$ arrivals expected, we need to assign exactly, for each arrival, say at a random time $R$, the corresponding index $(i,j)$, such that the partial sum for $s(n)$ up to, but excluding, the $ij$ term is less than $R$, but the partial sum including the $ij$ term is greater than or equal to $R$. Based on an entropy result from Knuth-Yao [29], a crucial quantity is
\[ h(n) := \sum_{i,j\geq 1} \frac{x^{ij}}{j}\,\log\frac{j}{x^{ij}} \sim \left(\frac{\zeta(2) - \zeta'(2)}{c}\right)\sqrt{n}. \]
An exact simulation of the Poisson process, assigning $ij$ labels to each arrival, can be done (Footnote 19: on each interval $(m-1, m]$ for $m = 1$ to $\lceil s(n)\rceil$, perform an exact simulation of the number of arrivals, which is distributed according to the Poisson distribution with mean 1. For each arrival on $(m-1, m]$, there is a discrete distribution described by those partition points lying in $(m-1, m)$, together with the endpoints $m-1$ and $m$; calling the corresponding random variable $X_m$, the sum of the base 2 entropies satisfies $\sum h(X_m) \leq h(n) + s(n)$, since each extra subdivision of one of the original subintervals of length $x^{ij}/j$ adds at most one bit of entropy.) with $O(s(n) + h(n))$ genuine random bits, and the bounds for $s(n)$ and $h(n)$ show that this is $O(\sqrt{n})$.

The costing scheme which counts only the expected number of random bits needed is clearly inadequate. Consider the impractical algorithm: list all $p_n$ partitions, for example in lexicographic order. Use $\lceil \log_2 p_n\rceil$ random bits to choose an integer $I$ uniformly from $\{1, 2, \dots, p_n\}$. Report the $I$th partition of $n$. If one costs only by the number of random bits needed, the algorithm just described is strictly unbeatable!

4.5.3 Partitions with restrictions

As with unrestricted partitions, if $n$ is moderate and a recursive formula exists, analogous to that of Section 4.4.1, then the table method is the most rapid, and divide-and-conquer is not needed. However, the requirement of random access storage of size $n^2$ is a severe limitation.

The self-similar iterative divide-and-conquer method of Section 4.4.5 is nearly unbeatable for large $n$, for ordinary partitions.
The costing scheme which counts only the expected number of random bits needed is clearly inadequate. Consider the impractical algorithm: list all $p_n$ partitions of n, for example in lexicographic order; use $\lceil \log_2 p_n \rceil$ random bits to choose an integer I uniformly from $\{1, 2, \ldots, p_n\}$; report the Ith partition of n. If one costs only by the number of random bits needed, the algorithm just described is strictly unbeatable!

4.5.3 Partitions with restrictions

As with unrestricted partitions, if n is moderate and a recursive formula exists, analogous to that of Section 4.4.1, then the table method is the most rapid, and divide-and-conquer is not needed. However, the requirement of random access storage of size $n^2$ is a severe limitation.

The self-similar iterative divide-and-conquer method of Section 4.4.5 is nearly unbeatable for large n, for ordinary partitions. There are many classes of partitions with restrictions that iterate nicely, and these should be susceptible to a corresponding iterative divide-and-conquer algorithm. Some of these classes, with their self-similar divisions, are:

1. distinct parts, $d(z) = d_{\mathrm{odd}}(z)\,d(z^2)$;

2. odd parts, $p_{\mathrm{odd}}(z) = d_{\mathrm{odd}}(z)\,p_{\mathrm{odd}}(z^2)$;

3. distinct odd parts, $d_{\mathrm{odd}}(z) = d_{\pm}(z)\,d_{\mathrm{odd}}(z^3)$.

Here $d_{\pm}(z) = (1+z)(1+z^5)(1+z^7)(1+z^{11})\cdots$ represents partitions into distinct parts $\equiv \pm 1 \pmod 6$. Other recurrences are discussed in [27, 41, 46], and in the standard text on partitions [2].

It is not easy to come up with examples where the optimal divide-and-conquer is like that in Section 4.4.3, based on small parts versus large parts. One suggestion is partitions with all parts prime; there should be a large range of n for which table methods are ruled out by the memory requirement, while the n memory and bn computational time needed to calculate rejection probabilities are not prohibitive. Another suggestion is partitions with a restriction on the multiplicities, for example that a part of size i can occur at most f(i) times, with f sufficiently complicated as to rule out iterative formulas such as those in the preceding paragraph. Another family is, for d = 2, 3, ..., partitions in which all parts differ by at least d.

4.5.4 An eye for gathering statistics

The underlying motivation for sampling a random element S, m times, under some given distribution, might be to produce a statistic $h(S_1, S_2, \ldots, S_m)$ based on that sample. Conceivably, the deterministic function h might only look at some part of the data, so that there is a divide-and-conquer scheme in which S = (X, Y), using the notation of Lemma 7, and $h(S_1, S_2, \ldots, S_m) = g(X_1, X_2, \ldots, X_m)$ for a deterministic g. In this case, in the notation of Section 4.2.1, one need only carry out Stage 1.

Here is a completely artificial example, motivated by the question of dominance, discussed in Section 4.4.3.3. For a partition $\lambda$ of n, define $f(\lambda)$ to be the partition, having distinct parts, in which i is a part of $f(\lambda)$ if and only if i has odd multiplicity as a part of $\lambda$. Now suppose that one needs to approximate, for say $n = 10^9$,
$$ \mathrm{parity}_n = P\bigl(f(\lambda) \text{ dominates } f(\mu)\bigr), $$
choosing $(\lambda, \mu)$ uniformly over the $(p_n)^2$ possible pairs of partitions of n.
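For concreteness, here is a small sketch (our own notation, not code from the thesis) of the two ingredients, with dominance taken in the usual sense of comparing partial sums of the parts sorted in decreasing order:

```python
from collections import Counter
from itertools import accumulate

def f(parts):
    """f(lambda): the partition with distinct parts consisting of those i
    having odd multiplicity in lambda (given as a list of parts)."""
    counts = Counter(parts)
    return sorted((i for i, m in counts.items() if m % 2 == 1), reverse=True)

def dominates(lam, mu):
    """True iff lam dominates mu: each partial sum of lam's decreasingly
    sorted parts is at least the corresponding partial sum of mu's."""
    a = list(accumulate(sorted(lam, reverse=True)))
    b = list(accumulate(sorted(mu, reverse=True)))
    k = max(len(a), len(b))
    a += [a[-1] if a else 0] * (k - len(a))  # pad with the total (parts of size 0)
    b += [b[-1] if b else 0] * (k - len(b))
    return all(s >= t for s, t in zip(a, b))

# f([3, 3, 2, 1, 1, 1]) == [2, 1]; a Monte Carlo estimate of parity_n would
# average dominates(f(lam), f(mu)) over many sampled pairs of partitions of n.
```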
4.6 For comparison: opportunistic divide-and-conquer with mix-and-match

Recall the setup of (4.1)-(4.4). In the third and fourth paragraphs of Section 4.1.2, we describe our starting point, a motivation stemming from mix-and-match, and admit that we can find no way to carry out "opportunistic" mix-and-match to get a perfect sample. Here, we point out that nonetheless the natural opportunistic procedure does supply a consistent estimator.

Take a deterministic design: for integers $m_1, m_2 \ge 1$, let $A_1, \ldots, A_{m_1}$ be distributed according to the distribution of A given in (4.1), and let $B_1, \ldots, B_{m_2}$ be distributed according to the distribution of B, with these $m_1 + m_2$ sample values being mutually independent. The opportunistic observations, under a take-all-you-can-get strategy, are all the pairs $(A_i, B_j)$ for which $h(A_i, B_j) = 1$. To use these in an estimator, one would naturally count the available pairs, via
$$ W = W(m_1, m_2) = \sum_{1 \le i \le m_1,\; 1 \le j \le m_2} h(A_i, B_j), \tag{4.32} $$
and, for a deterministic, real-valued function g on the support of h, form the total score from these pairs, say
$$ G = G(m_1, m_2) = \sum_{1 \le i \le m_1,\; 1 \le j \le m_2} g(A_i, B_j)\,h(A_i, B_j). \tag{4.33} $$
The natural estimator is the observed average score per pair, G/W. Unfortunately, this is not unbiased. However, it is consistent, i.e., as $m_1, m_2 \to \infty$, $G/W \to \mathbb{E}\,g(S)$ in probability.

Theorem 16. For bounded g, we get an unbiased estimate of $p\,\mathbb{E}\,g(S)$ from $G/(m_1 m_2)$. Furthermore, as $m_1, m_2 \to \infty$, $G(m_1, m_2)/(m_1 m_2) \to p\,\mathbb{E}\,g(S)$, where the convergence is convergence in probability.

Proof. To show unbiasedness: for each i, j we have, since h is an indicator with $\mathbb{E}\,h(A_i, B_j) = p$,
$$ \mathbb{E}\,g(A_i, B_j)\,h(A_i, B_j) = \mathbb{E}\bigl(g(A_i, B_j) \mid h(A_i, B_j) = 1\bigr)\,p. $$
According to (4.4), the conditional distribution of $(A_i, B_j)$ given h = 1 is equal to the distribution of S, so the right side of the display above equals $p\,\mathbb{E}\,g(S)$.

For consistency: write $X_{ij} = g(A_i, B_j)\,h(A_i, B_j)$. In the variance-covariance expansion of Var(G), there are $m_1 m_2$ terms $\mathrm{cov}(X_{ij}, X_{i'j'})$ where both i = i' and j = j', there are $m_1 m_2 (m_2 - 1)$ terms where only i = i', there are $m_1 (m_1 - 1) m_2$ terms where only j = j', and all other terms, involving four independent random variables $A_i, A_{i'}, B_j, B_{j'}$, are zero. Since g is bounded, each covariance is bounded, hence $\mathrm{Var}(G/(m_1 m_2)) \to 0$, and Chebyshev's inequality implies the desired convergence in probability.

Observe that in the special case g = 1, the random variable G is the count W, so Theorem 16 implies that $W/(m_1 m_2) \to p$. This shows that G/W is a consistent estimator of $\mathbb{E}\,g(S)$.
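Here is a toy simulation of Theorem 16 (the instance A, B, h, g is entirely made up for illustration, not from the thesis), showing the take-all-you-can-get estimator G/W settling near $\mathbb{E}\,g(S)$:

```python
import random

# Toy instance: A and B uniform on {0, ..., 9}; h accepts when a + b == 9,
# so p = 1/10 and, given h = 1, (A, B) is uniform over the ten pairs (a, 9 - a).
def h(a, b): return 1 if a + b == 9 else 0
def g(a, b): return (a - b) ** 2   # a bounded score

def G_over_W(m1, m2, rng=random.Random(0)):
    A = [rng.randrange(10) for _ in range(m1)]
    B = [rng.randrange(10) for _ in range(m2)]
    W = G = 0
    for a in A:                    # scan all m1 * m2 pairs, keeping those with h = 1
        for b in B:
            if h(a, b):
                W += 1             # the count of (4.32)
                G += g(a, b)       # the score of (4.33)
    return G / W

# Exact target: E g(S) = average of (2a - 9)^2 over a = 0..9, which equals 33.
print(G_over_W(1000, 1000))        # close to 33, illustrating consistency
```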
Chapter 5: Opportunistic Probabilistic Divide-And-Conquer

Don't only practice your art, but force your way into its secrets, for it and knowledge can raise Men to the Divine. -- Beethoven

Finally I'm becoming stupider no more. -- Erdős

The original intention of PDC was to provide exact samples, but in order to achieve this we must have access to the conditional distributions. In Section 4.2.2 the quantity t(a) was defined as the threshold function necessary to carry out von Neumann's rejection recipe in order to obtain the appropriate conditional distribution of $(A \mid h(A, B) = 1)$. Unfortunately, it is not always possible to calculate, or to calculate efficiently, the conditional distributions required, and yet we may still be interested in the information contained in a sample from a related distribution, i.e., a sample that comes from a distribution "close" to the target.

There are different ways to quantify the distance between two distributions, for example total variation or Wasserstein distance, and this information can be used in various contexts such as confidence intervals. A good starting reference on probability metrics is [21].

Section 4.6 introduced a take-all-you-can-get approach to parameter estimation, which produced consistent but not unbiased estimators. We are in the process of developing various opportunistic approaches to PDC, with the goal of obtaining upper bounds on the total variation and Wasserstein distance between the opportunistic quasi-samples and the desired but unobtainable exact samples. The results are preliminary at present, and will not be discussed in this thesis.

References

[1] Noga Alon, Dmitry N. Kozlov, and Van H. Vu. The geometry of coin-weighing problems. In 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pages 524-532. IEEE Comput. Soc. Press, Los Alamitos, CA, 1996.

[2] George E. Andrews. The Theory of Partitions. Cambridge Mathematical Library, 1984.

[3] T. Apostol. Introduction to Analytic Number Theory. Springer-Verlag, New York, 1984.

[4] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge, 2009.

[5] R. Arratia. On the amount of dependence in the prime factorization of a uniform random integer. In Contemporary Combinatorics, volume 10 of Bolyai Soc. Math. Stud., pages 29-91. János Bolyai Math. Soc., Budapest, 2002.

[6] Richard Arratia, Louis Gordon, and Michael S. Waterman. The Erdős-Rényi law in distribution, for coin tossing and sequence matching. Annals of Statistics, 18(2):529-570, 1990.

[7] Richard Arratia and Simon Tavaré. Independent process approximations for random combinatorial structures. Advances in Mathematics, 104:90-154, 1994.

[8] Richard Arratia and Michael S. Waterman. Critical phenomena in sequence matching. Annals of Probability, 13(4):1236-1249, 1985.

[9] B. Bollobás. Random Graphs. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2001.

[10] Jean Bourgain, Van H. Vu, and Philip Matchett Wood. On the singularity probability of discrete random matrices. J. Funct. Anal., 258(2):559-603, 2010.

[11] James Antonio Bucklew. Introduction to Rare Event Simulation. Springer Series in Statistics. Springer-Verlag, New York, 2004.

[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.

[13] Alain Denise and Paul Zimmermann. Uniform random generation of decomposable structures using floating-point arithmetic. Theoret. Comput. Sci., 218(2):233-248, 1999.

[14] Persi Diaconis and Arun Ram. A probabilistic interpretation of the Macdonald polynomials. To appear, 2010.

[15] Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles Schaeffer. Boltzmann samplers for the random generation of combinatorial structures. Combin. Probab. Comput., 13(4-5):577-625, 2004.

[16] P. Erdős. On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc., 51:898-902, 1945.

[17] Bert Fristedt. The structure of random partitions of large integers. Trans. Amer. Math. Soc., 337(2):703-735, 1993.

[18] Jason Fulman, Jan Saxl, and Pham Huu Tiep. Cycle indices for finite orthogonal groups of even characteristic. Transactions of the American Mathematical Society, 2011.

[19] Paul Garcia. The Life and Work of Major Percy Alexander MacMahon. PhD thesis, Open University, 2006.

[20] George E. Andrews and Kimmo Eriksson. Integer Partitions. Cambridge University Press, 2004.

[21] Alison L. Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419-435, 2002.

[22] G. H. Hardy and S. Ramanujan. Asymptotic formulae in combinatory analysis. Proc. London Math. Soc., pages 75-115, 1918.

[23] Michael T. Heideman, Don H. Johnson, and C. Sidney Burrus. Gauss and the history of the fast Fourier transform. Arch. Hist. Exact Sci., 34(3):265-277, 1985.

[24] Jeff Kahn, János Komlós, and Endre Szemerédi. On the probability that a random ±1-matrix is singular. J. Amer. Math. Soc., 8(1):223-240, 1995.

[25] A. Karatsuba and Yu. Ofman. Multiplication of many-digital numbers by automatic computers. Proceedings of the USSR Academy of Sciences, 145:293-294, 1962.

[26] A. A. Karatsuba. The complexity of computations. Trudy Mat. Inst. Steklov., 211 (Optim. Upr. i Differ. Uravn.):186-202, 1995.
[27] Charles Knessl and Joseph B. Keller. Partition asymptotics from recursion equations. SIAM J. Appl. Math., 50(2):323-338, 1990.

[28] Donald E. Knuth. The Art of Computer Programming. Vol. 4, Fasc. 3: Generating All Combinations and Partitions. Addison-Wesley, Upper Saddle River, NJ, 2005.

[29] Donald E. Knuth and Andrew C. Yao. The complexity of nonuniform random number generation. In Algorithms and Complexity (Proc. Sympos., Carnegie-Mellon Univ., Pittsburgh, Pa., 1976), pages 357-428. Academic Press, New York, 1976.

[30] János Komlós. Circulated manuscript. Unpublished, 1977.

[31] D. H. Lehmer. On the series for the partition function. Trans. Amer. Math. Soc., 43(2):271-295, 1938.

[32] D. H. Lehmer. On the series for the partition function. Trans. Amer. Math. Soc., 43(2):271-295, 1938.

[33] D. H. Lehmer. On the remainders and convergence of the series for the partition function. Trans. Amer. Math. Soc., 46:362-373, 1939.

[34] D. H. Lehmer. On the remainders and convergence of the series for the partition function. Trans. Amer. Math. Soc., 46:362-373, 1939.

[35] Mathematica. Mathematica Edition: Version 8.0. Wolfram Research, Inc., Champaign, IL, 2010.

[36] MATLAB. Version 7.12.0.635 (R2011a). The MathWorks Inc., Natick, Massachusetts, 2011.

[37] N. Metropolis and P. R. Stein. On a class of (0,1) matrices with vanishing determinants. J. Combinatorial Theory, 3:191-198, 1967.

[38] A. Nijenhuis and H. S. Wilf. A method and two algorithms on the theory of partitions. J. Combinatorial Theory Ser. A, 18:219-222, 1975.

[39] Albert Nijenhuis and Herbert S. Wilf. Combinatorial Algorithms: For Computers and Calculators. Computer Science and Applied Mathematics. Academic Press [Harcourt Brace Jovanovich Publishers], New York, second edition, 1978.

[40] A. M. Odlyzko. On subspaces spanned by random selections of ±1 vectors. J. Combin. Theory Ser. A, 47(1):124-133, 1988.

[41] Igor Pak. Partition bijections, a survey. Ramanujan J., 12(1):5-75, 2006.

[42] Boris Pittel. On a likely shape of the random Ferrers diagram. Adv. in Appl. Math., 18(4):432-488, 1997.

[43] Boris Pittel. Confirming two conjectures about the integer partitions. J. Combin. Theory Ser. A, 88(1):123-135, 1999.

[44] James Gary Propp and David Bruce Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. In Proceedings of the Seventh International Conference on Random Structures and Algorithms (Atlanta, GA, 1995), volume 9, pages 223-252, 1996.

[45] Hans Rademacher. A convergent series for the partition function p(n). Proceedings of the National Academy of Sciences, 23:78-84, 1937.

[46] Jeffrey B. Remmel. Bijective proofs of some classical partition identities. J. Combin. Theory Ser. A, 33(3):273-286, 1982.

[47] Terence Tao and Van Vu. On random ±1 matrices: singularity and determinant. Random Structures Algorithms, 28(1):1-23, 2006.

[48] Terence Tao and Van Vu. On the singularity probability of random Bernoulli matrices. J. Amer. Math. Soc., 20(3):603-628 (electronic), 2007.

[49] Terence Tao and Van H. Vu. Additive Combinatorics, volume 105 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010. Paperback edition.

[50] John von Neumann. Various techniques used in connection with random digits. Monte Carlo methods. National Bureau of Standards, pages 36-38, 1951.

[51] Herbert Wilf. Lectures on Integer Partitions. University of Victoria, 2000.

[52] Herbert S. Wilf. Generatingfunctionology. A. K. Peters, Ltd., Natick, MA, USA, 2006.
Abstract

This thesis is divided into two areas of combinatorial probability: probabilistic divide-and-conquer, and random Bernoulli matrices via novel integer partitions. Probabilistic divide-and-conquer is a new method of exact sampling that simulates from a set of objects by dividing each object into two disjoint parts and piecing them together. The study of random Bernoulli matrices is driven by the asymptotics of the probability that a random matrix, whose entries are independent, identically distributed Bernoulli random variables with parameter 1/2, is singular. Our approach is an inclusion-exclusion expansion for this probability, defining a necessary and sufficient class of integer partitions as an index set to characterize all of the singularities.