PROBABILISTIC METHODS AND RANDOMIZED ALGORITHMS

by

Majid Nemati Anaraki

A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(STATISTICS)

May 2007

Copyright 2007 Majid Nemati Anaraki

Dedication

To my wife Azita, and to my father, my mother and my brothers.

Acknowledgements

I would like to express my most profound gratitude to my advisor, Professor Larry Goldstein, for his patience and guidance in this thesis. Working with Professor Goldstein and learning from him was a valuable experience. Professor Goldstein is a brilliant mathematician and a great gentleman. I've benefited tremendously from his knowledge, in this thesis and all the classes I had with him. It has been an honor and a privilege to be Professor Goldstein's student.

I would like to extend my great appreciation to my other thesis committee members, Professor Fengzhu Sun and Professor Peter Baxendale of the Department of Mathematics, for helping to review this thesis. I would like to take this opportunity to thank Professor Robert Scholtz and Professor Urbashi Mitra of Electrical Engineering; Professor Jianfeng Zhang, Professor Lei Li, Professor Gary Rosen and Professor Mohammed Ziane of the Department of Mathematics; and Professor Micheal Schneir of the School of Dentistry for all that I learned from them. They always made time to meet with me, discuss, and answer all my questions. They are great professors and have wonderful personalities.

My deepest gratitude goes to the staff of the Department of Mathematics and the Communication Sciences Institute, especially Amy Yung, Arnold Deal, Milly Montenegro, Mayumi Thrasher and Gerrielyn Ramos, for their exceptional administrative help along the way.
My deepest appreciation also goes to my cousin Rezvan Nemati and my friends Behnam Salemi, Reza Omrani, Terry Lewis, and all my dear friends at USC and the Ultra-Wideband Radio Laboratory for their friendship and thoughtful comments and suggestions.

Finally, this acknowledgement would not be complete without expressing my sincere appreciation to my family, who bring harmony to my life, for their continuous love, support and encouragement. My deepest thanks to my parents, Ali and Tahereh, for all the sacrifices they've made which enabled me to continue my studies. My special thanks to my wife, Azita Taheri, for always being there and supporting me. I would also like to thank my brothers, Dr. Saeid Nemati and Mr. Amir Nemati, for their love and support. None of my achievements would be possible without them.

Majid Nemati Anaraki
Los Angeles, California
May 2007

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Motivation for Randomized Algorithms
  1.2 Organization of the Thesis

Chapter 2: Probability Tools and Techniques
  2.1 Introduction
  2.2 Examples of Randomness Applications
    2.2.1 Verifying Matrix Multiplication
    2.2.2 A Randomized Min-Cut Algorithm
    2.2.3 Ramsey Theory
  2.3 Chernoff Bounds
  2.4 Martingales
    2.4.1 Martingale Tail Inequalities
  2.5 Poisson Approximation
    2.5.1 Birthday Problem
  2.6 Hash Function

Chapter 3: Probabilistic Methods
  3.1 Introduction
  3.2 Moment Methods
    3.2.1 First Moment, Expectation
    3.2.2 Second Moment
  3.3 Lovasz Local Lemma (LLL)
  3.4 Applications
    3.4.1 Packet Routing
    3.4.2 Hamilton Cycle Problem
    3.4.3 Vertex-Disjoint Cycles

Chapter 4: Markov Chains and Random Walks
  4.1 Introduction
  4.2 Random Walks on Graphs
  4.3 Electrical Network
  4.4 Markov Model for Fading Communication Channels
    4.4.1 Signal Detection using Finite State Markov Chain Model
  4.5 Monte Carlo Method
    4.5.1 Markov Chain Monte Carlo (MCMC) Methods
  4.6 Pure and Accelerated Random Search

Bibliography

List of Figures

2.1 A graph with a min-cut set of size 2, and two sample executions of edge-reduction iterations.
2.2 A K_5 coloring which does not include a K_3.
2.3 The behavior of the function ln n / ln ln n, and the probability that the maximum load exceeds ln n / ln ln n.
2.4 The probability that the maximum load of any bin exceeds the value 2. This probability is the ratio of the expected values of functions of the random variables X(m) and Y(m).
3.1 Sketch of the randomized algorithm to find the Hamilton cycle.
3.2 The maximum number of events that E_v might depend on is k(k+1).
4.1 The cover time for networks of n nodes with different graph topologies. Cover time is not monotone in the number of edges.
4.2 An electric circuit and its equivalent graph of a random walk.
4.3 BER simulation results for the first-order channel Markov model with D = 1, f_D T_s = 0.01.

Abstract

An algorithm can be defined as a set of computational steps that transform the input to the output. In the analysis of the efficiency of algorithms, upper bounds on the running time are usually taken as the criterion. Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is taken from a well-defined probabilistic space. In the design of algorithms to solve many important problems, randomized algorithms are either the fastest or the simplest algorithms, and often both. Hence, randomization has become an inevitable part of modern algorithm design.

In this study, different probabilistic methods and their usefulness will be investigated. Useful and well-known tools such as Chernoff bounds, Ramsey theory, the first and second moment methods, and the Lovasz local lemma (LLL) are used to analyze the algorithmic and combinatorial problems of interest. Verification of matrix multiplication and a randomized min-cut algorithm will be considered to show the power of these methods.
Reviewing the properties of Markov chains and random walks on graphs, and electrical-network models for random walks on graphs, we analyze and obtain some of the parameters that are important in computer networks, communications, and signal detection. Pure and accelerated random search algorithms, and some of their convergence properties, will be considered in the last section.

We will see that the gains of randomized algorithms come at a price: the answers may have some positive probability of being incorrect. Although it may seem unusual to design an algorithm that may be incorrect, if the probability of error is sufficiently small then the improvement in speed or memory requirements may well be worthwhile.

Chapter 1
Introduction

1.1 Motivation for Randomized Algorithms

Informally, an algorithm is any well-defined computational procedure that takes some value or set of values as input and produces some value or set of values as output [15]. An algorithm is thus a set of computational steps that transform the input to the output. The sorting problem is an example of a type of problem that arises frequently in practice and provides fertile ground for introducing many standard design techniques and analysis tools. One key aspect of a designed algorithm is the complexity required to obtain the solution.

Sorting is by no means the only computational problem for which such algorithms have been developed. In fact, the analysis of algorithms also provides a way into the theoretical study of computer program performance and resource usage.

In many computer science and mathematical problems, we need to be able to solve different problems efficiently. In the analysis of efficiency, an upper bound on the running time is usually taken as the criterion. This guarantees that the problem can be solved within the estimated time, since the actual running time often depends on the actual input, the set of inputs, or the order in which the inputs are considered.
This worst-case analysis provides us with the maximum running time for any input of size n. Other analyses may also be used, e.g., average-case analysis, for which we need some kind of statistics about the problem, or occasionally best-case analysis. In general, the analysis of algorithms uses asymptotic calculus; that is, a sense of which terms are important and which terms may safely be disregarded is crucial.

But while there are many efficient algorithms for many important problems, the question arises: why random? Why should mathematicians and computer scientists study and use randomness? Computers, as they are, appear to behave far too unpredictably [34]. Adding randomness would seemingly be a disadvantage, adding further complications to the already challenging task of using computers efficiently.

Science has learned in the last century to accept randomness as an essential component in modelling and analyzing nature. In physics, for example, although Newton's laws led people to believe that the universe is a deterministic place, the development of quantum theory suggested a rather different view. The universe still behaves according to laws, but the backbone of these laws is probabilistic. The prevailing theories of sub-particle physics today are based on random behavior and statistical laws. Randomness plays a significant role in almost every field of science, ranging from genetics and evolution in biology to modelling price fluctuations in a free-market economy.

In the design of algorithms to solve many important problems, randomized algorithms are either the fastest or the simplest algorithms, and often both. Hence, randomization has become an inevitable part of modern algorithm design. Randomized algorithms break patterns that we do not even know are there. These are algorithms that make random choices during their execution.
In practice, a randomized algorithm uses values generated by a random number generator to decide the next step at several branches of its execution. For example, the protocol implemented in an Ethernet card uses random numbers to decide when it should next try to access the shared medium, thereby breaking the pattern that would otherwise have different cards repeatedly trying to access the medium at the same time.

Other applications of randomized algorithms can be seen in Monte Carlo simulations and in primality testing in cryptography. In these and many other applications, randomized algorithms are significantly more efficient than the best known deterministic solutions. Besides, in most cases the randomized algorithms are simpler and easier to program.

The gains come at a price, though: the answers may have some positive probability of being incorrect, or the efficiency is guaranteed only with some probability. Although it may seem unusual to design an algorithm that may be incorrect, if the probability of error is sufficiently small then the improvement in speed or memory requirements may well be worthwhile.

Complexity theory tries to classify computational problems according to their computational complexity. In particular, it tries to distinguish between easy and hard problems. For example, complexity theory shows that the travelling salesman problem is NP-hard [15]. It is therefore very unlikely that we can find an algorithm which can solve the problem in subexponential time with respect to the number of cities to which the salesman wants to travel.

An embarrassing phenomenon for classical worst-case complexity theory is that the problems it classifies as hard can usually be solved easily in practice. Probabilistic analysis of algorithms gives a theoretical explanation for this phenomenon.
Although the problem might be hard to solve on some sets of inputs, for most inputs (and these inputs usually occur in real-life applications) the problem is actually easy to solve. More precisely, if we think of the input as being randomly selected according to some probability distribution on the collection of all possible inputs, we are very likely to obtain a problem instance that is easy to solve, and the instances that are hard to solve appear with relatively small probability.

Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is taken from a well-defined probabilistic space. Even NP-hard problems might have algorithms that are extremely efficient on almost all inputs [34].

1.2 Organization of the Thesis

In Chapter 2, we start by reviewing some of the useful probability tools and techniques which will be the key tools in this thesis. Then, in Chapter 3, we focus on probabilistic methods. The usefulness and the power of these techniques will be discussed, and some applications that show the power of these methods will be presented.

Finally, Chapter 4 discusses Markov chains and random walks on graphs. We will discuss some examples and applications of these methods in signal communication and detection. Modelling random walks on graphs as electrical networks, to estimate some of the important characteristics of random walks, will also be presented in this chapter. Pure and accelerated random search algorithms conclude the thesis.

Chapter 2
Probability Tools and Techniques

2.1 Introduction

Erdős is credited as being the pioneer of the probabilistic method, beginning with his seminal 1947 paper [19], although the probabilistic method had been used on at least two previous occasions, by Turán in 1934 [45] and by Szele in 1943 [42].

The probabilistic method is a combinatorial tool with many applications in computer science.
In order to apply the probabilistic method, which is simple in concept but often needs a sophisticated combinatorial argument, we need a few extra tools, most notably concentration bounds [22]. This chapter, as a short survey of related material, will discuss a few of the basic tools and describe some of the ideas which will be used in the later chapters.

2.2 Examples of Randomness Applications

2.2.1 Verifying Matrix Multiplication

As our first example, we show how randomness can be used to verify matrix multiplication faster than deterministic algorithms. Suppose that we want to verify whether

    AB = C.    (2.1)

One straightforward method is to perform the left-hand-side matrix multiplication and compare it with C. If A, B and C are all n×n matrices, the corresponding simple matrix multiplication can be done in O(n^3) time. The algorithm with the lowest known exponent, presented by Don Coppersmith and Samuel Winograd in 1990, has an asymptotic complexity of O(n^2.376) [14].

However, randomization helps us to verify (2.1) faster. Consider the random vector r = (r_1, r_2, ..., r_n), where r_i ∈ {0,1} for all i = 1, ..., n. Identity (2.1) can be tested by checking the identity

    ABr = Cr.    (2.2)

To verify (2.2), we can first perform Br and then A(Br), and compare the result with Cr. This new algorithm requires three matrix-vector multiplications, which can be done in O(n^2) time in the obvious way. The probability that the algorithm reports AB = C when the two sides are actually not equal can be bounded using the following theorem [34].

Theorem 2.1. If AB ≠ C and if r is chosen uniformly at random from {0,1}^n, then

    Pr(ABr = Cr) ≤ 1/2.

Proof. The sample space for r is the n-tuple space {0,1}^n, and the event under consideration is ABr = Cr. Since the r_i are chosen uniformly at random, each of the 2^n possible vectors is chosen with probability 2^{-n}. Let D = AB − C ≠ 0. Then ABr = Cr implies that Dr = 0. Since D ≠ 0, it must have a nonzero entry. Without loss of generality, suppose d_11 ≠ 0.
Therefore, since Dr = 0,

    Σ_{j=1}^n d_1j r_j = 0,

hence

    r_1 = − ( Σ_{j=2}^n d_1j r_j ) / d_11.    (2.3)

We use a useful idea called the principle of deferred decisions. Instead of considering the whole vector r at once, we choose each element of r (i.e., each r_k) independently and uniformly at random from the set {0,1}. Choosing the r_k in this way is equivalent to choosing the vector r uniformly at random. Now consider the situation just before selecting r_1, with all the other variables already set. There is at most one choice of r_1 for which (2.3) holds. Since there are two choices for r_1, the probability that ABr = Cr is at most 1/2. By considering all variables besides r_1 as having been set, we reduce the sample space to the set of two values {0,1} for r_1. Formally, this corresponds to conditioning on the values of the other variables; the unconditional bound then follows by averaging over those values.

2.2.2 A Randomized Min-Cut Algorithm

Consider a graph example. Let G = (V,E) be a connected, undirected multigraph with n vertices. A multigraph is a graph which may contain multiple edges between any pair of vertices. A cut in G is a set of edges whose removal breaks G into two or more disconnected pieces. The min-cut problem is to find a cut-set of minimum cardinality in G. This problem is of interest in many areas, including computer network reliability. Given a network of computers described by a graph whose edges correspond to connections between machines, we are interested in the minimum number of links whose failure would disconnect certain parts of the network. Min-cut problems are also of interest in clustering applications. Fig. 2.1 shows an example of a multigraph with a min-cut set of size 2.

The main operation in the randomized min-cut algorithm is called edge reduction. At each iteration of edge reduction, we pick an edge uniformly at random, merge its endpoints u and v into a single vertex, eliminate all the edges connecting u and v, and retain all other edges to the rest of the graph.
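The edge-reduction procedure, iterated until only two merged "super-vertices" remain, can be sketched in code. This is a minimal sketch only (assuming Python; the edge-list representation, the example multigraph, and the number of repetitions are illustrative choices, not from the text):

```python
import random

def edge_reduction_cut(n, edges, rng):
    """One run of the randomized min-cut algorithm: repeatedly pick an
    edge uniformly at random, merge its endpoints into one super-vertex,
    and drop the edges between them, until two super-vertices remain."""
    parent = list(range(n))            # super-vertex label of each vertex

    def find(u):                       # follow merge links to the label
        while parent[u] != u:
            u = parent[u]
        return u

    live = list(edges)                 # multigraph as (u, v) pairs
    for _ in range(n - 2):             # n - 2 edge-reduction iterations
        u, v = rng.choice(live)
        parent[find(v)] = find(u)      # merge the two super-vertices
        # keep only edges that still join two different super-vertices
        live = [(a, b) for (a, b) in live if find(a) != find(b)]
    return len(live)                   # size of the cut this run produced

# Repeating the algorithm with independent random choices and keeping the
# best run drives the failure probability down, as in the analysis below.
edges = [(0, 1), (0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
best = min(edge_reduction_cut(5, edges, random.Random(s)) for s in range(50))
print(best)                            # this multigraph has min-cut size 2
```

With 50 independent repetitions, the minimum over the runs finds the size-2 min-cut of this small example with overwhelming probability, consistent with the per-run success bound derived below.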
The algorithm consists of n−2 iterations, where n is the number of vertices. The edges remaining between the last two vertices are the output of the algorithm. Fig. 2.1(a) shows an example of the above process ending in a cut which is not minimum, while in Fig. 2.1(b) the result is a min-cut.

[Figure 2.1: A graph with a min-cut set of size 2, and two sample executions of edge-reduction iterations: (a) an unsuccessful execution; (b) a successful execution.]

It is easy to verify that any cut-set of an intermediate graph is a cut-set of the original graph, while the reverse is not true, i.e., not every cut-set of the original graph is a cut-set of the intermediate graphs.

Consider a graph G with a particular min-cut set C of size k. Then G has at least kn/2 edges; otherwise there would be a vertex of degree less than k, and the edges adjacent to that vertex would form a cut of size less than k, contradicting the min-cut having size k. Now let E_i denote the event of not picking an edge of C in the ith iteration. Since the total number of iterations is n−2, we have

    Pr{ ∩_{i=1}^{n−2} E_i } ≥ Π_{i=1}^{n−2} ( 1 − 2/(n−i+1) ) = 2/(n(n−1)) ≥ 2/n².    (2.4)

Therefore, the probability of finding a particular min-cut is at least 2/n². Suppose we repeat the above algorithm n²/2 times, with independent random choices each time. The probability that a min-cut is not found in any of the n²/2 attempts is at most

    (1 − 2/n²)^{n²/2} ≤ 1/e.

The repetition process thus reduces the probability of failure to at most 1/e. Further repetition and more sophisticated implementations of this algorithm result in an acceptably low probability of error, while avoiding the complications of deterministic algorithms.

2.2.3 Ramsey Theory

Schur (1916) proved that if the natural numbers are divided into finitely many classes, then x + y = z has a solution within one class [10]. The foundation for such partition theorems is classical Ramsey theory (1930), which considers simple structures.
Ramsey theory, named for Frank P. Ramsey, typically asks a question of the form: how many elements of some structure must there be to guarantee that a particular property will hold? In other words, "complete disorder is impossible" (T. S. Motzkin). There is an immense literature on Ramsey theory, its importance, and its applications. Paul Erdős contributed many major results to Ramsey theory and was the first to recognize the importance of partition theorems.

To explain the main idea of Ramsey theory, suppose that n pigeons are housed in m pigeonholes. How big must n be to make sure that at least one pigeonhole houses at least two pigeons? The answer is the pigeonhole principle: if n > m, then at least one pigeonhole will have at least two pigeons in it. Ramsey theory generalizes this principle, as explained below.

[Figure 2.2: A K_5 coloring which does not include a monochromatic K_3.]

A typical result in Ramsey theory starts with some mathematical structure (e.g., solvability of x + y = z in the natural numbers), which is then cut into pieces. How big must the original structure be in order to ensure that at least one of the pieces has a given interesting property (e.g., that x + y = z is solvable within one partition)?

As another example, consider a complete graph of order n, i.e., a graph with n vertices (dots) in which each vertex is connected to every other vertex by an edge (a line). A complete graph of order 3 is a triangle. Now color every edge red or blue. How large must n be in order to ensure that there is either a blue triangle or a red triangle? Fig. 2.2 shows an edge coloring of K_5 (i.e., n = 5) with colors red and blue which does not contain a monochromatic complete subgraph K_3.

Another example is as follows: at any party with at least six people, there are either three people who are all mutual acquaintances (each one knows the other two) or three mutual strangers (each one does not know either of the other two).
This also is a special case of Ramsey's theorem, which says that for any given integer c and any given integers n_1, ..., n_c, there is a number R(n_1, ..., n_c) such that if the edges of a complete graph of order (number of vertices) R(n_1, ..., n_c) are colored with c different colors, then for some i between 1 and c it must contain a complete subgraph of order n_i whose edges are all of color i. The special case above has c = 2 and n_1 = n_2 = 3.

We can examine lower bounds for the Ramsey function R(k,t). Here R(k,t) > n means that there exists a two-coloring of K_n with neither a red K_k nor a blue K_t.

Theorem 2.2. If

    C(n,k) 2^{1 − C(k,2)} < 1,

then R(k,k) > n.

Proof. The proof of this theorem can be carried out by two methods. In his original paper, Erdős used a counting argument. The second method is probabilistic [41]. Although this theorem can be proved by counting methods as well, probabilistic methods give us powerful tools, specifically for asymptotic and bounding analysis.

Method 1, probabilistic method: Color K_n randomly, i.e., assign to each edge a color, say red or blue, with probability 1/2, independently of the other edges. For a set S of k vertices, let A_S be the event that S spans a monochromatic subgraph. Therefore

    Pr[A_S] = 2^{1 − C(k,2)}.    (2.5)

We are interested in the probability that there exists a subset S of vertices of size k such that S spans a monochromatic subgraph:

    Pr[ ∪_S A_S ] ≤ Σ_S Pr[A_S] = C(n,k) 2^{1 − C(k,2)},    (2.6)

since there are C(n,k) terms in the summation. By the assumption, (2.6) is less than one, which means

    1 − Pr[ ∪_S A_S ] > 0,    (2.7)

or

    1 − Pr[ ∪_S A_S ] = Pr[ ( ∪_S A_S )^c ] = Pr[ ∩_S A_S^c ] > 0.    (2.8)

Equation (2.8) states that the event B = ∩_S A_S^c has positive probability and hence is not a null event. Therefore, there is at least one point in the probability space for which B holds. This point corresponds to a coloring for which no monochromatic subgraph K_k exists. Therefore R(k,k) > n.
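The condition of Theorem 2.2 is easy to evaluate numerically. A small sketch, for k ≥ 3 (assuming Python; the function name and the choice k = 5 are illustrative):

```python
from math import comb

def ramsey_lower_bound(k):
    """Largest n for which C(n,k) * 2^(1 - C(k,2)) < 1, so that
    Theorem 2.2 guarantees R(k,k) > n. Intended for k >= 3."""
    n = k
    # The left-hand side is increasing in n; scan upward while it stays < 1.
    while comb(n + 1, k) * 2.0 ** (1 - k * (k - 1) // 2) < 1:
        n += 1
    return n

print(ramsey_lower_bound(5))  # C(11,5) * 2^(1-10) = 462/512 < 1, so R(5,5) > 11
```

For small k the guarantee is weak (far below the best known lower bounds for R(5,5)), but the same condition shows that R(k,k) grows roughly like 2^{k/2}.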
Method 2, counting method: A counting argument can also be used to show the same result. There exist 2^{C(n,2)} different 2-colorings of the edges of K_n. The number of 2-colorings which contain a monochromatic K_k is bounded above by 2 C(n,k) 2^{C(n,2) − C(k,2)}. If

    2 C(n,k) 2^{C(n,2) − C(k,2)} < 2^{C(n,2)},

there exists a 2-coloring of the edges of K_n which does not contain a monochromatic subgraph K_k. Simplification of this inequality results in C(n,k) 2^{1 − C(k,2)} < 1.

2.3 Chernoff Bounds

Chernoff bounds are extremely powerful methods which give exponentially decreasing bounds on tail distributions. These bounds are derived by applying Markov's inequality to the moment generating function of a random variable. Here we state some of the related theorems which will be used in later chapters [34].

Theorem 2.3. Let X and Y be two random variables, and let M_X(t) = E{e^{tX}} be the moment generating function (MGF) of X. If M_X(t) = M_Y(t) for all t ∈ (−δ, δ), for some δ > 0, then X and Y have the same distribution.

For later reference, here we mention Markov's inequality, which is usually the starting point in deriving Chernoff bounds.

Markov's inequality: Let X be a random variable taking only nonnegative values. Then

    Pr[X ≥ a] ≤ E{X}/a, for all a > 0.    (2.9)

The Chernoff bound for a random variable X is obtained by applying Markov's inequality to the random variable e^{tX} for some well-chosen value t. The following inequalities, derived from Markov's inequality, will be used in most cases:

    Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E{e^{tX}} / e^{ta}, for all t > 0;    (2.10)
    Pr[X ≤ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E{e^{tX}} / e^{ta}, for all t < 0.    (2.11)

By selecting appropriate values for t, bounds for specific distributions can be obtained. Values of t that minimize E{e^{tX}}/e^{ta}, hence tightening the bound, or simply values that give a convenient solution, might be of interest. Bounds obtained in this way are usually called Chernoff bounds.

Often, the variables we will be dealing with are sums of independent Bernoulli trials.
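As a numerical illustration of this recipe, the generic bound (2.10) can be minimized over a grid of t values and compared with an exact Binomial tail. A small sketch (assuming Python; the parameters n, p, a and the grid are illustrative choices):

```python
from math import comb, exp

n, p, a = 100, 0.5, 60           # bound Pr[X >= 60] for X ~ Binomial(100, 1/2)

def mgf(t):
    # MGF of a sum of n independent Bernoulli(p) random variables
    return (1 - p + p * exp(t)) ** n

# Chernoff step (2.10): Pr[X >= a] <= E{e^{tX}} / e^{ta} for every t > 0.
# Scan a grid of t values and keep the smallest (tightest) bound.
bound = min(mgf(t) * exp(-t * a) for t in (k / 1000 for k in range(1, 2000)))

# Exact tail probability for comparison
exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(a, n + 1))

print(exact <= bound < 1)        # the optimized bound is valid and nontrivial
```

At the grid optimum (near t = ln 1.5) the bound is valid but, as is typical of Chernoff bounds, looser than the exact tail probability.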
Therefore, knowing their behavior will be extremely useful for the later analysis. Let X_1, ..., X_n be a sequence of independent Poisson trials, i.e., Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 − p_i, for 1 ≤ i ≤ n, and let X = Σ_{i=1}^n X_i. Note that in the special case that p_i = p for all 1 ≤ i ≤ n, the X_i are called Bernoulli trials and X is the number of successes in the Bernoulli trials, i.e., X has a Binomial distribution. Clearly

    E{X} = Σ_{i=1}^n p_i =: μ.    (2.12)

To compute the moment generating function of X, we have

    M_{X_i}(t) = E{e^{tX_i}} = p_i e^t + (1 − p_i)
               = 1 + p_i (e^t − 1) ≤ e^{p_i (e^t − 1)}.    (2.13)

Using the properties of moment generating functions, since X is a sum of independent random variables,

    M_X(t) = Π_{i=1}^n M_{X_i}(t) ≤ Π_{i=1}^n e^{p_i (e^t − 1)}
           = exp{ Σ_{i=1}^n p_i (e^t − 1) } = e^{μ(e^t − 1)}.    (2.14)

This MGF upper bound provides the information needed to derive tail bounds for the sum of the Poisson trials.

Theorem 2.4. Let X_1, ..., X_n be independent Poisson trials such that Pr[X_i = 1] = p_i. Let X = Σ_{i=1}^n X_i and μ = E{X}. Then the following Chernoff bounds hold:

    Pr[X ≥ (1+δ)μ] < ( e^δ / (1+δ)^{1+δ} )^μ, for all δ > 0;    (2.15)
    Pr[X ≥ (1+δ)μ] ≤ e^{−μδ²/3}, for all 0 < δ ≤ 1;    (2.16)
    Pr[X ≥ R] ≤ 2^{−R}, for all R ≥ 6μ;    (2.17)
    Pr[X ≤ (1−δ)μ] ≤ ( e^{−δ} / (1−δ)^{1−δ} )^μ, for all 0 < δ ≤ 1;    (2.18)
    Pr[X ≤ (1−δ)μ] ≤ e^{−μδ²/2}, for all 0 < δ ≤ 1;    (2.19)
    Pr[|X − μ| ≥ δμ] ≤ 2 e^{−μδ²/3}, for all 0 < δ ≤ 1.    (2.20)

Proof. Proofs of these relations can be found in [34, 36]. The sketch of the proof is to apply Markov's inequality: for any t > 0,

    Pr[X ≥ (1+δ)μ] = Pr[e^{tX} ≥ e^{t(1+δ)μ}]
                   ≤ E{e^{tX}} / e^{t(1+δ)μ}
                   ≤ e^{(e^t − 1)μ} / e^{t(1+δ)μ}.    (2.21)

Set t = ln(1+δ), for δ > 0, to get (2.15). To prove (2.16), it is enough to show that for 0 < δ ≤ 1,

    e^δ / (1+δ)^{1+δ} ≤ e^{−δ²/3}.

Eq. (2.17) can be proved by setting R = (1+δ)μ. The rest of the inequalities are straightforward to derive from the previous ones.

The following theorem gives tighter bounds for special cases.

Theorem 2.5.
Let X_1, ..., X_n be independent random variables with

    Pr[X_i = 1] = Pr[X_i = −1] = 1/2,

and let X = Σ_{i=1}^n X_i. Consider the transformed variables Y_i = (X_i + 1)/2 (i.e., Y_i ∈ {0,1}), and let Y = Σ_{i=1}^n Y_i, with μ = E{Y} = n/2. Then the following relations hold for all a > 0 and all δ > 0:

    Pr[X ≥ a] ≤ e^{−a²/2n};    (2.22)
    Pr[|X| ≥ a] ≤ 2 e^{−a²/2n};    (2.23)
    Pr[Y ≥ μ + a] ≤ e^{−2a²/n};    (2.24)
    Pr[Y ≥ (1+δ)μ] ≤ e^{−δ²μ};    (2.25)

and for any 0 < a < μ and 0 < δ < 1, the following relations hold:

    Pr[Y ≤ μ − a] ≤ e^{−2a²/n};    (2.26)
    Pr[Y ≤ (1−δ)μ] ≤ e^{−δ²μ}.    (2.27)

Proof. Eq. (2.22) can be proved by recognizing that

    E{e^{tX_i}} = (1/2) e^t + (1/2) e^{−t}.

Using the Taylor series expansion of e^t, we get

    E{e^{tX_i}} = Σ_{i≥0} t^{2i}/(2i)! ≤ Σ_{i≥0} (t²/2)^i / i! = e^{t²/2}.

Therefore,

    E{e^{tX}} = Π_{i=1}^n E{e^{tX_i}} ≤ e^{t²n/2},

and hence, for any a > 0,

    Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E{e^{tX}} / e^{ta} ≤ e^{t²n/2 − ta}.

On selecting t = a/n, (2.22) is obtained. Also, by symmetry, we get

    Pr[X ≤ −a] ≤ e^{−a²/2n}.

Combining these two inequalities results in (2.23). The rest of the inequalities are straightforward consequences of the previous ones.

Example 2.1. Set balancing [34, 36] is an example of how to apply the above bounds. Consider an n×m matrix A whose entries are 0 or 1. The objective is to find a column vector b with entries in {−1,1} which minimizes

    ||Ab||_∞ = max_{i=1,...,n} |c_i|,

where the c_i are the elements of c = Ab. This problem usually arises in the design of statistical experiments. The columns of A represent subjects, and the rows of A represent the features we are considering. The vector b partitions the subjects into two (disjoint) groups, so that each feature of interest is roughly balanced across the two groups. One of these groups might be the control group in the experiment.

Consider the algorithm that chooses the b_i independently from the set {−1,1} with equal probabilities. With this selection, E{c_i} = 0. Therefore, the possible deviation of the actual c_i, and hence of ||Ab||_∞, will be what matters. We consider the ith row of matrix A, a_i = (a_i1, ..., a_im).
Let k be the number of 1's in this row. If k ≤ √(4m ln n), then |a_i · b| = |c_i| ≤ √(4m ln n) deterministically. However, when k ≥ √(4m ln n), the k nonzero terms in the summation corresponding to c_i are independent random variables taking values in {−1,1}. Therefore, (2.23) with the choice a = √(4m ln n) gives

    Pr[ |c_i| > √(4m ln n) ] ≤ 2 e^{−4m ln n / 2k} ≤ 2/n².

The last inequality holds since m ≥ k. If E_i is the bad event that |c_i| > √(4m ln n), then the probability of the union of the bad events is no more than the sum of their probabilities (the union bound), which is 2/n. In other words, using the simple algorithm of selecting the elements of b independently and equiprobably from the set {−1,1}, with probability at least 1 − 2/n we find a vector b which satisfies ||Ab||_∞ ≤ √(4m ln n).

2.4 Martingales

In the simplest definition, a sequence of random variables Z_1, Z_2, ... is a martingale with respect to the sequence X_1, X_2, ... if, for all n ≥ 0, the following conditions hold [34]:

- Z_n is a function of X_0, X_1, ..., X_n;
- E{|Z_n|} < ∞;
- E{Z_{n+1} | X_0, ..., X_n} = Z_n.

A sequence of random variables Z_0, Z_1, ... is simply called a martingale when it is a martingale with respect to itself. A martingale sequence can have a finite or countably infinite number of elements. Some properties of martingale sequences useful for our discussion can be found in [26, 34, 36, 37].

A stopping time for the sequence {Z_n, n ≥ 0} is defined as a nonnegative integer-valued random variable T such that the event T = n depends only on the values of the random variables Z_0, Z_1, ..., Z_n. Let Z_0, Z_1, ... be a martingale sequence with respect to the sequence X_1, X_2, ..., and let T be a stopping time. Then for all i ≥ 0:

1. E{Z_i} = E{Z_0}; also E{Z_T} = E{Z_0}.
2. Wald's equation: Let X_1, X_2, ... be nonnegative i.i.d. random variables with distribution X. When T and X have bounded expectation,

    E{ Σ_{i=1}^T X_i } = E{T} E{X}.

2.4.1 Martingale Tail Inequalities

In the previous section, Chernoff bounds on sums of random variables were considered.
We saw that one of the main assumptions which allows the application of the Chernoff bound is that the random variables making up the sum must be independent. However, one of the most useful properties of martingale sequences for the analysis of algorithms is that Chernoff-like tail inequalities can be applied even when the underlying random variables are not independent. The main result is the Azuma-Hoeffding inequality [34].

Theorem 2.6 (Azuma-Hoeffding inequality). Let X_0, ..., X_n be a martingale such that

|X_k - X_{k-1}| \le c_k.

Then for all t \ge 0 and any \lambda > 0,

Pr[ |X_t - X_0| \ge \lambda ] \le 2 e^{ -\lambda^2 / (2 \sum_{k=1}^t c_k^2) }.

Corollary 2.7. Let X_0, ..., X_n be a martingale such that for all k \ge 1,

|X_k - X_{k-1}| \le c.

Then for all t \ge 1 and \lambda > 0,

Pr[ |X_t - X_0| \ge \lambda c \sqrt{t} ] \le 2 e^{-\lambda^2/2}.

The generalized form of the theorem may be expressed as:

Theorem 2.8 (Azuma-Hoeffding inequality, general form). Let X_0, ..., X_n be a martingale such that

B_k \le X_k - X_{k-1} \le B_k + d_k

for some constants d_k and some random variables B_k that may be functions of X_0, X_1, ..., X_{k-1}. Then for all t \ge 0 and any \lambda > 0,

Pr[ |X_t - X_0| \ge \lambda ] \le 2 e^{ -2\lambda^2 / \sum_{k=1}^t d_k^2 }.

For analyzing sums, random walks, or other additive functions of independent random variables, the main tools are often the central limit theorem, the law of large numbers, Chebyshev's inequality and Chernoff's inequality. For analyzing similar objects whose differences are not independent, the main tools are martingales and Azuma's inequality.

We can construct a martingale using the following general approach; the result is called a Doob martingale. Let X_0, X_1, ..., X_n be a sequence of random variables, and let Y be a random variable with E{|Y|} < \infty. In general, Y might depend on X_0, ..., X_n.
Then

Z_i = E{ Y | X_0, ..., X_i },   i = 0, 1, ..., n,    (2.28)

gives a martingale with respect to X_0, X_1, ..., X_n, since

E{ Z_{i+1} | X_0, ..., X_i } = E{ E{ Y | X_0, ..., X_{i+1} } | X_0, ..., X_i } = E{ Y | X_0, ..., X_i } = Z_i.

Hence, Z_i is a martingale according to the definition. Note that in proving that Z has the martingale property, we used the tower property of conditional expectation, i.e.,

E{ V | W } = E{ E{ V | U, W } | W }.

Example 2.2. Pattern matching [34,40]: Pattern matching is the act of checking for the presence of the constituents of a given pattern. In contrast to pattern recognition, the pattern is rigidly specified. Pattern matching is used to check that things have the desired structure, to find relevant structure, to retrieve the matching parts, and to substitute the matching part with something else.

In many applications, including the study of DNA structure, a goal is to find interesting patterns in a sequence of characters. Interesting patterns often refer to strings that occur more often than one would expect if the characters were simply generated at random. The notion of "interesting" is reasonable if the number of occurrences of a string is concentrated around its expectation under the random model. The Azuma-Hoeffding inequality can show this concentration.

Let X = (X_1, ..., X_n) be a sequence of characters chosen independently and uniformly at random from an alphabet \Sigma, where |\Sigma| = s. Let B = (b_1, ..., b_k) be a fixed string of k characters from \Sigma and let F be the number of occurrences of the fixed string B in the random string X. We have

E{F} = (n - k + 1) (1/s)^k.

Using the Doob martingale, the Azuma-Hoeffding inequality can be used to show that if k is relatively small with respect to n, then the number of occurrences of B in X is highly concentrated around its mean. Set Z_0 = E{F}, and for 1 \le i \le n, let

Z_i = E{ F | X_1, ..., X_i }.

The sequence Z_0, ..., Z_n is a Doob martingale, and Z_n = F.
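As a numerical sanity check of the mean E{F} = (n-k+1)(1/s)^k, the following sketch counts overlapping occurrences and estimates E{F} by simulation. The function names and the parameter choices are ours, not from the text.

```python
import random

def count_occurrences(x, b):
    """Number of (possibly overlapping) occurrences of pattern b in string x."""
    k = len(b)
    return sum(1 for i in range(len(x) - k + 1) if x[i:i + k] == b)

def mean_occurrences(n, k, s):
    """E{F} = (n - k + 1) * (1/s)^k for a uniform random string of length n."""
    return (n - k + 1) * (1.0 / s) ** k

def simulate(n, b, alphabet, trials=5000, seed=0):
    """Monte Carlo estimate of E{F} over random strings of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = ''.join(rng.choice(alphabet) for _ in range(n))
        total += count_occurrences(x, b)
    return total / trials

alphabet = 'ACGT'   # s = 4, as in the DNA-motivated example
n, b = 50, 'AC'     # k = 2, so E{F} = 49 * (1/4)^2 = 3.0625
```

The empirical average from `simulate` should land close to `mean_occurrences(50, 2, 4)`, illustrating the concentration that the Azuma-Hoeffding bound below makes precise.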
Since the length of the fixed string is k, each character in the string X can participate in no more than k matches. Therefore, for any 0 \le i \le n-1 we have

| E{ F | X_1, ..., X_{i+1} } - E{ F | X_1, ..., X_i } | = |Z_{i+1} - Z_i| \le k.

Applying Theorem 2.6 yields

Pr[ |F - E{F}| \ge \epsilon ] \le 2 e^{ -\epsilon^2 / 2nk^2 }.

2.5 Poisson Approximation

Before presenting some useful results on Poisson approximation, we consider the following famous example and the difficulty in its analysis.

2.5.1 Birthday Problem

Suppose m items are randomly distributed into n bins so that each item has an independent probability of 1/n of being put into any particular bin. What is the probability that exactly K bins contain exactly J items? This is sometimes called the generalized birthday problem, because it can be expressed as a question about coincidences between the birthdays of m people over n = 365 days. Specifically, we are interested in questions like "How many people should be in a room before the probability that two of them have the same birthday is at least p?", assuming that birthdays are distributed uniformly at random. With m people and n possible birthdays, the probability that all m have different birthdays is

p = (1 - 1/n)(1 - 2/n) ... (1 - (m-1)/n) = \prod_{j=1}^{m-1} (1 - j/n).

Equations like this are difficult to solve exactly, although in this case the approximation 1 - k/n \approx e^{-k/n} gives a close answer as long as k << n. In the "balls into bins" problem, after throwing m balls into n bins, we are interested in general in the distribution of the balls among the bins.

Analyzing this type of problem is usually hard because of the natural dependency that exists in the system. For example, if we throw m balls into n bins and find that bin 1 is empty, it is less likely that bin 2 is also empty, since the m balls must now be distributed among n - 1 bins. Likewise, if we know the number of balls in the first n - 1 bins, the number of balls in bin n is deterministically known.
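The exact product above and its exponential approximation, exp(-m(m-1)/2n), can be compared directly; a minimal sketch (function names are ours):

```python
import math

def birthday_all_distinct(m, n=365):
    """Exact probability that m people all have distinct birthdays among n days:
    the product prod_{j=1}^{m-1} (1 - j/n)."""
    p = 1.0
    for j in range(1, m):
        p *= 1.0 - j / n
    return p

def birthday_approx(m, n=365):
    """Approximation using 1 - j/n ~ e^{-j/n}:
    exp(-(1 + 2 + ... + (m-1))/n) = exp(-m(m-1)/(2n))."""
    return math.exp(-m * (m - 1) / (2.0 * n))
```

For m = 23 the exact value is about 0.493 and the approximation about 0.500, so the chance of at least one shared birthday first exceeds 1/2 at 23 people, and the approximation is already close for m much smaller than n.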
It can be proved [12,34] that if X_n is a binomial random variable with parameters n and p, where p is a function of n and lim_{n -> \infty} np = \lambda is a constant independent of n, then for any fixed k,

lim_{n -> \infty} Pr[X_n = k] = e^{-\lambda} \lambda^k / k!.

This means that after randomly throwing m balls independently and uniformly into n bins, the distribution of the number of balls in a given bin is approximately Poisson with mean m/n. However, one might be interested in approximating the joint distribution of the number of balls in all the bins by assuming that the number of balls in each bin is an independent Poisson random variable with mean m/n.

Suppose that m balls are thrown into n bins independently and uniformly at random. Let the random variable X_i^{(m)} denote the number of balls in the ith bin, where 1 \le i \le n. Let Y_1^{(m)}, ..., Y_n^{(m)} be independent Poisson random variables with mean m/n. The difference between throwing m balls randomly into the bins and assigning a Poisson random variable with mean m/n to each bin is that in the former the total number of balls is always m, while in the latter the expected total number of balls over all the bins is m, but the actual total is itself a random variable. The following theorems and corollaries are useful in comparing the two cases and in finding bounds for the exact case by analyzing the related approximate problem [34].

Theorem 2.9. The distribution of (Y_1^{(m)}, ..., Y_n^{(m)}) conditioned on \sum_i Y_i^{(m)} = k is the same as the distribution of (X_1^{(k)}, ..., X_n^{(k)}), regardless of the value of m.

Theorem 2.10. Let f(x_1, ..., x_n) be a nonnegative function. Then

E{ f(X_1^{(m)}, ..., X_n^{(m)}) } \le e \sqrt{m} E{ f(Y_1^{(m)}, ..., Y_n^{(m)}) }.

Corollary 2.11. Any event that takes place with probability p in the Poisson case takes place with probability at most p e \sqrt{m} in the exact case.

Theorem 2.12. Let f(x_1, ..., x_n) be a nonnegative function such that E{ f(X_1^{(m)}, ..., X_n^{(m)}) } is either monotonically increasing or monotonically decreasing in m.
Then

E{ f(X_1^{(m)}, ..., X_n^{(m)}) } \le 2 E{ f(Y_1^{(m)}, ..., Y_n^{(m)}) }.

Corollary 2.13. When n balls are thrown independently and uniformly at random into n bins, the maximum load (the maximum number of balls in any bin) is at least ln n / ln ln n with probability at least 1 - 1/n, for n sufficiently large.

Consider randomly throwing n balls into n bins independently and uniformly. To investigate the effect of the value of n, we note that the function ln n / ln ln n grows very slowly with n after n = 10 (see Fig. 2.3(a)). Fig. 2.3(b) shows the empirical probability, obtained from simulation, that the maximum load is larger than ln n / ln ln n. It is seen that the probability drops at n = 94, which is due to the increase in the integer part of the function value at n = 94. Note that in this case m = n, as required in Corollary 2.13.

Now suppose that in the above experiment we are interested in the probability that the maximum load of any bin exceeds the value 2. Fig. 2.4(a) shows this empirical probability for both the original process X^{(m)} and its Poisson approximation Y^{(m)} versus n. These probabilities are obtained by averaging the sample event function f = [max_i X_i^{(m)} > ln n / ln ln n], and therefore Theorem 2.12 holds. Fig. 2.4(b) shows the ratio R = E{ f(X_1^{(m)}, ..., X_n^{(m)}) } / E{ f(Y_1^{(m)}, ..., Y_n^{(m)}) }. Clearly the Theorem 2.12 upper bound R \le 2 holds.

2.6 Hash Functions

A hash function is a deterministic mapping from one set into another that appears random. For example, mapping people to their birthdays can be thought of as a hash function. In general a hash function is a mapping f : {0, 1, 2, ..., n-1} -> {0, 1, 2, ..., m-1}, generally with n >> m; for example, the number of people in the world is much bigger than the number of possible birthdays. A fundamental property of all hash functions is that if two hashes (according to the same function) are different, then the two inputs are different in some way. This property is a consequence of hash functions being deterministic.
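Determinism, and the way a one-bit input change scrambles the output, can be illustrated with a concrete hash function; here we use SHA-256 purely as an example (the choice of hash is ours):

```python
import hashlib

def h(data: bytes) -> str:
    """A deterministic hash: the same input always yields the same digest,
    so two different digests imply two different inputs."""
    return hashlib.sha256(data).hexdigest()

d1 = h(b'password')
d2 = h(b'password')   # identical to d1: determinism
d3 = h(b'passwore')   # input differs from b'password' in a single bit
# d3 shares essentially no visible structure with d1, the "strong mixing"
# behavior typical of cryptographic hash functions.
```

Equal inputs always give equal digests, while the single-bit change produces a completely unrelated-looking digest.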
On the other hand, a hash function is not injective (i.e., it does not map distinct arguments to distinct values). This means that the equality of two hash values ideally strongly suggests, but does not guarantee, the equality of the two inputs. If a hash value is calculated for a piece of data, and then one bit of that data is changed, a hash function with a strong mixing property usually produces a completely different hash value.

Figure 2.3: The behavior of the function ln n / ln ln n and the probability that the maximum load exceeds ln n / ln ln n. (a) The function ln n / ln ln n versus n. (b) The probability that the maximum load is at least ln n / ln ln n for the processes X^{(n)} and Y^{(n)} with n bins (Corollary 2.13).

Figure 2.4: The probability that the maximum load of any bin exceeds the value 2, for the processes X^{(m)} and Y^{(m)}. (a) This probability for different values of n and m. (b) The ratio R = E{ f(X_1^{(m)}, ..., X_n^{(m)}) } / E{ f(Y_1^{(m)}, ..., Y_n^{(m)}) }.

Although in practice hash functions are deterministic, modeling them as random functions, which map each member of their domain uniformly, with probability 1/m, to one of the members of their range, provides a good idea of how hash functions work.

Suppose that in a computer network we want to make sure that nobody uses a common word (e.g., a name or a common dictionary word) as a password, in order to protect the system against hackers, since hackers can determine such passwords by obtaining the encrypted password file and performing an exhaustive search. When a user wants to set his or her password, we want to check the selected password against a dictionary of common words as quickly as possible.
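A fast check of this kind hashes the dictionary into a small marked table, in the spirit of the filter developed in the next paragraphs; the following is a minimal single-hash sketch, with the table size, the choice of SHA-256 as the hash, and all names our own:

```python
import hashlib

def bucket(word: str, m: int) -> int:
    """Map a word to one of m table entries via a hash function."""
    digest = hashlib.sha256(word.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % m

def build_filter(dictionary, m):
    """Mark every table entry hit by some common word."""
    table = [False] * m
    for w in dictionary:
        table[bucket(w, m)] = True
    return table

def password_rejected(table, password):
    """Reject the password if it hashes to a marked entry.
    Common words are always rejected; a good password may be falsely
    rejected, roughly with the probability discussed in the text."""
    return table[bucket(password, len(table))]

common = ['password', 'letmein', '123456', 'qwerty']
table = build_filter(common, m=1024)
```

Note the one-sided error: every dictionary word is caught, while a fresh password is rejected only if it happens to collide with a marked entry.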
One way to do this would be to use standard search techniques, such as binary search. This has two drawbacks. First, we would need to store all the dictionary words, which requires a large memory. Second, the search might take a long time. Instead, we can generate and store a table of size m, m << n, often called a Bloom filter, and compare the selected password against the entries of this table. The hash function maps the words of the dictionary into the Bloom filter; as explained before, the mapping is such that several words of the dictionary may map to the same entry of the filter.

When a password is selected, the system hashes it with the same function. If the result is a marked entry of the Bloom filter, the password is rejected; otherwise it is accepted. In this way, it is guaranteed that a common word cannot be selected as a valid password, although the probability of rejecting a valid password is approximately 1 - e^{-n/m}. To make the probability of rejecting a potentially good password small, m should therefore be selected fairly large. To use space more efficiently, we can use |H| = h tables, each of size m, with different hash functions, reducing the valid-password rejection probability to (1 - e^{-n/m})^h.

A family of hash functions H = {H_i : N -> M} is said to be a perfect hash family if for each set S \subseteq N of size s < m there exists a hash function H_i \in H that is perfect for S.

Chapter 3 Probabilistic Methods

3.1 Introduction

The probabilistic method is a way of proving the existence of an object. The idea behind the probabilistic method is that in order to prove the existence of a combinatorial object with certain properties (e.g., a proper coloring of the vertices of a graph), we choose our object at random and prove that, with positive probability, it satisfies the required property [22]. The two most fundamental methods for proving the positivity of the probability are the first moment method and the Lovász Local Lemma.
In the context of algorithms, we are frequently interested in an explicit construction of the object rather than just a proof of its existence. However, in many cases the proof of existence obtained by probabilistic methods can be efficiently converted into a randomized construction algorithm. In some cases these proofs can even be converted into efficient deterministic construction algorithms; this process is called derandomization [34].

3.2 Moment Methods

3.2.1 First Moment: Expectation

One of the basic tools in probabilistic methods is the use of expectation. In order to prove that a certain property holds in a class (set) of objects, we prove that if we select an object from the set at random, then with positive probability the object has the property. Another way of doing this is to show that the property holds on average over the members of the set. Shannon used this method to prove results on the capacity of communication channels [39]. By calculating the average probability of error over a random class of codebooks, one can establish the channel capacity (i.e., the maximum data rate that can be transmitted through a communication channel with arbitrarily low probability of error). Since this capacity is obtained in the average sense, there must be at least one good code achieving it.

It can be proved [34] that when we have a probability space S and a random variable X defined on S such that E{X} = \mu, then Pr[X \le \mu] > 0 and Pr[X \ge \mu] > 0.

We can see an immediate application of this method in graph theory. As was mentioned before, cuts are important in computer networks for both reliability and diversity calculations. It is well known (see, e.g., [15]) that finding the max-cut in an undirected graph is an NP-hard problem. Using the probabilistic first moment method, it can be shown that the value of the maximum cut is at least half the total number of edges in the graph. To show this, consider the undirected graph G(V,E) with n vertices and m edges.
We randomly partition the vertices of the graph into two subsets, A and B, uniformly and independently: each vertex is assigned to set A or set B with equal probability. On average, the number of vertices in each set will be n/2. Then

Pr[a specific edge connects set A to set B]
= Pr[first endpoint in A] Pr[second endpoint in B | first endpoint in A]
+ Pr[first endpoint in B] Pr[second endpoint in A | first endpoint in B] = 1/2.

With m edges, the expected number of edges connecting set A to set B is m/2. Therefore, at least one partition of the vertices into two sets A and B with at least m/2 crossing edges exists.

We can find a lower bound on the probability p that we get a cut of size at least m/2. We have

m/2 = \sum_{i=1}^m i Pr[cut = i]
    = \sum_{i \ge m/2} i Pr[cut = i] + \sum_{i \le m/2 - 1} i Pr[cut = i]
    \le m \sum_{i \ge m/2} Pr[cut = i] + (m/2 - 1) \sum_{i \le m/2 - 1} Pr[cut = i]
    = m p + (m/2 - 1)(1 - p).

Therefore,

p \ge 1 / (m/2 + 1).

Since a random partition of the graph into two sets yields a cut of size at least m/2 with probability at least p, the average number of random trials needed to find such a cut is at most m/2 + 1, which is polynomial.

3.2.2 Second Moment

Markov's inequality

Pr[|X| > t] \le E{|X|} / t,    (3.1)

Chebyshev's inequality

Pr[|X - E{X}| \ge t] \le var{X} / t^2,    (3.2)

and its generalized form

Pr[|X| \ge t] \le E{|X|^i} / t^i,    (3.3)

are the main tools in the moment methods. It can also be proved [34] from the above inequalities that for a nonnegative integer-valued random variable,

Pr[X = 0] \le var{X} / (E{X})^2.    (3.4)

In many computer networking problems, as we will see in later chapters, we deal with random variables that take values in the set {0,1}. For example, two nodes in a graph might be connected or disconnected by an edge. The following theorems will be used in the related problems. Let Y_i, i = 1, ..., m, take values in the set {0,1}, and let Y = \sum_{i=1}^m Y_i. Then the following theorems hold [34].

Theorem 3.1.
For the random variables as above,

var[Y] \le E{Y} + \sum_{i \ne j} cov(Y_i, Y_j).

Proof. The proof follows from the fact that for Y_i as above, E{Y_i^2} = E{Y_i}; therefore var{Y_i} = E{Y_i^2} - (E{Y_i})^2 \le E{Y_i}.

Theorem 3.2. For the random variables as above,

Pr[Y > 0] \ge \sum_{i=1}^m Pr[Y_i = 1] / E{ Y | Y_i = 1 }.

Proof. Define another random variable Z by

Z = 1/Y if Y > 0,   and   Z = 1 if Y = 0.

Then

Pr[Y > 0] = E{YZ} = E{ \sum_{i=1}^m Y_i Z } = \sum_{i=1}^m E{Y_i Z}
= \sum_{i=1}^m ( E{ Y_i Z | Y_i = 1 } Pr[Y_i = 1] + E{ Y_i Z | Y_i = 0 } Pr[Y_i = 0] )
= \sum_{i=1}^m E{ Z | Y_i = 1 } Pr[Y_i = 1] = \sum_{i=1}^m E{ 1/Y | Y_i = 1 } Pr[Y_i = 1].

From Jensen's inequality for the convex function f(x) = 1/x, we get

Pr[Y > 0] \ge \sum_{i=1}^m Pr[Y_i = 1] / E{ Y | Y_i = 1 }.

Theorem 3.3 (General Chernoff-Hoeffding bound). Let Y_1, ..., Y_m be {0,1}-valued random variables with E{Y_i} = p_i. Let Y = \sum_{i=1}^m Y_i, \mu = E{Y} = \sum_{i=1}^m p_i, and let p = \mu/m be the average of the p_i. Then

Pr[Y \ge \mu + \lambda] \le exp{ m H_p(p + \lambda/m) },   0 < \lambda < m - \mu,
Pr[Y \le \mu - \lambda] \le exp{ m H_{1-p}(1 - p + \lambda/m) },   0 < \lambda < \mu,

where H_p(y) = y ln(p/y) + (1-y) ln((1-p)/(1-y)) is the negative of the relative entropy of y with respect to p. Note that H_p(y) is therefore a non-positive quantity.

Proof. The second bound is obtained from the first by replacing Y with m - Y, so we prove only the first bound. Take n = \mu + \lambda. From Markov's inequality, for any t > 0 we have

Pr[Y > n] = Pr[e^{Yt} > e^{nt}] \le e^{-nt} E{e^{Yt}}.

Since the Y_i are independent,

Pr[Y > n] \le e^{-nt} E{ e^{t \sum_{i=1}^m Y_i} } = e^{-nt} \prod_{i=1}^m E{e^{t Y_i}}.    (3.5)

Substituting the value of the MGF for each Y_i,

Pr[Y > n] \le e^{-nt} \prod_{i=1}^m (e^t p_i + 1 - p_i) = e^{-nt} \prod_{i=1}^m ( (e^t - 1) p_i + 1 ).    (3.6)

Now, taking ln(·) of \prod_{i=1}^m ( (e^t - 1) p_i + 1 ), using the concavity of the function f(x) = ln( (e^t - 1)x + 1 ), and the monotonicity of ln(·), we get

Pr[Y > n] \le e^{-nt} ( (e^t - 1)p + 1 )^m = exp{ m ln( p e^t + (1-p) ) - nt },

which is true for all values of t > 0. To get the tightest bound, we find the minimum value of the right-hand side of the above inequality.
Differentiating with respect to t, we find that the right-hand side is minimized at t = ln( n(1-p) / ((m-n)p) ). We note that since n = \mu + \lambda and 0 < \lambda < m - \mu, we have n < m, and therefore the argument of ln(·) is positive. On the other hand, n = \mu + \lambda > \mu and p = \mu/m, so p < n/m. Hence n(1-p) / ((m-n)p) > 1, which shows that the obtained t is positive and therefore an acceptable value. Substituting it into the above inequality, we get

Pr[Y > n] \le exp{ m ln( m(1-p)/(m-n) ) - n ln( n(1-p)/((m-n)p) ) }
           = exp{ m (1 - n/m) ln( (1-p)/(1 - n/m) ) + n ln( p/(n/m) ) }.

From our definition of n, i.e., n = \mu + \lambda, we have n/m = \mu/m + \lambda/m = p + \lambda/m. Substituting this into the above inequality proves the theorem.

The above theorem also holds for random variables X_i that can take any value in the interval [0,1]. The proof remains the same, except that the MGF equality in (3.5) becomes a \le, which is compatible with the other inequalities.

There are other versions of the Chernoff-Hoeffding bound which can be more convenient; their proofs, together with further versions and special cases, can be found, for example, in [22]. We mention two versions which will be useful in later discussions.

Theorem 3.4. Let X_1, ..., X_n be iid random variables over the bounded domain [0,1], with E{X_i} = \mu for all i. Let \bar{X} = (1/n) \sum_{i=1}^n X_i be the average of the n terms. Then for all 0 < \epsilon < 1,

Pr[ |\bar{X} - \mu| > \epsilon \mu ] \le 2 e^{ -\epsilon^2 n \mu / 3 }.

Theorem 3.5. Let the random variables X_1, ..., X_n be independent, with a_k \le X_k \le b_k for each k, for suitable constants a_k and b_k. Let S_n = \sum X_k and let \mu = E{S_n}. Then for any t \ge 0,

Pr[ |S_n - \mu| \ge t ] \le 2 e^{ -2t^2 / \sum (b_k - a_k)^2 }.

3.3 Lovász Local Lemma (LLL)

Suppose that an experiment can fail if any one of the n bad events in S = {E_1, ..., E_n} occurs. We are interested in whether there is a nonzero probability that the experiment succeeds. Before formally stating the Lovász Local Lemma, we need the following definitions.
We say the event E is mutually independent of a set {E_1, ..., E_n} of events if

Pr[ E | \cap_{E_i \in T} E_i ] = Pr[E],

where T is any set of events or their complements from S. If some events are mutually independent, then so are their complements.

A dependency graph for a set of events E_1, ..., E_n is a graph G = (V, E) such that V = {1, ..., n} and, for i = 1, ..., n, event E_i is mutually independent of the events {E_j | (i,j) \notin E}. In other words, after representing each event by a vertex, E_i is mutually independent of the events E_j such that (E_i, E_j) is not an edge of the graph.

Theorem 3.6 (Lovász Local Lemma [22,34,36]). Let {E_1, ..., E_n} be a set of events and assume that the following hold:

1. Pr[E_i] \le p;
2. the degree of the dependency graph given by {E_1, ..., E_n} is bounded by d;
3. 4dp \le 1.

Then

Pr[ \cap_{i=1}^n \bar{E}_i ] > 0,

where \bar{E}_i is the complement of the event E_i.

3.4 Applications

In this section, we apply the preceding theorems and probabilistic methods to real problems in engineering and computer networking. We will see that although the problems are hard to solve, probabilistic methods can provide useful bounds. The methods presented to obtain these bounds can also be modified to yield algorithms through derandomization.

3.4.1 Packet Routing

We apply the LLL and probabilistic methods to the Leighton, Maggs and Rao routing algorithm in computer networks [31], to obtain bounds on the important parameters of the network and to establish the existence of such algorithms. Assume that we have a network of n nodes (switches) and m edges (communication channels), modelled by an undirected graph G. Each node in the graph can be a source or a destination. The node s_i wants to send an arbitrary number of packets to the node t_i. We assume that the path p_i is a pre-specified path that connects (s_i, t_i).
The reason for this assumption is that most routing strategies are divided into two phases: routing the packet through canonical paths to some intermediate destination, and routing the packet through canonical paths from the intermediate destination to the final destination. Such an algorithm therefore uses pre-specified paths twice.

Further, we assume that each edge takes unit time for a packet to traverse, and that each edge can carry only one packet per unit of (synchronous) time. We also assume that the paths p_i are all edge-simple, i.e., they do not repeat any edge. Our objective is to minimize the maximum time taken by any packet to go from its source to its destination. The relevant parameters for the analysis are the congestion c, the maximum number of paths that use the same edge, and the dilation d, the length of the longest path among all the paths. Clearly, c and d each pose a lower bound on our objective function, since the time for all the packets to arrive cannot be less than the dilation d or the congestion c.

A greedy algorithm (i.e., an algorithm that never lets an edge stand idle when it could transmit) terminates the process in at most cd steps, since any packet can be delayed by at most c - 1 other packets at any edge. The question is: can we do better than cd? Ref. [31] shows that there is a schedule of length O(d + c) with constant queue size at each edge, independent of the topology of the network and the total number of packets.

Without loss of generality, assume that d = c. Assume that each packet is given an initial random delay chosen uniformly and independently from the set {1, 2, ..., ac}, for a suitable constant a > 1. After the initial delay, each packet traverses its path without interruption. Although the resulting schedule is likely to be infeasible (several packets may contend for the same edge at the same time), the LLL can prove that there exists such a schedule (without further interruption) with positive probability. Clearly, such a schedule has an upper bound of ac + d on its number of steps.
The following analysis shows that tighter upper bounds are also obtainable. Let us partition the running period into contiguous frames of length (number of time steps) ln c. The idea is that, using the LLL, we can prove that with positive probability there exists a schedule in which, in each frame, every edge has a congestion of at most ln c. Assuming this is proved, select such a schedule (i.e., fix the assigned delays). This breaks the original problem into (ac + d)/ln c independent sub-problems, one per frame, in each of which the congestion and the dilation are at most ln c. We can then solve each sub-problem independently and paste the resulting schedules together.

Now we use the LLL to prove that with positive probability there exists a schedule, under the above algorithm, such that in each frame every edge has a congestion of at most ln c. Let E_e be the (bad) event that edge e has a congestion of more than ln c in some frame. We prove that Pr[ \cap_e \bar{E}_e ] > 0, and therefore conclude that such a schedule exists for some assignment of delays.

Let E_1(e, F) denote the event that in frame F the congestion of edge e is more than ln c. The total number of frames is ac, and the congestion c is defined as the maximum number of paths that use any given edge during the entire transmission time; therefore, in the randomized schedule, the expected congestion on any given edge at any given time step is at most c/ac = 1/a. Thus the expected value of C(e, F), the congestion of edge e in frame F, is at most (ln c)/a. Knowing that 0 < C(e, F) < c, we have 0 < C_n(e, F) = C(e, F)/c < 1, and from the Chernoff-Hoeffding bound of Theorem 3.4 we get

Pr[E_1(e, F)] := Pr[ C(e, F) \ge ln c ] = Pr[ C_n(e, F) \ge (ln c)/c ]
= Pr[ C_n(e, F) - (ln c)/(ac) \ge (a-1) ln c / (ac) ]
\le 2 exp{ -(a-1)^2 ln c / (3ac) }.    (3.7)

Using the union bound,

Pr[E_e] \le \sum_F Pr[E_1(e, F)] \le 2ac exp{ -(a-1)^2 ln c / (3ac) }.    (3.8)

Eq.
(3.8) shows that by selecting the constant a large enough, we can make the probability of the bad events E_e smaller than any p required in Theorem 3.6. To upper-bound the degree of the dependency graph, we note that an event E_{e_j} depends only on those events E_{e_k} for which some packet passes through both edges e_j and e_k. This degree is at most cd, which happens when all the packets have the same source, destination and path, with a path dilation of d. Therefore the LLL applies, which means that with positive probability there exists a schedule, under the above algorithm, in which every edge has a congestion of at most ln c in each frame.

3.4.2 Hamilton Cycle Problem

A Hamiltonian cycle is a cycle that goes through every vertex of a directed or undirected graph exactly once. The Hamiltonian cycle problem (HCP) is the problem of determining whether a Hamiltonian cycle exists in a given graph. In the Hamilton path problem, the question is to find a Hamilton path. The two problems are related: the Hamiltonian path problem for a graph G is equivalent to the HCP in the graph H obtained from G by adding a new vertex and connecting it to all vertices of G. The HCP is a special case of the travelling salesman problem, in which the distance between two cities is set to a finite constant if they are adjacent and to infinity otherwise.

The HCP was one of Karp's 21 NP-complete problems. In 1974, Garey and Johnson showed that the directed HCP is NP-complete for planar graphs and the undirected HCP is NP-complete for cubic planar graphs [20]. The HCP has gained much attention in mathematics, partly because of its many applications in computer science, DNA studies, etc.

Let G_{n,p} denote the model that consists of all graphs with vertex set V = {1, 2, ..., n}, where each edge is present with probability p, independently of the other edges.
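Sampling a graph from the G_{n,p} model just defined is straightforward; a minimal sketch (the edge-set representation and names are ours):

```python
import random

def gnp(n, p, seed=None):
    """Sample a graph from G_{n,p}: each of the n(n-1)/2 possible edges
    on vertex set {0, ..., n-1} is included independently with probability p."""
    rng = random.Random(seed)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                edges.add((u, v))
    return edges

# The expected number of edges is p * n(n-1)/2; for p = 1/2 every graph
# on the vertex set is equally likely.
g = gnp(100, 0.5, seed=42)
```

For n = 100 and p = 1/2, `len(g)` should be close to 2475, its expectation.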
Therefore, if G_0 is a graph with vertex set V and m edges, then

Pr[{G_0}] = Pr[G = G_0] = p^m (1-p)^{\binom{n}{2} - m}.    (3.9)

Given the NP-completeness of finding a Hamiltonian cycle (path), there has been tremendous work on proposing such algorithms; among these, randomized algorithms and random graph theory offer algorithms of reasonable complexity (see, e.g., [1,9,22,24,34]). It is known [29] that for

p = (ln n + ln ln n + c(n)) / n,

we have

Pr[ G \in G_{n,p} has a Hamilton cycle ] ->  0           if c(n) -> -\infty,
                                             e^{-e^{-c}}  if c(n) -> c,
                                             1           if c(n) -> +\infty.

Bollobás, Fenner and Frieze [9] gave an algorithm that finds a Hamiltonian cycle with high probability (whp), i.e., with probability 1 - o(1), if p > (ln n + ln ln n + c(n))/n. Here we use the probabilistic method to prove the existence of a Hamiltonian cycle in graphs G \in G_{n,p} with p \ge 72 ln n / (n-1), following [4]. In the proof, we also use the result of the famous coupon collector's problem, stated in the following theorem [36].

Theorem 3.7. Let the random variable X denote the number of trials needed to collect each of the n types of coupons. Then for any constant c \in R and m = n ln n + cn,

lim_{n -> \infty} Pr[X > m] = 1 - e^{-e^{-c}}.

To analyze the situation where p \ge 72 ln n / (n-1), consider the following randomized algorithm for the undirected graph G. The algorithm maintains a path P = {v_1 = s, ..., v_k}, where the v_i, i = 1, ..., k, are distinct vertices. At each step it selects, randomly and uniformly, a vertex v_j from the set V \ {s} such that (v_k, v_j) \in E. If this vertex is not already in the path, it is appended, so that the new path is P = {v_1 = s, ..., v_k, v_j}. Otherwise, since v_j is already in the path, the algorithm connects v_k to v_j and disconnects v_j from v_{j+1}.

Figure 3.1: Sketch of the random algorithm to find the Hamilton cycle. (a) v_j is not among the previously selected vertices, and the path is extended. (b) v_j is among the previously selected vertices, and the path is rotated.

The resulting path will
be P = {v_1 = s, v_2, ..., v_{j-1}, v_j, v_k, v_{k-1}, ..., v_{j+1}}. Fig. 3.1 shows the two cases of this process.

Since for p \ge 72 ln n / (n-1) there exists a polynomial-time randomized implementation of the vertex-selection step of the algorithm such that every vertex of V \ {s} is equally likely to become the new path endpoint [4,7], we can analyze the above algorithm using the coupon collector's problem. Doing so, we see that after the first 2(n-1) ln(n-1) steps, whp every vertex in V \ {s} has been used as the path endpoint at least once. At this point, although we have a Hamilton path, there is no guarantee that there is an edge between the path endpoint and v_1. The second phase of the algorithm repeats the previous phase; again whp every vertex in V \ {s} becomes the path endpoint. The algorithm ends as soon as there is an edge between one of the endpoints v_k of this phase and v_1 = s, i.e., (v_k, v_1) \in E. Based on the high probability of finding the Hamilton cycle, we deduce that such a cycle exists in the class G_{n,p}.

3.4.3 Vertex-Disjoint Cycles

In the design of telecommunication networks, reliability and survivability play an important role. These issues become more severe in high-capacity but sparse networks. Different analyses and models have been proposed, based on different cost and capacity models [2]. In many of these models, survivability is improved by designing the network with a certain number of disjoint paths between each pair of communicating nodes. The maximum number of sets of vertex-disjoint length-restricted paths, and of vertex-disjoint cycles, also receives attention in these studies.

Consider a directed graph. Vertex-disjoint cycles in this graph are a collection of cycles without any common node. Every k-regular directed graph G, i.e., a graph with in-degree and out-degree equal to k at every vertex, contains at least c = \lfloor k / (3 ln k) \rfloor vertex-disjoint cycles [3]. We use the LLL to prove this bound. Consider partitioning the vertices of G into c components by randomly assigning each vertex to a component independently and uniformly.
We show that, with positive probability, each component contains a cycle. To prove this, we show that every vertex in each component has an edge leading to another vertex in the same component, i.e., there exists a path of arbitrary length that does not leave that component. Since each component has a finite number of vertices, the arbitrarily long path must contain a cycle. The bad events here are defined as

$$E_v = \{\text{vertex } v \text{ has no outgoing edge to another vertex in the same component}\}.$$

Therefore,

$$\Pr[E_v] = \prod_{u:(v,u)\in E} \Pr[u \text{ and } v \text{ are in different components}] = \left(1 - \frac{1}{c}\right)^k < e^{-k/c} \leq e^{-3\ln k} = k^{-3},$$

where $E$ is the set of all edges of the graph.

Figure 3.2: The maximum number of events that $E_v$ might depend on is $k(k+1)$.

Now, we upper-bound the degree $d$ of the dependency graph of the bad events, as stated in Theorem 3.6. In general, events correspond to out-edges. Since the assignment of the nodes to components is done independently, only events that share a destination node can be dependent on each other. Therefore, for each node $v$, the dependent events are at most those related to $\{v, v_1, \cdots, v_k\}$. To each of these $k+1$ nodes, $k$ events (outgoing edges) are related. Hence the maximum number of events that $E_v$ might depend on, including the events of node $v$ itself, is $k(k+1)$ (see Fig. 3.2).

With $d = k(k+1)$ and $p = k^{-3}$, the condition of Theorem 3.6 holds if $4pd = 4k^{-3}k(k+1) \leq 1$, which is true for $k \geq 5$. Therefore, $\Pr[\bigcap_{v\in G} \bar{E}_v] > 0$ for $k \geq 5$, and hence the bound $c = \lfloor \frac{k}{3\ln k} \rfloor$ holds. It can be trivially verified that for $k \leq 4$ the bound $c = \lfloor \frac{k}{3\ln k} \rfloor$ is true.

A problem related to the one above is the vertex cover problem. In the vertex cover problem, the objective is to find the smallest set of vertices such that every edge in the graph has an endpoint in this set. In other words, for the undirected graph $G = (V, E)$, we want to find $U \subseteq V$ such that every edge $e \in E$ has an endpoint in $U$.
The vertex cover problem is an NP-complete problem in complexity theory, and was one of Karp's 21 NP-complete problems. A natural greedy algorithm for vertex cover is to repeatedly choose a vertex with the highest degree, and then delete the vertex and all its adjacent edges from the graph. This method gives poor results. The best known simple algorithm for this problem, however, is as follows: repeatedly choose an edge at random from the graph, put both of its endpoints in the cover, remove these vertices and their adjacent edges from the graph, and continue. This method is simple, yet no algorithm has been proven to do better than within a factor of two of the optimum (see, e.g., [23]).

Chapter 4

Markov Chains and Random Walks

4.1 Introduction

A simple yet powerful method for modelling random processes is through Markov chains. There is a vast amount of valuable studies, articles and textbooks discussing Markov chains, their properties and their applications to science and engineering (for example, [16,27,28,34,36]). A Markov chain is defined as a Markov process whose random variables can (with probability 1) only assume values in a certain finite or countably infinite set [16]. This set is usually taken, for convenience, to be the integers $1, \cdots, N$ (finite case) or the integers $1, 2, \cdots$ (infinite case). In this chapter we consider some analyses and applications of Markov chains to randomized algorithms and the calculation of related bounds. Random walks on graphs are of special interest in the analysis of randomized algorithms based on Markov chains [32].

4.2 Random Walks on Graphs

A random walk on $G$ is a Markov chain defined by the sequence of moves of a particle between vertices of $G$. In this process, the position of the particle at a given time step is the state of the system.
If the particle is at vertex $i$ and $i$ has $d(i)$ outgoing edges, then the probability that the particle follows the edge $(i, j)$ and moves to the neighbor $j$ is $1/d(i)$ [34].

Random walks on graphs have strong algorithmic and networking applications. General questions of interest (each corresponding to an application or algorithm) include: What is the expected number of steps to go from $u$ to $v$? How fast does the distribution of the walking point tend to its limit distribution? What is the expected time to visit all the nodes starting at $u$? Is there any loss of information?

One of the important parameters of a random walk on a graph is the cover time. Cover time is important in computer networks since it is a criterion related to possible delays in the network. Letting $C_v$ be the expected time (number of steps) for a random walk starting at $v$ to visit all the nodes, the cover time is defined as

$$C = \max_{v\in V} C_v.$$

The cover time of a graph is not monotone in the number of edges. For example, Fig. 4.1 shows networks of $n$ nodes with different graph topologies. It is known [36] that the cover time is of order $O(n^3)$ for the lollipop topology, while the cover time for the same number of nodes is of order $O(n^2)$ for a linear network and $O(n\log n)$ for a completely connected network.

Figure 4.1: The cover time for networks of $n$ nodes with different graph topologies: (a) fully connected graph, cover time $O(n\log n)$; (b) lollipop graph, cover time $O(n^3)$; (c) linear graph, cover time $O(n^2)$. Cover time is not monotone in the number of edges.

In general, the upper bound $C(G) \leq 2m(n-1)$ holds for the cover time of a graph $G$, where $n$ is the number of nodes and $m$ is the number of edges.

Another important parameter of random walks is the mixing time. Mixing time refers to the question of how large $t$ must be before the distribution at time $t$ approximates $\pi$, the unique stationary distribution.
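The non-monotonicity of the cover time is easy to see by simulation; the sketch below estimates the empirical cover time of a complete graph and of a path (linear) graph on $n = 30$ vertices (the size and the number of trials are arbitrary choices):

```python
import random

rng = random.Random(1)

def cover_time(adj, start):
    # Number of steps for a simple random walk from `start` to visit every vertex.
    seen, v, steps = {start}, start, 0
    while len(seen) < len(adj):
        v = rng.choice(adj[v])      # move to a uniformly random neighbor
        seen.add(v)
        steps += 1
    return steps

n = 30
complete = {i: [j for j in range(n) if j != i] for i in range(n)}
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

avg = lambda adj: sum(cover_time(adj, 0) for _ in range(200)) / 200
avg_complete, avg_path = avg(complete), avg(path)
# The walk covers the complete graph far faster (O(n log n) vs O(n^2)),
# even though the complete graph has many more edges.
```

For the path graph, the estimate also stays below the general bound $2m(n-1) = 2\cdot 29\cdot 29$.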
4.3 Electrical Network

There is an interesting and detailed relation between probability and potential theory in electric circuits [8,17]. Doyle and Snell also presented methods for analyzing parameters of random walks on graphs using electrical networks [18].

Suppose that in a random walk process, we are walking on a finite number of points $0, 1, \cdots, N-1, N$. The points $1, \cdots, N-1$ are called the interior points and the points $0$ and $N$ are called the boundary points. The probability $P(x)$ is defined as the probability that, starting from the node $x$, the walk reaches the point $N$ before reaching the point $0$. Clearly $P(0) = 0$ and $P(N) = 1$, and $P$ satisfies the averaging property $P(x) = \frac{1}{2}P(x-1) + \frac{1}{2}P(x+1)$; therefore, it is a harmonic function.

Figure 4.2: An electric circuit and its equivalent graph of a random walk.

In general, from the uniqueness of harmonic function solutions, and assuming a connected graph, an electric circuit can be assigned to a network, which lets us easily analyze many properties of interest, as follows. To each resistance $R_{xy}$ in the circuit, we assign a graph edge with a transition probability obtained from the following relations. The Markov chain random walk with transition matrix $P$ on $G$ is equivalent to this resistance circuit through

$$P = [P_{ij}] = \left[\frac{C_{ij}}{C_i}\right], \quad (4.1)$$

where $C_i = \sum_j C_{ij}$ and $R_{ij} = 1/C_{ij}$.

Then if we apply a unit voltage between the nodes $a$ and $b$, i.e., $v_a = 1$ and $v_b = 0$, the voltage $v_x$ at each node $x$ of the circuit represents the probability that a walker starting from $x$ reaches $a$ before $b$. Fig. 4.2 shows an electric circuit and its equivalent random walk model.

4.4 Markov Model for Fading Communication Channels

Modelling the time-varying channels typically encountered in mobile communications has received considerable attention [35,38,44].
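Returning to the electrical-network correspondence of Section 4.3: relation (4.1) turns a set of conductances into the walk's transition matrix. A minimal sketch (the conductance values below are hypothetical, not those of Fig. 4.2):

```python
# Conductances C[x][y] = 1 / R_xy for a small illustrative circuit.
# The matrix must be symmetric: C_xy = C_yx.
C = {
    "a": {"b": 1.0, "c": 2.0},
    "b": {"a": 1.0, "d": 1.0},
    "c": {"a": 2.0, "d": 2.0},
    "d": {"b": 1.0, "c": 2.0},
}

# Transition probabilities P[x][y] = C_xy / C_x with C_x = sum_y C_xy, as in (4.1).
P = {
    x: {y: cxy / sum(row.values()) for y, cxy in row.items()}
    for x, row in C.items()
}

row_sums = {x: sum(P[x].values()) for x in P}
```

Each row of $P$ sums to one, and lower-resistance edges carry proportionally higher transition probability (here the walker at $a$ moves to $c$ with probability $2/3$).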
The importance of an appropriate model for the wireless channel is that, with a good model, it is possible to design efficient receivers. One important model for wireless communications is the fading model. Let $a(t)$ be the low-pass equivalent of a transmitted signal with in-phase $a_I(t) = \mathrm{Re}\{a(t)\}$ and quadrature $a_Q(t) = \mathrm{Im}\{a(t)\}$ components. The channel is considered to be non-frequency-selective. The channel also adds white Gaussian noise to the signal. Therefore, the received signal can be modelled as

$$r(t) = g(t)a(t) + n(t), \quad (4.2)$$

where $r(t)$ is the received signal, $n(t)$ is the additive white Gaussian noise (AWGN) with variance $N_0/2$, and $g(t)$ is the random fading process, a complex stationary zero-mean Gaussian process with iid in-phase and quadrature parts $g_I(t)$ and $g_Q(t)$. In the Jakes-Clarke model [25], each component of the complex fading process has the autocorrelation

$$R_{g_I}(\tau) = R_{g_Q}(\tau) = \frac{\sigma^2}{2} J_0(2\pi f_D \tau), \quad (4.3)$$

where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind, $f_D$ is the maximum Doppler frequency and $\sigma^2$ is the total power in the fading channel. When the transmitted signal is normalized to unit energy, the average signal-to-noise ratio (SNR) is given by $\sigma^2/N_0$.

Sufficient statistics for signal detection at the receiver are obtained by sampling the received signal at a suitable sampling period $T_s$ to get $r_0^{N-1} = (r_0, r_1, \cdots, r_{N-1})$.
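The autocorrelation (4.3) is easy to evaluate numerically; the sketch below computes $J_0$ from its power series (the truncation level and the parameter values $\sigma^2 = 1$, $f_D = 100$ Hz are arbitrary illustrative choices):

```python
import math

def bessel_j0(x, terms=30):
    # Power series: J0(x) = sum_{m>=0} (-1)^m (x/2)^(2m) / (m!)^2,
    # accurate for moderate |x| with enough terms.
    total = 0.0
    for m in range(terms):
        total += (-1) ** m * (x / 2) ** (2 * m) / math.factorial(m) ** 2
    return total

def fading_autocorr(tau, sigma2=1.0, f_d=100.0):
    # R_gI(tau) = R_gQ(tau) = (sigma^2 / 2) * J0(2*pi*f_D*tau), as in (4.3)
    return sigma2 / 2 * bessel_j0(2 * math.pi * f_d * tau)

r0 = fading_autocorr(0.0)   # each component carries half the total power sigma^2
```

At lag zero the autocorrelation is $\sigma^2/2$, and the series reproduces the first zero of $J_0$ near $x \approx 2.4048$, where consecutive fading samples decorrelate.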
In the slow fading model, channel fading is assumed to be constant over each symbol interval; therefore, $r_k = g_k a_k + n_k$, and the autocorrelation of each component of the discrete fading process is a sampled version of the continuous autocorrelation,

$$R_{g_I}(kT_s) = R_{g_Q}(kT_s) = \frac{\sigma^2}{2} J_0(2\pi f_D kT_s), \quad k = \cdots, -1, 0, 1, \cdots.$$

The vector $g_{I,k}^{k-D+1} = \{g_{I,k}, g_{I,k-1}, \cdots, g_{I,k-D+1}\}$ is a $D$-dimensional Gaussian with probability density

$$f_{G_I}(g_I) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \exp\left(-\frac{g_I \Sigma^{-1} g_I^t}{2}\right),$$

where $\Sigma$ is a $D \times D$ covariance matrix with elements

$$\Sigma_{i,j} = \left[R_{g_I}\left((j-i)T_s\right)\right]_{D\times D}.$$

Since the process is stationary, $g_{I,k}^{k-D+1}$ can be replaced by the $D$-dimensional vector $g_I$. Note that the above discussion also holds for $g_Q$, and the imaginary and real components of the fading are iid Markov.

Kong [30] used an information-theoretic concept to determine that the order of the Markov model of the Jakes-Clarke fading spectrum should be 4 or 5. In contrast, Wang and Chang [46] used a different information measure to conclude that a first-order Markov process is sufficient to represent a fading channel, although Chu et al. [13] conjectured that a first-order Markov model is insufficient for Jakes-Clarke fading; they introduced a new criterion directly related to the receiver error probability. Zhang and Kassam [47] extended the methodology of signal-to-noise-ratio partitioning to determine the states of the finite-state Markov chain, but did not explicitly address the model order [38].

Let $S = \{S_0, S_1, \cdots, S_{K^D-1}\}$ be the equivalent state space for a $D$th-order stationary Markov process with $K$ states. By a $D$th-order Markov process we mean that the transition probabilities depend on the past $D$ states. The corresponding transition probability matrix of this process is $K^D \times K^D$.
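Before constructing the general $K^D$-state model, the idea can be illustrated by simulation for $D = 1$ and $K = 2$: quantize $g_I$ by its sign and estimate the transition probabilities between the two regions empirically. The AR(1) Gaussian below is a stand-in for the Jakes-Clarke process, and the one-step correlation $\rho = 0.9$ is an assumed value:

```python
import math
import random

rng = random.Random(2)
rho = 0.9          # assumed one-step correlation R_gI(T_s) / R_gI(0)

def state(g):
    # Quantization region: 0 for g < 0, 1 for g >= 0
    return 0 if g < 0 else 1

# Sample consecutive fading values from a stationary AR(1) Gaussian process
counts = [[0, 0], [0, 0]]
g = rng.gauss(0, 1)
for _ in range(200_000):
    g_next = rho * g + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    counts[state(g)][state(g_next)] += 1
    g = g_next

P = [[c / sum(row) for c in row] for row in counts]   # empirical transition matrix
```

With positive correlation, the chain is strongly persistent: both diagonal entries come out near the theoretical value $2(\tfrac14 + \tfrac{\arcsin\rho}{2\pi}) \approx 0.856$.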
To construct the Markov model from the Jakes-Clarke model, first we should construct the state space from the $D$-dimensional probability distribution, made of $K^D$ non-overlapping regions determined by quantization of the density $f_{G_I}(g_I)$. For region $R_i$, the steady-state probabilities can be determined directly from

$$\pi_i = \Pr[g_I \in R_i] = \int_{R_i} f_{G_I}(g_I)\,dg_I. \quad (4.4)$$

For the state transition probabilities, we have

$$P_{ij} = \Pr[g_{I,k}^{k-D+1} \in R_j \mid g_{I,k-1}^{k-D} \in R_i]. \quad (4.5)$$

Bayes' rule gives

$$P_{ij} = \frac{\Pr[g_{I,k}^{k-D+1} \in R_j,\ g_{I,k-1}^{k-D} \in R_i]}{\Pr[g_{I,k-1}^{k-D} \in R_i]} = \frac{\Pr[g_{I,k}^{k-D+1} \in R_j,\ g_{I,k-1}^{k-D} \in R_i]}{\pi_i}.$$

But in the numerator, $g_{I,k}^{k-D+1}$ and $g_{I,k-1}^{k-D}$ share $D-1$ components. Therefore, the probability in the numerator is the mass of a region under a $(D+1)$-dimensional Gaussian density,

$$\Pr[g_{I,k}^{k-D+1} \in R_j,\ g_{I,k-1}^{k-D} \in R_i] = \int_{R'_i} f_{G'_I}(g'_I)\,dg'_I,$$

where

$$g'_I = g_{I,k}^{k-D} = \{g_{I,k}, g_{I,k-1}, \cdots, g_{I,k-D}\}$$

and

$$f_{G'_I}(g'_I) = \frac{1}{(2\pi)^{(D+1)/2}|\Sigma'|^{1/2}} \exp\left(-\frac{g'_I \Sigma'^{-1} {g'_I}^t}{2}\right).$$

The entries of the covariance matrix $\Sigma'$ are

$$\Sigma'_{i,j} = \left[R_{g_I}\left((j-i)T_s\right)\right]_{(D+1)\times(D+1)}.$$

As before, the above formulation also holds for the quadrature component.

4.4.1 Signal Detection using a Finite State Markov Chain Model

With the above formulation, joint maximum a posteriori probability (MAP) sequence detection and channel estimation on this finite-state Markov chain can be performed as follows:

$$(\hat{a}_0^{N-1}, \hat{g}_0^{N-1})_{\mathrm{MAP}} = \arg\max_{a_0^{N-1},\,g_0^{N-1}} \Pr[a_0^{N-1}, g_0^{N-1} \mid r_0^{N-1}]. \quad (4.6)$$

Since we assume that $g$ is Markov, that $g$ and $a$ are independent, and that all transmitted sequences are equiprobable, the following term should be maximized:

$$\prod_{k=0}^{N-1} \frac{1}{\pi N_0} \exp\left(-\frac{|r_k - a_k g_k|^2}{N_0}\right)\Pr[g_k \mid g_{k-1}], \quad (4.7)$$

Figure 4.3: BER simulation results for the first-order channel Markov model with $D = 1$, $f_D T_s = 0.01$ (curves: coherent detection and 2-, 4-, 8- and 16-state models).
where

$$\Pr[g_k \mid g_{k-1}] = \Pr[g_{I,k} \mid g_{I,k-1}]\Pr[g_{Q,k} \mid g_{Q,k-1}]. \quad (4.8)$$

The decision parameter in (4.7) is the standard distance metric multiplied by the Markov transition probabilities. Taking the natural logarithm of (4.7), these additional multiplications appear as additive terms. If the cardinality of the transmitted signal set is $M$, a Viterbi algorithm with $MK^{2D}$ states can be used to implement this joint MAP estimate. Fig. 4.3 shows bit error rate (BER) simulation results for a first-order Markov model of the channel, as a function of signal-to-noise ratio (SNR), for $D = 1$, $f_D T_s = 0.01$.

4.5 Monte Carlo Method

The Monte Carlo method refers to a collection of tools for estimating values through sampling and simulation. These methods are a widely used class of computational algorithms for simulating the behavior of different physical and mathematical systems. They are distinguished from other simulation methods by being stochastic, usually by using (pseudo-)random numbers, as opposed to deterministic algorithms.

A randomized algorithm gives an $(\epsilon, \delta)$ approximation for the value $V$ if the output $X$ of the algorithm satisfies

$$\Pr[|X - V| \leq \epsilon V] \geq 1 - \delta.$$

Using Chernoff bounds it can easily be shown that for independent and identically distributed random variables $X_1, \cdots, X_m$ with $\mu = E\{X_i\}$, if $m \geq \frac{3\ln(2/\delta)}{\epsilon^2\mu}$, then [34]

$$\Pr\left[\left|\frac{1}{m}\sum_{i=1}^m X_i - \mu\right| \geq \epsilon\mu\right] \leq \delta.$$

That is, $m$ samples provide an $(\epsilon, \delta)$ approximation for $\mu$. We should note that there is a lower limit on the number of experiments $m$ needed to ensure that the algorithm gives an $(\epsilon, \delta)$ approximation. For example, if in assigning values to the $n$ variables of a Boolean formula the number of satisfying assignments is polynomial in $n$, and if at each step we sample uniformly at random from one of the $2^n$ possible assignments, then with high probability we must sample an exponential number of assignments before finding the first satisfying assignment [34].
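As a concrete illustration of the $(\epsilon, \delta)$ sampling bound, the sketch below estimates $\mu = \pi/4$, the probability that a uniform point in the unit square lands inside the quarter disk (the target quantity and the values of $\epsilon$ and $\delta$ are arbitrary choices):

```python
import math
import random

rng = random.Random(3)

def monte_carlo_mean(sample, m):
    # Average of m independent samples
    return sum(sample() for _ in range(m)) / m

# X_i = 1 if a uniform point in the unit square falls inside the quarter disk,
# so mu = E[X_i] = pi / 4.
def bernoulli():
    return 1.0 if rng.random() ** 2 + rng.random() ** 2 <= 1.0 else 0.0

eps, delta, mu = 0.05, 0.01, math.pi / 4
m = math.ceil(3 * math.log(2 / delta) / (eps ** 2 * mu))   # required sample size
estimate = monte_carlo_mean(bernoulli, m)
```

Here $m \approx 8100$ samples suffice for a $(0.05, 0.01)$ approximation of $\mu \approx 0.785$; note that $m$ grows as $1/\mu$, which is exactly why a tiny $\mu$ (sparse satisfying assignments) makes naive sampling inefficient.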
In fact, when the satisfying assignments are not sufficiently dense in the set of all possible assignments, uniform sampling from the possible values may not be sufficient unless we take exponentially many samples. In other words, the $\mu$ that we attempt to approximate should be sufficiently large for the sampling to be efficient.

4.5.1 Markov Chain Monte Carlo (MCMC) Methods

The Monte Carlo method is based on sampling, but it is often difficult to generate a random sample with the required probability distribution. For example, to count the number of independent sets in a graph we should generate an almost uniform sample from the set of independent sets [34]. But how can we generate the required probability distribution for sampling?

The MCMC method provides a general approach to sampling from a desired distribution. The basic idea is to define an ergodic Markov chain whose set of states is the sample space and whose stationary distribution is the required sampling distribution. Usually it is not difficult to construct a Markov chain with the desired properties. The more difficult part is to determine how many steps are required to converge to the stationary distribution within an acceptable error. A good chain has rapid mixing, i.e., the stationary distribution is reached quickly starting from an arbitrary position.

The most common application of MCMC is the numerical calculation of multi-dimensional integrals. In these methods, an ensemble of walkers moves around randomly. At each point where a walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with a reasonably high contribution to the integral to move into next.

It can be proved [34] that if we modify the random walk by giving each vertex an appropriate self-loop probability, a uniform stationary distribution can be obtained. To do so, consider a finite state space $\Omega$ and neighborhood structure $\{N(x) \mid x \in \Omega\}$.
Let $N = \max_{x\in\Omega}|N(x)|$, and let $M$ be any number such that $M \geq N$. Consider the Markov chain with

$$P_{x,y} = \begin{cases} \frac{1}{M}, & x \neq y \text{ and } y \in N(x) \\ 0, & x \neq y \text{ and } y \notin N(x) \\ 1 - \frac{|N(x)|}{M}, & x = y. \end{cases}$$

If this chain is irreducible and aperiodic, then its stationary distribution is the uniform distribution.

Moreover, if we are interested in a Markov chain with a desired stationary probability $\pi_x > 0$ for each state $x$, and assuming the chain is irreducible and aperiodic, the following transition probabilities generate these state probabilities (the Metropolis-Hastings algorithm):

$$P_{x,y} = \begin{cases} \frac{1}{M}\min\left(1, \frac{\pi_y}{\pi_x}\right), & x \neq y \text{ and } y \in N(x) \\ 0, & x \neq y \text{ and } y \notin N(x) \\ 1 - \sum_{y\neq x} P_{x,y}, & x = y. \end{cases}$$

Bayesian statistics and Bayesian statistical methods, in general, assume a parametric model for the data, with the parameters being random variables about which we have some prior knowledge. Depending on how much information we have a priori about the parameters, the priors can be vague or informative. The posterior distributions of the parameters are what we are interested in; they depend on the parametric model chosen, the data and the prior distribution. From the joint distribution of the data $X$ and the parameters $\theta$ we have

$$p(x, \theta) = p(x\mid\theta)p(\theta) = p(\theta\mid x)p(x).$$

Since the data is observed, $p(x)$ is a fixed constant. The term $p(x\mid\theta)$ is the likelihood function, and hence

$$p(\theta\mid x) = K\,p(\theta) \times \text{likelihood function},$$

where $K = 1/p(x)$ is the normalizing constant. Calculating $K$ is quite difficult, and in many cases analytically impossible; therefore, the Bayesian method has many limitations in practice. MCMC methods, on the other hand, can be used to draw samples from the posterior distribution directly, without the need to calculate $K$. MCMC methods are extremely useful when we deal with high-dimensional problems and need to simulate samples from the posterior distribution. There are many variations of MCMC methods, and the Gibbs sampler is one of the simpler ones.
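A minimal sketch of the Metropolis chain just described, on a 5-vertex cycle with a hypothetical target distribution $\pi \propto (1, 2, 3, 2, 1)$ and $M$ taken as the maximum degree:

```python
import random

rng = random.Random(4)

n = 5
neighbors = {x: [(x - 1) % n, (x + 1) % n] for x in range(n)}   # cycle graph
weights = [1.0, 2.0, 3.0, 2.0, 1.0]                             # pi_x up to normalization
M = max(len(nb) for nb in neighbors.values())                   # M >= max |N(x)|

def step(x):
    # Propose one of M uniform slots; accept a neighbor with prob min(1, pi_y/pi_x),
    # so P_{x,y} = (1/M) min(1, pi_y/pi_x) for y in N(x); otherwise self-loop.
    slot = rng.randrange(M)
    if slot < len(neighbors[x]):
        y = neighbors[x][slot]
        if rng.random() < min(1.0, weights[y] / weights[x]):
            return y
    return x

visits = [0] * n
x = 0
for _ in range(300_000):
    x = step(x)
    visits[x] += 1

empirical = [v / sum(visits) for v in visits]
target = [w / sum(weights) for w in weights]
```

After many steps the empirical visit frequencies approach the target $(\tfrac19, \tfrac29, \tfrac39, \tfrac29, \tfrac19)$; no normalizing constant for $\pi$ was ever needed, which is precisely the point made above for Bayesian posteriors.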
Suppose that we want to obtain a sample from the multivariate distribution $p(\theta_1, \cdots, \theta_k)$. The Gibbs sampler generates the samples by repeatedly simulating from the conditional distributions of each component, given the other components. The algorithm can be stated as follows:

1. Initialization: initialize the vector $\theta = (\theta_1^{(0)}, \cdots, \theta_k^{(0)})$.
2. Simulation:
a. Simulate $\theta_1^{(1)}$ from the conditional $\theta_1 \mid (\theta_2^{(0)}, \cdots, \theta_k^{(0)})$;
b. Simulate $\theta_2^{(1)}$ from the conditional $\theta_2 \mid (\theta_1^{(1)}, \theta_3^{(0)}, \cdots, \theta_k^{(0)})$;
...
c. Simulate $\theta_k^{(1)}$ from the conditional $\theta_k \mid (\theta_1^{(1)}, \cdots, \theta_{k-1}^{(1)})$.
3. Iterate the above process.

Under some regularity conditions, the convergence of the Markov chain to the posterior distribution of interest is guaranteed, and therefore we obtain draws of $\theta$ from that distribution.

Suppose that a sequence generator can generate three symbols a, b or c, according to a Markov process with the following transition probabilities:

$$P = \begin{pmatrix} 0.5 & 0.25 & 0.25 \\ 0.5 & 0 & 0.5 \\ 0.25 & 0.25 & 0.5 \end{pmatrix},$$

which means, for example, $p(a \mid \text{previous symbol is } a) = 0.5$. The resulting Markov process is irreducible since all states communicate with each other. If the symbol generator generates b, then what is the distribution of the symbol generated two symbols later? In this case, the initial probability is $p^{(0)} = (0, 1, 0)$, and the probability of the different symbols will be $p^{(2)} = p^{(0)}P^2 = (0.375, 0.250, 0.375)$. Assuming instead that the initial state (the current symbol) is c, i.e., $p^{(0)} = (0, 0, 1)$, we get $p^{(2)} = p^{(0)}P^2 = (0.375, 0.188, 0.438)$. The probability of the different symbols after a sufficient amount of time, e.g., 10 symbols, for both cases (current symbol b or c) will be $p^{(10)} = p^{(0)}P^{10} = (0.4, 0.2, 0.4)$, which is independent of the original state.

4.6 Pure and Accelerated Random Search

Consider a real-valued function $f$ with compact support $D \subset \mathbb{R}^d$. Our objective is to find

$$\max_{x\in D} f(x). \quad (4.9)$$

Pure random search (PRS) is easy to implement for this problem.
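Returning to the three-symbol generator example above, the distributions $p^{(2)}$ and $p^{(10)}$ can be checked numerically with plain lists:

```python
# Transition matrix of the 3-symbol (a, b, c) generator
P = [
    [0.50, 0.25, 0.25],
    [0.50, 0.00, 0.50],
    [0.25, 0.25, 0.50],
]

def step(p, P):
    # One step of the chain: row vector times matrix, p P
    return [sum(p[i] * P[i][j] for i in range(3)) for j in range(3)]

def evolve(p, P, n):
    for _ in range(n):
        p = step(p, P)
    return p

from_b = evolve([0, 1, 0], P, 2)    # (0.375, 0.25, 0.375)
from_c = evolve([0, 0, 1], P, 2)    # (0.375, 0.1875, 0.4375)
limit = evolve([0, 1, 0], P, 10)    # approaches (0.4, 0.2, 0.4)
```

The ten-step distribution is already within about $10^{-6}$ of the stationary $(0.4, 0.2, 0.4)$, since the chain's second-largest eigenvalue modulus is $0.25$.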
It considers a stream of independent and identically distributed random vectors $\{X_i\}_{i=1}^n$ with uniform distribution on $D$, and then computes $M_n = \max\{f(X_i) : i = 1, \cdots, n\}$. The sequence $\{M_n\}$ is guaranteed to converge, with probability one, to the essential supremum of $f$, where $\operatorname{ess\,sup} f = \sup\{r : \Pr[x : f(x) > r] > 0\}$, if $f$ is merely measurable. In fact, in the above maximization problem, the essential supremum is equivalent to the maximum when the maximum exists [5].

The problem with PRS is its extremely low speed of convergence, and many studies have been carried out to improve it. In most cases, the following framework is used to increase the speed of convergence. Given an initial sequence $\{X_i\}_{i=1}^k$ of candidates for $\arg\max_{x\in D} f(x)$, and an associated sequence of probability distributions $\{P_i\}_{i=1}^k$ on $D$, a new candidate $Y$ is obtained by first sampling from a distribution $P_{k+1}$ and then taking

$$X_{k+1} = \begin{cases} Y, & f(Y) > f(X_k) \\ X_k, & f(Y) \leq f(X_k). \end{cases} \quad (4.10)$$

The allowable changes in $P_k$ are discussed in different references (e.g., [6,21,33]). Some adaptive methods that move more mass to the promising regions of the search space have been discussed by Brunelli et al. [11] and Tang [43].

To speed up convergence in finding the maximum value of $f(x)$, as expressed in (4.9), accelerated random search (ARS) was proposed [5]. We assume that $D$ is the $d$-dimensional unit hypercube $[0,1]^d$. For a contraction factor $c > 1$ and a precision factor $\rho > 0$, ARS can be stated as:

Step 0: Set $n = 1$ and $r_1 = 1$. Generate $X_1$ from a uniform distribution on $D$.
Step 1: Given $X_n \in D$ and $r_n \in (0, 1]$, generate $Y_n$ from a uniform distribution on $B(X_n, r_n)$.
Step 2: If $f(Y_n) > f(X_n)$, then let $X_{n+1} = Y_n$ and $r_{n+1} = 1$. Else if $f(Y_n) \leq f(X_n)$, then let $X_{n+1} = X_n$ and $r_{n+1} = r_n/c$; if $r_{n+1} < \rho$, then set $r_{n+1} = 1$. Increment $n = n+1$ and go to Step 1.
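A sketch of the ARS steps above, maximizing a hypothetical smooth function on $[0,1]^2$. For simplicity the neighborhood $B(X_n, r_n)$ is taken as a sup-norm box clipped to the cube, and $c = 2$, $\rho = 10^{-6}$ are illustrative choices:

```python
import random

rng = random.Random(5)

def ars_maximize(f, d, n_steps, c=2.0, rho=1e-6):
    # Accelerated random search on the unit hypercube [0, 1]^d.
    # Step 0: initial uniform point, full search radius.
    x = [rng.random() for _ in range(d)]
    r, best = 1.0, f(x)
    for _ in range(n_steps):
        # Step 1: uniform candidate in the box around x of radius r, clipped to [0,1]^d
        y = [min(1.0, max(0.0, xi + rng.uniform(-r, r))) for xi in x]
        fy = f(y)
        # Step 2: on a new record accept and reset the radius, else contract it
        if fy > best:
            x, best, r = y, fy, 1.0
        else:
            r /= c
            if r < rho:
                r = 1.0
    return x, best

f = lambda x: -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)   # maximum 0 at (0.3, 0.7)
x_star, m = ars_maximize(f, 2, 5000)
```

The reset $r_{n+1} = 1$ after each record is what distinguishes ARS from plain shrinking search: the algorithm repeatedly zooms in around the current record and periodically restarts the zoom, sampling promising regions more densely.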
Generally, the sequence $\{X_n\}$ is called the sequence of record generators and the sequence $\{M_n = f(X_n)\}$ is called the record sequence [5]. In order to prove the convergence properties of the above algorithm, consider $\{X_n\}$ as a sequence of integrable random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. Let $\sigma_n$ be a sequence of sub-$\sigma$-fields and let $N$ be an integer stopping time, $\{N = n\} \in \sigma_n$ for all $n$. Define

$$\sigma_N = \sigma\{C : C \cap \{N = n\} \in \sigma_n \text{ for each } n\}.$$

First, we prove that the random variables $X_N$ and $X_n$ satisfy

$$E\{X_N \mid \sigma_N\} = E\{X_n \mid \sigma_n\} \quad \text{a.s. on } \{N = n\}. \quad (4.11)$$

To do so, consider any $A$ in $\sigma_N$:

$$E\{E\{X_N \mid \sigma_N\}\cdot 1_{A\cap\{N<\infty\}}\} = E\{X_N \cdot 1_{A\cap\{N<\infty\}}\} = \sum_n E\{X_N \cdot 1_{A\cap\{N=n\}}\} = \sum_n E\{E\{X_n \mid \sigma_n\}\cdot 1_{A\cap\{N=n\}}\} = E\left\{\left(\sum_n E\{X_n \mid \sigma_n\}\cdot 1_{\{N=n\}}\right)\cdot 1_{A\cap\{N<\infty\}}\right\}. \quad (4.12)$$

Since $A$ is arbitrary, we conclude that

$$E\{X_N \mid \sigma_N\} = \sum_n E\{X_n \mid \sigma_n\}\cdot 1_{\{N=n\}} \quad \text{a.s. on } \{N < \infty\},$$

which is equivalent to (4.11).

Now, let $L = \sup_n M_n$. By monotonicity, $M_n \uparrow L$ as $n \to \infty$. We assume without loss of generality that $\Pr[X_{n+1} = X_n \text{ eventually}] < 1$, and that the event $\{X_{n+1} \neq X_n \text{ infinitely often}\} = \{X_{n+1} = X_n \text{ eventually}\}^c$ occurs. Let

$$N_1 = \min\{n : X_{n+1} \neq X_n\},$$

and likewise,

$$N_{k+1} = \min\{n > N_k : X_{n+1} \neq X_n\}.$$

Clearly, $N_k$ is not a stopping time, but $N_k^+ = N_k + 1$ is a stopping time. The random indices $N_k \uparrow \infty$ as $k \to \infty$. By compactness of $[0,1]^d$, there is a convergent subsequence $X_{N_{k_i}^+}$ of $X_{N_k^+}$ which converges to some $X$ as $i \to \infty$. By monotonicity, $M_{N_{k_i}^+} \uparrow L$ as $i \to \infty$ as well. Therefore, we can conclude that the record-generating sequence in ARS converges, with probability one, to $\operatorname{ess\,sup} f$ on the event $\{X_{n+1} \neq X_n \text{ infinitely often}\}$.

It should be mentioned that different versions of ARS, with different convergence properties appropriate for specific applications, have been proposed in the literature. The idea behind many of these algorithms is to sample the regions with a higher probability of containing the answer more densely, as in ARS.

Bibliography

[1] M. Ajtai, J. Komlós, and E.
Szemerédi. The first occurrence of Hamilton cycles in random graphs. Annals of Discrete Mathematics, (27):173-178, 1985.
[2] D. Alevras, M. Grötschel, and R. Wessäly. Capacity and survivability models for telecommunication networks. Tech. Report SC 97-24, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, 1997.
[3] N. Alon, C. McDiarmid, and M. Molloy. Edge-disjoint cycles in regular directed graphs. Journal of Graph Theory, (22):231-237, 1996.
[4] D. Angluin and L. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. STOC '77: Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, pages 30-41, 1977.
[5] M. J. Appel, R. LaBarre, and D. Radulovic. On accelerated random search. SIAM J. Optim., 14(3):708-731, 2003.
[6] N. Baba, T. Shoman, and Y. Sawaragi. A modified convergence theorem for a random optimization algorithm. Inform. Sci., (13):159-166, 1977.
[7] E. Balas and N. Simonetti. Linear time dynamic programming algorithms for some new classes of restricted TSPs. Proc. IPCO V, LNCS, (1084):316-329, 1996.
[8] Richard F. Bass. Probabilistic Techniques in Analysis. Probability and its Applications, Springer-Verlag, 1995.
[9] B. Bollobás, T. I. Fenner, and A. M. Frieze. An algorithm for finding Hamilton paths and cycles in random graphs. Combinatorica, 7(4):327-341, 1987.
[10] Béla Bollobás. Random Graphs, second edition. Cambridge University Press, 2001.
[11] R. Brunelli and G. P. Tecchiolli. Stochastic minimization with adaptive memory. J. Comput. Appl. Math., (57):329-343, 1995.
[12] G. Casella and R. L. Berger. Statistical Inference. Duxbury, 2002.
[13] J. Chu, D. L. Goeckel, and W. E. Stark. On the design of Markov models for fading channels. Proceedings of IEEE Veh. Tech. Conf., VTC99, pages 2372-2376, Sep. 1999.
[14] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symbolic Comput., 9:251-280, 1990.
[15] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, second edition.
MIT Press, 2001.
[16] J. L. Doob. Stochastic Processes. John Wiley and Sons, Inc., 1953.
[17] Joseph L. Doob. Classical Potential Theory and Its Probabilistic Counterpart. Classics in Mathematics, Springer-Verlag, reprint of the 1984 edition, 2001.
[18] Peter G. Doyle and J. Laurie Snell. Random Walks and Electric Networks. First published in 1984 by the Mathematical Association of America; now freely redistributable under the terms of the GNU General Public License, 1984.
[19] P. Erdős. Some remarks on the theory of graphs. Bull. Amer. Math. Soc., 53:292-294, 1947.
[20] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified NP-complete problems. Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, pages 47-63, 1974.
[21] M. Gaviano. Some general results on the convergence of random search algorithms in minimization problems. In Towards Global Optimization, L. Dixon and G. Szegő, eds., North Holland, Amsterdam, pages 149-157, 1975.
[22] M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed. Probabilistic Methods for Algorithmic Discrete Mathematics. Springer-Verlag, 1998.
[23] Eran Halperin. Improved approximation algorithms for the vertex cover problem in graphs and hypergraphs. SIAM Journal on Computing, 31(5):1608-1623, 2002.
[24] R. W. Hung, S. C. Wu, and M. S. Chang. The Hamiltonian cycle problem on distance-hereditary graphs. Journal of Information Science and Engineering, (19):827-838, 2003.
[25] W. C. Jakes. Microwave Mobile Communications. IEEE Press, Piscataway, NJ, 1993.
[26] Ioannis Karatzas and Steven E. Shreve. Brownian Motion and Stochastic Calculus, second edition. Springer, 1991.
[27] Samuel Karlin and Howard M. Taylor. A Second Course in Stochastic Processes. Academic Press, 1975.
[28] Samuel Karlin and Howard M. Taylor. A First Course in Stochastic Processes, second edition. Academic Press, 1981.
[29] J. Komlós and E. Szemerédi. Limit distributions for the existence of Hamilton circuits in a random graph.
Discrete Mathematics, (43):55-63, 1983.
[30] H. Kong and E. Shwedyk. A measure for the length of probabilistic dependence. Proceedings of the International Symposium on Information Theory, Ulm, Germany, (4):469, July 1997.
[31] F. T. Leighton, B. Maggs, and S. Rao. Packet routing and job-shop scheduling in O(congestion + dilation) steps. Combinatorica, (14):167-186, 1994.
[32] L. Lovász. Random walks on graphs: a survey. In Combinatorics, Paul Erdős is Eighty, 2:353-397, 1993.
[33] J. Matyas. Random optimization. Autom. Remote Control, (26):246-253, 1965.
[34] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[35] Andreas F. Molisch. Wireless Communications. John Wiley and Sons, 2005.
[36] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[37] Daniel Revuz and Marc Yor. Continuous Martingales and Brownian Motion, third edition. Springer, 1999.
[38] M. Riediger and E. Shwedyk. Communications receivers based on Markov models of the fading channel. IEE Proc. Commun., 150(4):275-279, August 2003.
[39] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, (27):379-423, 623-656, 1948.
[40] Shuo-Yen Robert Li. A martingale approach to the study of occurrence of sequence patterns in repeated experiments. The Annals of Probability, 8(6):1171-1176, Dec. 1980.
[41] Joel Spencer. Ten Lectures on the Probabilistic Method. Society for Industrial and Applied Mathematics (SIAM), 1994.
[42] T. Szele. Kombinatorikai vizsgálatok az irányított teljes gráffal kapcsolatban. Mat. Fiz. Lapok, 50:233-256, 1943.
[43] Z. B. Tang. Optimal sequential sampling policy of partitioned random search and its approximation. J. Optim. Theory Appl., (98):431-448, 1998.
[44] David Tse and Pramod Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[45] P. Turán. On a theorem of Hardy and Ramanujan. J. London Math. Soc., 9:274-276, 1934.
[46] H. S. Wang and P. Chang. On verifying the first-order Markovian assumption for a Rayleigh fading channel model. IEEE Trans. Veh. Tech., 45(2):353-357, 1996.
[47] Q. Zhang and S. A. Kassam. Finite-state Markov model for Rayleigh fading channels. IEEE Trans. on Comm., 47(11):1688-1692, 1999.
Abstract
An algorithm can be defined as a set of computational steps that transform the input to the output. Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is drawn from a well-defined probability space. In the design of algorithms to solve many important problems, randomized algorithms are either the fastest or the simplest algorithms, and often both.
Asset Metadata
Creator: Anaraki, Majid Nemati (author)
Core Title: Probabilistic methods and randomized algorithms
School: College of Letters, Arts and Sciences
Degree: Master of Science
Degree Program: Statistics
Publication Date: 04/20/2007
Defense Date: 03/28/2007
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: OAI-PMH Harvest, probabilistic methods
Language: English
Advisor: Goldstein, Larry (committee chair), Baxendale, Peter H. (committee member), Sun, Fengzhu Z. (committee member)
Creator Email: manemati@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m426
Unique identifier: UC1117345
Identifier: etd-Anaraki-20070420 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-477861 (legacy record id), usctheses-m426 (legacy record id)
Legacy Identifier: etd-Anaraki-20070420.pdf
Dmrecord: 477861
Document Type: Thesis
Rights: Anaraki, Majid Nemati
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu