The theory and practice of benchmarking quantum annealers

Joshua Job

A Dissertation presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements of the Degree DOCTOR OF PHILOSOPHY (PHYSICS)

Copyright Joshua Job, December 2018. Degree Conferral Date: December 2018.

Abstract

I present an overview of my work on benchmarking quantum annealing devices, both the experimental work and my work on understanding the fundamental principles and guidelines required. Here, I provide an overview of much of the work done in the field, define various forms of quantum speedup one may search for, and the appropriate statistical methods for benchmarking noisy quantum systems with systematic errors dependent on many free parameters. I then apply much of this thinking in several test cases, including random Ising problems, Ising problems with planted solutions (which address a major challenge of benchmarking, namely the difficulty of success verification for arbitrarily large problems), and the training of a binary classifier in the context of a high energy physics experiment seeking to distinguish Higgs boson decays from background processes. I then summarize the lessons learned and present them to the community in the hope that they can improve the quality and speed of future work in this area.

Acknowledgements

I would like to thank my advisor Professor Daniel Lidar for his support and mentorship throughout my PhD, as this thesis would not have been possible without him and his dedication to accuracy and tracking down all possible leads. I also want to thank all my collaborators, as they were critical to the completion of our research projects together. I want to thank all the teachers I've had throughout the years, who helped get me to my PhD. I also want to thank my whole family, particularly my parents, who indulged my (rather intense) interests in science from a very young age. Without your support of my interests, goals, and development, I wouldn't be where or who I am. To my friends, who I will not name for fear of offending anyone, thank you for serving as walls to bounce ideas off of, and for many a relevant and utterly irrelevant discussion. Finally, I'd like to thank all the brilliant physicists and scientists about whom and about whose work I read in books, and all the authors of science fiction that collectively inspired my imagination as a child, and who continue to do so today.

Contents

1 Introduction
  1.1 Introduction
  1.2 The Ising Hamiltonian and quantum annealing
    1.2.1 Quantum annealers
  1.3 Classical solvers
    1.3.1 A brief description of some of the most relevant algorithms for solving Ising-type problems
  1.4 A quick review of quantum validation testing
    1.4.1 Types of validation: proof of quantumness, quantum supremacy, speedup-inferred quantumness, and classical model rejection
    1.4.2 Experimental implementations of quantum validation tests
  1.5 Benchmarking
    1.5.1 Prior work on benchmarking quantum annealers

2 Speedup
  2.1 Classical and quantum annealing of a spin glass
  2.2 Considerations when computing quantum speedup
    2.2.1 Resource usage and speedup from parallelism
  2.3 Performance of D-Wave Two versus SA and SQA
    2.3.1 Performance as an optimizer: comparing the scaling of hard problem instances
    2.3.2 Instance-by-instance comparison
  2.4 Discussion
  2.5 Supplementary Information
    2.5.1 Annealing methods
    2.5.2 The D-Wave Two Vesuvius device
    2.5.3 Gauge averaging
    2.5.4 Annealing and wall-clock times
    2.5.5 Optimal annealing times
    2.5.6 Resource usage and speedup from parallelism
    2.5.7 Arguments for and against a speedup on the DW2
    2.5.8 Additional Discussion
    2.5.9 Additional scaling data: range 3

3 Benchmarking
  3.1 Please lie down on the couch: Data analysis
    3.1.1 Paradiso: the ideal solution
    3.1.2 The realistic solution
    3.1.3 I was blind but now I see: Bayesian nonparametrics
    3.1.4 Why the Bayesian bootstrap?
    3.1.5 To Bayes, or not to Bayes, that is the question: Bayesian v. classical bootstrap
  3.2 Goldilocks and the three sample sizes: why there is no universal "right" sample size
  3.3 The proof is in the pudding: a simulation study
  3.4 Conclusion

4 Planted
  4.1 Introduction
  4.2 Frustrated Ising problems with planted solutions
  4.3 Algorithms and scaling
  4.4 Probing for a quantum speedup
    4.4.1 Dependence on clause density
    4.4.2 General considerations concerning scaling & speedup
    4.4.3 Scaling and speedup ratio results
    4.4.4 Scaling coefficient results
    4.4.5 DW2 vs SAA
  4.5 Discussion
  4.6 Methods
    4.6.1 Experimental details
    4.6.2 Error estimation
  4.7 Additional results
    4.7.1 Degeneracy-hardness correlation
    4.7.2 Additional easy-hard-easy transition plots
    4.7.3 Optimality plots
    4.7.4 Additional speedup ratio plots
    4.7.5 Additional correlation plots
    4.7.6 Additional scaling analysis plots
    4.7.7 SQAS vs SAS
    4.7.8 Scale factor histograms

5 Higgs
  5.1 Introduction
  5.2 Results
    5.2.1 Variable inclusion
    5.2.2 Classifier performance
  5.3 A note on data collection and error analysis, i.e. benchmarking, on this problem
    5.3.1 Receiver operator characteristic curves and their uncertainty analysis
  5.4 Discussion
  5.5 Acknowledgements for this chapter and author contributions statement
  5.6 Methods
    5.6.1 Problem Construction
    5.6.2 Data collection and analysis
    5.6.3 Weak classifier construction
    5.6.4 Mapping weak classifier selection to the Ising problem
    5.6.5 Instances and variable inclusion
  5.7 Additional background and pedagogy
    5.7.1 DNN and XGB optimization procedure
    5.7.2 Robustness of QAML to MCMC mismodelling
    5.7.3 Quantum annealing and D-Wave
    5.7.4 Simulated annealing
    5.7.5 Effect of noise on processor
    5.7.6 Sensitivity to variation of the parameters of weak classifier construction
    5.7.7 Difference between ROC curves plots

6 Conclusion
  6.1 Recent progress
  6.2 Guidelines for benchmarking quantum annealing and related noisy quantum computational devices

A MAB
  A.1 Introduction
  A.2 Gauges, gauge-averaging, and gauge selection
    A.2.1 Prior work on gauge selection: the performance estimator
  A.3 Multi-Armed Bandits, their algorithms, and the MAB properties of the gauge selection problem
    A.3.1 Gauge selection as a bandit problem
    A.3.2 Algorithms for many-armed bandits
  A.4 Methods
  A.5 Results
    A.5.1 ε-greedy
    A.5.2 Boltzmann exploration
  A.6 UCB
    A.6.1 Thompson sampling
    A.6.2 Comparing different algorithms
  A.7 Conclusion

List of Figures

1.1 An 1152-qubit Chimera graph describing the D-Wave Two X processor at the University of Southern California's Information Sciences Institute. Inactive qubits are marked in red, active qubits (1098) are marked in green.
Black lines denote active couplings (where J_ij is programmable to be in the range [-1, 1]) between qubits.

1.2 Annealing schedules for the D-Wave Two X processor described in Fig. 1.1.

1.3 An example of the view of the Chimera graph in the HFS algorithm. Each vertex is one half of a unit cell, and is 2^4-dimensional. The algorithm repeatedly finds the minimum energy configuration of the graph over trees (like the one highlighted) given the state of the remaining vertices (in gray).

1.4 The eight-spin Ising quantum signature Hamiltonian introduced in Ref. [1]. The inner "core" spins (green circles) have local fields h_i = +1 while the outer spins (red circles) have h_i = -1. All couplings are ferromagnetic: J_ij = 1 (black lines).

1.5 The 16-spin Ising Hamiltonian composed of two K_{4,4} unit cells introduced in Ref. [2]. All couplings are set to J = 1; all qubits in the left unit cell have a local field 0 < h_L < 0.5 applied to them, while all spins in the right unit cell have h_R = 1 applied to them. Two local minima form, one with the cells internally aligned but in opposite states from each other (a local minimum) and the other with all states aligned with h_R (the global minimum). By tightly binding each unit cell, they effectively act like single large spins.

2.1 Pitfalls when detecting speedup. A) The typical (median) time to find a ground state at least once with 99% probability for spin glasses with ±1 couplings using SQA at constant annealing time. The lower envelope of the curves at constant t_a corresponds to the total effort at an optimal size-dependent annealing time t_a^opt(N) and can be used to infer the asymptotic scaling. The initial, relatively flat slope at fixed N is due to suboptimal performance at small problem sizes N, and should therefore not be interpreted as speedup. Annealing times are given in units of Monte Carlo steps (MCS), corresponding to one update per spin. B) The speedup of SQA over SA for two cases. If SQA is run suboptimally at small sizes by choosing a fixed large annealing time t_a = 10000 MCS (dashed line) a speedup is feigned. This is due to suboptimal performance on small sizes and not indicative of the real asymptotic behavior when both codes are run optimally (solid line).

2.2 Scaling of time to solution for the ranges r = 1 (panel A) and r = 7 (panel B). Shown is the scaling of the pure annealing time to find the ground state at least once with a probability p = 0.99 for various quantiles of hardness, for simulated annealing (SA, dashed) and the DW2 (solid). The solid lines terminate for the highest quantiles because the DW2 did not solve the hardest instances for large problem sizes within the maximum number of repetitions (at least 32000) of the annealing we performed.

2.3 Speedup of the DW2 compared to SA. A) and B) for the ratio of the quantiles (RofQ), C) and D) for the quantiles of the ratio (QofR). For a random variable X with distribution p(x) and values x ∈ [0, ∞) we define, as usual, the qth quantile x_q via ∫_0^{x_q} p(x) dx = q/100, which we solve for x_q and plot as a function of √N. In the QofR case we use x_{100-q} so that high quantiles still correspond to instances that are hard for the DW2.
We terminate the curves when the DW2 does not find the ground state for large N at high percentiles. In these plots we multiplied Eqs. (2.3) and (2.4) by 512 so that the speedup value at N = 512 directly compares one DW2 processor against one classical CPU. An overall positive slope suggests a possible limited quantum speedup, subject to the caveats discussed in the text. A negative slope indicates that SA outperforms the DW2.

2.4 Instance-by-instance comparison of annealing times. Shown is a scatter plot of the pure annealing time for the DW2 compared to SA using an average over 16 gauges (see 2.5) on the DW2 for A) r = 1 and B) r = 7. The color scale indicates the number of instances in each square. Instances below the diagonal red line are faster on the DW2, those above are faster using SA. Instances for which the DW2 did not find the solution with 10000 repetitions per gauge are shown at the top of the frame (no such instances were found for SA).

2.5 Annealing schedules. A) The amplitudes of the transverse fields A_i(t) (decreasing, blue) and the longitudinal couplings B(t) (increasing, red) as a function of time. The device temperature of T = 18 mK is indicated by the black horizontal dashed line. B) The linear annealing schedule used in simulated quantum annealing.

2.6 Scaling of time-to-solution for SQA. Shown is the time-to-solution A) for range r = 1 and B) for range r = 7.

2.7 Qubits and couplers in the D-Wave Two device. The DW2 "Vesuvius" chip consists of an 8x8 two-dimensional square lattice of eight-qubit unit cells, with open boundary conditions. The qubits are each denoted by circles, connected by programmable inductive couplers as shown by the lines between the qubits. Of the 512 qubits of the device located at the University of Southern California used in this work, the 503 qubits marked in green and the couplers connecting them are functional.

2.8 Comparing wall-clock times. A comparison of the wall-clock time to find the solution with probability p = 0.99 for SA running on a single CPU (dashed lines) compared to the DW2 (solid lines) using a single gauge choice in the left column and 16 gauges in the right column. A) for range r = 1, B) for range r = 3, C) for range r = 7. Shown are curves from the median (50th quantile) to the 99th quantile. The large constant programming overhead of the DW2 masks the exponential increase of time-to-solution that is obvious in the plots of pure annealing time.

2.9 Instance-by-instance comparison. Shown is a scatter plot of the total time for the DW2 device (DW) compared to a simulated classical annealer (SA) A and D) for r = 1, B and E) for r = 3, and C and F) for r = 7. A, B and C) wall-clock time using a single gauge on the DW2, and D, E and F) wall-clock time using 16 gauges on the DW2. The color scale indicates the number of instances in each square. Instances below the diagonal red line are faster on the DW2, those above are faster classically. Instances for which the DW2 device did not find the solution are shown at the top. SA found a solution for every instance of this benchmark.

2.10 Optimal annealing times for the simulated annealer and for the D-Wave device. Shown is the total effort R(t_a) t_a as a function of annealing time t_a for various quantiles of problems with r = 1 and r = 7. A) and B) SA, where the minimum of the total effort determines the optimal annealing time t_a^opt.
C) and D) DW2, where we find a monotonically increasing total effort, meaning that the optimal time t_a^opt is always shorter than the minimal annealing time of 20 μs.

2.11 Scaling of time-to-solution for r = 3. Shown is the scaling for the time to find the ground state with a probability of 99% for various quantiles of hardness for A) the simulated annealer, B) the simulated quantum annealer, and C) the DW2 device.

2.12 Speedup for the DW2 device compared to SA for instances with r = 3. 16 gauges were used. Left: the ratio of quantiles (RofQ). Right: quantiles of the ratio (QofR). The results are intermediate between the r = 1 and r = 7 results as discussed in the text.

2.13 Instance-by-instance comparison for r = 3. Comparison between time-to-solution for SA and DW2 using pure annealing times and using 16 gauges.

2.14 Optimal annealing times for SA. Shown is the time-to-solution for various quantiles as a function of annealing time (Monte Carlo steps) for SA range r = 1. This supplements Figure 1 in the main text.

3.1 Results from 10000 frequentist bootstrap samples from a sample of 10 gauges, nine with zero successes and one with all successes. If one tried to interpret this as a distribution of one's uncertainty about the mean success probability, averaged over all gauges, then one has significant weight on the mean success rate being 0, even though one actually observed successes, a clear contradiction.

3.2 Results from 1000 frequentist bootstrap samples from a set of 100 gauges whose success probability was sampled from a mixture distribution of Normal(0.05, 0.003) with weight 0.95 and Normal(0.4, 0.01) with weight 0.05. The multiple peaks arise from the discrete binomial distribution over samples from the high probability component.

3.3 Results from 10000 Bayesian bootstrap samples from a set of 100 gauges whose success probability was sampled from a mixture distribution of Normal(0.05, 0.003) with weight 0.95 and Normal(0.4, 0.01) with weight 0.05. It is now a smooth density, unlike in the frequentist case.

3.4 A comparison of the approximate number of gauges required to gather sufficient data to know TTS for the easiest 95% of instances for various sizes of range-1 Ising problems. Optional stopping consumes more than an order of magnitude less time for the largest size, and we should expect this advantage to continue to grow as we go to larger systems with broader distributions for p_s.

3.5 Wall clock time, estimated based on 15 ms programming time and 180 ms per anneal, vs inverse mean for the various distributions at R = 100. The vertical variation along a single mean line is caused by variations in the variation among instances at a particular mean. It is an approximately linear relationship, meaning it will take linearly more time to gather sufficient statistics as the probability decreases.

3.6 Wall clock time, estimated based on 15 ms programming time and 180 ms per anneal, vs inverse mean for the various distributions for a threshold of 0.1. The vertical variation along a single mean line is caused by variations in the variation among instances at a particular mean.
We notice that R = 100 is optimal or near optimal across the distributions.

3.7 Histogram of the average relative bias of the optional stopping estimator for 100 simulations across all the listed distributions for different thresholds and numbers of runs. We see that a threshold of 0.2 can lead one fairly far astray, while the number of runs per gauge doesn't make a very significant difference. However, given that the wallclock time is inversely proportional to the square of the threshold, it may be deemed worth the trade-off to accept a slightly greater degree of bias in exchange for a significant reduction in cost.

3.8 Histogram of the performance of the 67 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number of the 100 simulations for which the mean was inside the interval. The histogram is over the 12 distributions presented in this paper.

3.9 Histogram of the performance of the 95 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number of the 100 simulations for which the mean was inside the interval. The histogram is over the 12 distributions presented in this paper.

3.10 Histogram of the performance of the 99 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number of the 100 simulations for which the mean was inside the interval. The histogram is over the 12 distributions presented in this paper.

4.1 Examples of randomly generated loops and couplings on the DW2 Chimera graph. Top: qubits and couplings participating in the loops are highlighted in green and purple, respectively. Only even-length loops are embeddable on the Chimera graph. Bottom: distribution of J values for a sample problem instance with N = 126 spins and edges, and 101 loops. It is virtually impossible to recover the loop-Hamiltonians H_j from a given H_Ising. The couplings are all eventually rescaled to lie in [-1, 1]. We always set the local fields h_i to zero, as non-zero fields tended to make the problems easier.

4.2 Time to solution as a function of clause density. Shown is TTS(L, α, 0.5) (log scale) for (a) DW2 and (b) HFS, as a function of the clause density. The different colors represent the different Chimera subgraph sizes, which continue to L = 12 in the HFS case. In both cases there is a clear peak. From the HFS results we can identify the peak position as being at α = 0.17 ± 0.01, which is consistent with the peak position in the DW2 results. Error bars represent 2σ confidence intervals.

4.3 Frustration fraction. Shown is the fraction of frustrated couplings (the number of frustrated couplings divided by the total number of couplings, where a frustrated coupling is defined with respect to the planted solution) as a function of clause density for different Chimera subgraphs C_L, in the case of loops of length 8, averaged over the 100 instances for each given α and N. There is a broad peak at α ≈ 0.25. This is the clause density at which there is the largest fraction of frustrated couplings, and is near where we expect the hardest instances to occur, in good agreement with Fig. 4.2.

4.4 Sketch of the relation between the log(TTS) curves for optimal and suboptimal annealing times. The annealing time needs to be optimized for each problem size. Blue represents the TTS with a size-independent annealing time t_a. Red represents the optimal TTS corresponding to having an optimal size-dependent annealing time t_a^opt(L), i.e. the lower envelope of the full series of fixed annealing time TTS curves. This curve need not be linear as depicted, though we expect it to be linear for NP-hard problems. The blue line upper bounds the red line since by definition TTS_DW2(α, t_a^opt(L)) ≤ TTS_DW2(α, t_a). The vertical dotted line represents the problem size L* at which t_a = t_a^opt(L*). To the left of this line t_a > t_a^opt and the slope of the fixed-t_a TTS curve lower-bounds the slope of the optimal TTS curve, since for very small problem sizes a large t_a results in insensitivity to problem size, and the success probability is essentially constant. The opposite happens to the right of this line, where t_a < t_a^opt, and where the success probability rapidly drops with L at fixed t_a.

4.5 Median time-to-solution over instances. Plotted is TTS(L, α, 0.5) (log scale) as a function of the Chimera subgraph size C_L, for a range of clause densities and for all solvers we tested. Note that only the scaling matters and not the actual TTS, since it is determined by constant factors that vary from processor to processor, compiler options, etc. All algorithms' timing reflects the result after accounting for parallelism, as described in 4.6. Error bars represent 2σ confidence intervals. The DW2 annealing time is t_a = 20 μs.

4.6 Speedup ratio. Plotted is the median speedup ratio S_X(L, α, 0.5) (log scale) as defined in Eq. (4.2) for all algorithms tested. A negative slope indicates a definite slowdown for the DW2. A positive slope indicates the possibility of an advantage for the DW2 over the corresponding classical algorithm. This is observed for α > 0.4 in the comparison to SAS, SQA, SSSV, and HFS (see Fig. 4.10 for a more detailed analysis). Error bars represent 2σ confidence intervals.

4.7 Success probability correlations. The results for all instances at α = 0.35 are shown. Each datapoint is the success probability for the same instance, (a) for DW2 at t_a = 20 μs and t_a = 40 μs, (b) for DW2 at t_a = 20 μs and SAA at 50000 sweeps and β_f = 5 [in dimensionless units, such that max(J_ij) = 1]. Perfect correlation means that all data points would fall on the diagonal, and a strong correlation is observed in both cases. The data is colored by the problem size L and shows a clear progression from high success probabilities at small L to low success probabilities at large L. Qualitatively similar results are seen for t_a = 20 μs vs t_a = 40 μs at all α values, and for DW2 vs SAA at intermediate α values (see Figs. 4.21 and 4.22 in Section 4.7).

4.8 Correlation between the DW2 data for two different annealing times and SAA. Plotted is the normalized Euclidean distance D(p_1, p_2) for p_1 and p_2 being, respectively, the ordered success probability for DW2 at t_a = 20 μs and t_a = 40 μs (blue circles), DW2 at t_a = 20 μs and SAA (yellow squares), DW2 at t_a = 40 μs and SAA (green diamonds). For comparison, the Euclidean distance between two random vectors with elements in [0, 1] is ≈ 0.4.
SAA data is for 50,000 sweeps and β_f = 5. The correlation with SAA degrades slightly for t_a = 40 μs. Error bars represent 2σ confidence intervals and were computed using bootstrapping (see Appendix 4.6.2 for details). In each comparison, to construct p_1 and p_2 we fixed α and used half the instances (for bootstrapping purposes) for L ∈ [2, 8].

4.9 Scaling coefficients of the number of runs. Plotted here is b(α) [Eq. (4.6)] for the DW2 at all annealing times (overlapping solid lines and large symbols), for the HFS algorithm, for SAS with an optimal number of sweeps, for SAS with noise and an optimal number of sweeps, and for SAA with a large enough number of sweeps that the asymptotic distribution has been reached at β_f = 5. The scaling coefficients of HFS and of optimized SAS each set an upper bound for a DW2 speedup against that particular algorithm. In terms of the scaling coefficient the DW2 result is statistically indistinguishable (except at α = 0.1) from SAA run at S = 50,000 and β_f = 5. The coefficients shown here are extracted from fits with L ≥ 4 (see Fig. 4.25a in Section 4.7). Error bars represent 2σ confidence intervals.

4.10 Difference between the scaling coefficients. Plotted here is the difference between the scaling coefficients in Fig 4.9, b_X(α) - b_DW2(α), where X denotes the HFS algorithm, SAS with an optimal number of sweeps, SAS with noise and an optimal number of sweeps, or SAA with S = 50,000 and β_f = 5. When the difference is non-positive there can be no speedup since optimizing t_a can only increase b_DW2(α); conversely, when the difference is positive a speedup is still possible, i.e., not accounting for the error bars, for α ≲ 0.05 and α ≳ 0.4 for HFS, and for α ≲ 0.15 and α ≳ 0.35 for SAS without noise. These ranges shrink if the error bars are accounted for, but notably, for most α values SAS with noise does not disallow a limited speedup, suggesting that control noise may be an important factor in masking a DW2 speedup. Error bars represent 2σ confidence intervals.

4.11 Scaling of SAA at different final temperatures. SAA is run at S = 50,000 and various final inverse temperatures. The peak at α ≈ 0.2 remains a robust feature. The data marked "Noisy" is with 5% random noise added to the couplings J_ij.

4.12 Annealing schedule of the DW2. The annealing curves A(t) and B(t) are calculated using rf-SQUID models with independently calibrated qubit parameters. Units of ħ = 1. The operating temperature of 17 mK is also shown.

4.13 The DW2 Chimera graph. The qubits or spin variables occupy the vertices (circles) and the couplings J_ij are along the edges. Of the 512 qubits, 503 were operative in our experiments (green circles) and 9 were not (red circles). We utilized subgraphs comprising L x L unit cells, denoted C_L, indicated by the solid black lines. There were 31, 70, 126, 198, 284, 385, 503 qubits in our C_2, ..., C_8 graphs, respectively.

4.14 Ground state degeneracy. Number of unique solutions as a function of clause density for different Chimera subgraph sizes C_L. As the clause density is increased, the number of unique solutions found decreases to one (up to the global bit flip symmetry). Shown is the median degeneracy, i.e., we sort the degeneracies of the 100 instances for each value of L and α, and find the median. Our procedure counts the degenerate solutions and stops when it reaches 10^5 solutions.
If the median has 10^5 solutions then we assume that not all solutions were found and hence the degeneracy for that value of L and α is not plotted. These are solutions on the used qubits (e.g., there are many instances for each α at L = 8 that use < 503 qubits); to account for the n_uq unused qubits we multiply the degeneracy by 2^(n_uq).

4.15 Scatter plot of the TTS for the HFS algorithm and the degeneracy at L = 8 and α = 0.4 (100 instances total). Even though there is a wide range of degeneracy over several orders of magnitude, we do not observe any trend in the TTS. The degeneracy accounts for the fact that some qubits are not coupled into the problem (e.g., if n qubits are not specified for that particular problem, then the degeneracy is 2^n times the directly counted degeneracy). The Pearson correlation coefficient is -0.046.

4.16 Comparison of the 25th (left) and 75th (right) percentiles of the TTS (log scale) for all algorithms as a function of clause density. The different colors represent the different Chimera sizes tested. All solvers show a peak at the same density value of α ≈ 0.2.

4.17 Suboptimal annealing time and optimal sweeps for α = 0.2. Plotted is the TTS (log scale) as a function of size L for (a) the DW2, with all available annealing times, (b) SQA, (c) SA, and (d) SSSV, for many different sweep numbers. The lower envelope gives the scaling curves shown in Fig. 4.5 for α = 0.2. The TTS curves flatten at high L for the following reason: each classical annealer was run N_X times (N_SA = N_SSSV = 10^4, N_SQA = 10^3), and our distribution is Beta(0.5, N_X + 0.5) for 0 successes, which has an average value of 1/(2N_X). This reflects the (Bayesian) information acquired after N_X runs with 0 successes (one would not expect the probability to be 0). The flattening has no impact on the scale of the optimal number of sweeps.

4.18 TTS (log scale) for L = 8 as a function of number of sweeps for DW2, SQA, SA, and SSSV used to identify the optimal number of sweeps.

4.19 Scaling of the SAA optimal number of sweeps. The optimal number of sweeps is extracted for each L from Fig. 4.18. The scaling is roughly exponential for the smaller α values, and appears to be close to exponential for the larger α values. Lines are guides to the eye.

4.20 The speedup ratios (log scale) for the 25th and 75th percentile of time-to-solution as a function of system size for various α. The different colors denote a representative sample of clause densities.

4.21 Success probability correlations. The results for all instances at all α values are shown for the DW2 data at t_a = 20 μs and t_a = 40 μs. This complements Fig. 4.7; the color scheme corresponds to different sizes L as in that figure.

4.22 Success probability correlations. The results for all instances at all α values are shown for the DW2 data at t_a = 20 μs and SAA with S = 50,000 and β_f = 5. This complements Fig. 4.7; the color scheme corresponds to different sizes L as in that figure. The correlation gradually improves from poor for the lowest clause densities to strong at α = 0.35, then deteriorates again.

4.23 Success probability correlations. The results for all instances at all α values are shown for the DW2 data at t_a = 20 μs and SQAA with S = 10,000 and β = 5. This complements Fig.
4.7; the color scheme corresponds to different sizes L as in that figure. The correlation gradually improves from poor for the lowest clause densities to strong at α = 0.35, then deteriorates again.

4.24 Success probability correlations. The results for all instances at all α values are shown for the SQAA data at S = 10,000 and β = 5 and SAA with S = 50,000 and β_f = 5. This complements Fig. 4.7; the color scheme corresponds to different sizes L as in that figure. The correlation gradually improves from poor for the lowest clause densities to strong at α = 0.35, then deteriorates again. Note that we did not attempt to optimize the correlations between the two methods.

4.25 Exponential fits to the DW2 number of runs. In accordance with Eq. (4.6), the least-squares linear fits to log[r(L, α, 0.5)] are plotted in (a) for t_a = 20 μs. To reduce finite-size scaling effects we exclude L = 2, 3 and perform the fit for L ≥ 4. The intercept of the linear fits is the DW2 scaling constant a(α) from Eq. (4.6), and is shown in (b). Error bars represent 2σ confidence intervals.

4.26 (a) The scaling coefficients b(α) from Eq. (4.6) for SAA at various sweep numbers and β_f = 5, along with the DW2 scaling coefficients for all t_a's we tried. (b) The SQAS and SAS scaling coefficient b(α) at the optimal number of sweeps for each. SAS had a final temperature of β_f = 5 and SQAS was operated at β = 5.

4.27 Histograms of the scale factor of the instances as a function of system size for various values of α. All problems passed to DW2 must have all couplers in the range [-1, 1], so all couplers in a problem are rescaled down by a factor equal to the maximum absolute value of the couplers in the problem (and hence this quantity is called the scale factor). Since internal control error (ICE) is largely instance-independent, the larger the scaling factor of an instance, the worse the relative impact of ICE will be. We see a drift to larger scaling factors with increasing size L and increasing clause density. Larger values of α obviously will have larger scale factors, as there are on average more loops per qubit (and thus, per edge) and thus a larger maximum potential coupler strength. Since the edges included in a loop are generated randomly, the more edges available at a fixed clause density, the more opportunities for a single edge to be included in many loops by chance, causing the average scale factor to drift upward as a function of problem size.

5.1 Representative Feynman diagrams contributing to the simulated distributions for Higgs signal and Standard Model background. The signal is Higgs production through gluon fusion, with the Higgs decaying into two photons (top). Representative Leading Order and Next-to-Leading Order background processes are Standard Model two photon production processes (bottom).

5.2 Distributions of the eight kinematical variables we used to construct weak classifiers. The signal distribution is in green and solid, background in blue and dotted. The vertical axis is the raw count of the number of events. The total number of events simulated in each case is 307732.

5.3 ROC curves for the annealer-trained networks (DW and SA) at f = 0.05, DNN, and XGB. Error bars are defined by the variation over the training sets and statistical error.
All panels show all four ROC curves, with solid lines for DW (green) and SA (blue), and dotted lines for DNN (red) and XGB (cyan). Panels (a) and (c) [(b) and (d)] include 1σ error bars only for DW and DNN [SA and XGB], in light blue and pale yellow, respectively. Results shown are for the 36 variable networks at λ = 0.05 trained on 100 events for panels (a) and (b), and on 20,000 events for panels (c) and (d). For 100 events the annealer trained networks have a larger area under the ROC curve, as shown directly in Fig. 5.4. The situation is reversed for 20,000 training events, where the error bars are too small to be visible. The much smaller error bars are due to the increased number of events.

5.4 Area under the ROC curves (AUROCs) for the annealer-trained networks (DW (green) and SA (blue), solid lines) at f = 0.05, and the conventional approaches (DNN (red) and XGB (cyan), dotted lines). The vertical lines denote 1σ error bars, defined by the variation over the training sets (grey) plus statistical error (green); see the SI (Sec. 6) for details of the uncertainty analysis. While DNN and XGB have an advantage at large training sizes, we find that the annealer-trained networks perform better for small training sizes. Results shown are for the 36 variable networks at λ = 0.05. The overall QAML performance and its features, including the advantage at small training sizes and saturation at ≈ 0.64, are stable across a range of values for λ. An extended version of this plot with various values of λ is available in the SI in Fig. 2.

5.5 Difference between AUROCs of (a) DW vs DNN, (b) DW vs XGBoost, and (c) DW vs SA, as a function of training size and fraction f above the minimum energy returned (the same values of f are used for DW and SA in (c)). Formally, we plot ∫_0^1 [r_B^(DW)(ε_S) - r_B^(i)(ε_S)] dε_S, where i ∈ {DNN, XGBoost, SA}. The vertical lines denote 1σ error bars. The large error bars are due to noise on the programmed Hamiltonian.

5.6 The ROC curves for the annealer-trained networks (DW and SA) at f = 0.05, DNN, and XGB. Error bars are defined by the variation over the training sets and statistical error. Both panels show all four ROC curves. Panel (a) [(b)] includes 1σ error bars only for DW and DNN [SA and XGB], in light blue and pale yellow, respectively. Results shown are for the 36 variable networks at λ = 0.05 trained on 100 events. The annealer trained networks have a larger area under the ROC curve.

5.7 A reproduction of Fig. 5.4 from the main text, now including the optimal strong classifier found by SA at f = 0 for various values of the regularization parameter λ = 0, 0.1, 0.2. We find that this parameter has negligible impact on the shape of the AUROC curve, and that performance for SA always saturates at ≈ 0.64, with an advantage for QAML (DW) and SA over XGB and DNNs for small training sizes.

5.8 An 1152 qubit Chimera graph, partitioned into a 12 x 12 array of 8-qubit unit cells, each unit cell being a K_{4,4} bipartite graph. Inactive qubits are marked in red, active qubits in green. There are a total of 1098 active qubits in the DW processor used in our experiments. Black lines denote active couplers.

5.9 Annealing schedule used in our experiments.

5.10 A plot of the minimum energy returned by the DW as a function of chain strength, rescaled by the number of training samples. I.e., for training size N, we plot E_m/N for minimum returned energy E_m, where N is given in the legend.

5.11 Plot of (E_m - E_0)/E_0 for minimum energy returned E_m and true ground state energy E_0, i.e., the minimum fractional reserve energy, averaged over the training sets, for each size and chain strength.

5.12 The integral of the difference of the ROC curves, i.e., the area between the ROC curves, for SA and SA100 for various thresholds of the energy and training size. SA at 100 and 1000 sweeps are effectively identical by this benchmark.

5.13 Histograms for the true (peaked) distribution of local biases and couplers, and the same distribution subject to point-wise Gaussian noise with zero mean and standard deviation 0.025, which is approximately the magnitude of errors on the DW couplers.

5.14 The maximum local bias and coupler term in the Hamiltonian across training sizes and training sets.

5.15 The maximum local bias and coupler term in the Hamiltonian across training sizes and training sets, normalized by the number of events in the training set. This makes it clear that the scaling of the Hamiltonian coefficients is linear in the training size, for training sizes of roughly 5000 and above.

5.16 The ratio of the median coefficient to the maximum coefficient for the non-zero local biases, couplers, and both taken together.

5.17 Difference between the ROC curve for SA with v_cut at the xth percentile during weak classifier construction and the curve using the yth percentile during the same for the ground state configuration. In order of presentation (a) x = 70, y = 60, f = 0. (b) x = 70, y = 80, f = 0. (c) x = 70, y = 60, f = 0.05. (d) x = 70, y = 80, f = 0.05.

5.18 Difference between the ROC curves for SA and DW using the minimum energy returned.

5.19 Difference between the ROC curves for SA and DW using all states within 5% of the minimum return energy.

5.20 Difference between the ROC curves for DW and DNN using the minimum energy configuration from DW.

5.21 Difference between the ROC curves for DW and XGB using the minimum energy configuration from DW.

5.22 Difference between the ROC curves for the true ground state configuration and the f = 0.05 composite classifier from SA.

5.23 Difference between the ROC curves for the minimum energy state returned by DW and the f = 0.05 composite classifier from DW.

A.1 The basic types of fundamental plots. The first panel shows the cumulative number of true successes in the simulation as a function of the number of gauges, the second shows the ratio between said cumulative successes for the MAB algorithm (here Boltzmann with energy 1) to that which one would expect from randomly selecting gauges. And finally there is the third panel, showing the MAB-to-blind cumulative successes against the number of successes observed by that point.
This latter is quite nice, in that the value at any given point is an estimate of the magnitude of the benefit (the improvement in the rate of successes) of running the MAB. This particular plot is instance 10, the 60th percentile, using Boltzmann exploration at temperature 1 using an adjustable number of arms.

A.2 Plot of ratio of MAB to blind cumulative success as a function of the number of programming cycles for the instance representing the 5th percentile of the dataset for ε ∈ {0.01, 0.05, 0.1}. We see a significant improvement for the 25th percentile of trials, though not a very large difference in the scaling of each algorithm individually.

A.3 Plot of ratio of MAB to blind cumulative success as a function of the number of programming cycles for the instance representing the 90th percentile of the dataset for ε ∈ {0.01, 0.05, 0.1}. We see fairly little variation, as easy instances with such high probabilities tend to have somewhat less variation in relative probability magnitudes than harder instances.

A.4 Plot of the mean ratio over repeated trials of MAB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here for ε-greedy.

A.5 Plot of the mean ratio over repeated trials of MAB to blind cumulative success as a function of the number of programming cycles for the instance equal to the 5th percentile in probability of success of the instance class. We see similar performance, but some small degradation as we go to higher temperatures.

A.6 Plot of the mean ratio over repeated trials of MAB to blind cumulative success as a function of the number of programming cycles for the hardest instance of the instance class, where we see an improvement in performance as temperature increases. By increasing temperature, performance broadly improves; in particular the 25th percentile more rapidly moves above breakeven (i.e. 1).

A.7 Plot of the mean ratio over repeated trials of MAB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here for Boltzmann exploration at temperature = 1.

A.8 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the hardest instance of the instance class. Performance is widely overdispersed: there is a significant chance of very high returns, but also very high probability of very poor returns, with the average case being unimpressive by comparison to other algorithms here.

A.9 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 10th easiest problem in the class. Compared to the same plot in Figure A.3, for instance, performance is poor.

A.10 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here. It is fairly clearly quite consistent but much worse, on the whole, than other algorithms we test here.

A.11 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 5th hardest problem in the class. Performance collapses to blind guessing, as our prior density places significant prior probability on there being lower energy states so long as the number of "arms" in our bandit is larger than the number of successes.

A.12 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 20th hardest problem in the class. Performance now smoothly increases, in contrast to A.11.

A.13 Plot of the mean ratio over repeated trials of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 10th easiest problem in the class. Performance is still reasonable, and again rises smoothly across the whole space.

A.14 Plot of the mean ratio over repeated trials of MAB using Thompson sampling to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here. It is fairly clearly quite consistent, and actually reaches by far the highest rates of return for certain problems as compared with UCB, Boltzmann exploration, and ε-greedy.

A.15 The maximum (long-run) ratio of the total return under the MAB algorithm compared to blind guessing for each of our three competitive algorithms. As we can see, while Thompson sampling ended up achieving the highest maximum returns for a few instances, it generally fell behind the more greedy algorithms such as ε-greedy and Boltzmann exploration. If one looks carefully, Boltzmann exploration almost universally slightly outperforms ε-greedy in this case.

List of Tables

2.1 Repetitions of annealing runs used on the DW2. This table summarizes the total number of repetitions used to estimate the success probabilities on the DW2 for various system sizes.

2.2 Wallclock times on the DW2. Listed are measured programming times t_p and annealing plus readout times t_r (for a pure annealing time of 20 μs) on the DW2 for various problem sizes.

5.1 The kinematical variables used to construct weak classifiers. Here, m_γγ is the invariant mass of the diphoton pair.

5.2 Map from number to variable/weak-classifier name.

5.3 Variable inclusion in the ground states of the Ising problem instances. The variables listed are those from which we selected the various variables included in our tests with varying problem size. We list how many out of 20 training sets had the given variable turned on in the ground state configuration. Three of the 36 variables were included for all values of the penalty term and for all of the training sets [p_T^2, (ΔR p_T)^(-1), and p_T^2 p_T], the variables p_T^2/(p_T^1 p_T^2) and (p_T^1 + p_T^2)/... were present in almost all, while seven were never included, among which the original kinematical variables p_T^1 and ... . All momenta (p_T^1, p_T^2, p_T) are given in units of m_γγ. Variables are given in Table 5.2.

Chapter 1

Introduction

How do we benchmark the performance of quantum annealers and similar systems? How do we analyze our state of knowledge of that performance, if it is subject to a variety of nuisance parameters? How should we even define quantum speedup?

In this dissertation, I will present work that I, in collaboration with colleagues, have done to answer these questions and apply those answers to a few test cases.

1.1 Introduction

As we look forward to the rapid development of new quantum computing devices with hundreds or thousands of qubits, particularly commercial devices and non-gate-based devices such as quantum annealers, we are faced with a challenge. How does one ensure such devices really do what they claim, and aren't effectively classical? How does one evaluate the performance of such a device, what methods should one use to estimate performance on a given metric, and what metrics should one use? How do we do maintenance on the quantum state and ensure we can prevent or correct breakdowns and errors? These questions have to be settled before we can decide where to take our device on a test drive, and what problems we should use our quantum computing devices to try to solve.

At this time, these new devices and plans for quantum annealing devices and various other quantum computing platforms are no longer the first of their kind. Several generations of programmable quantum annealers from D-Wave Systems have been made available to a small community of researchers, which has worked hard to answer the aforementioned questions. This community began largely groping in the dark, and has over the last five years answered many of the most basic questions, developing techniques to validate quantum annealers, methods to benchmark and estimate performance, and methods to suppress errors given the constraints of existing quantum annealers.

I have been fortunate to be a member of the aforementioned community, which has given us an opportunity to work with the first several generations of quantum annealers, starting from the first commercially available such device, the 128-qubit D-Wave One "Rainier" processor, through two more generations of 512 and 1152 qubits, to the current 2048-qubit D-Wave 2000Q processor. [1]

[1] A brief history: the Rainier processor (108 operational qubits) was the first to be installed at the USC-Lockheed Martin Quantum Computing Center at the USC Information Sciences Institute in 2011. Upgrades to the "Vesuvius" (504 operational qubits) and "Washington" (1098 operational qubits) processors followed in 2013 and 2016, respectively. Google installed "Vesuvius" (509 operational qubits) and "Washington" (1097 operational qubits) processors in the same years at NASA Ames. Los Alamos National Lab installed a "Washington" processor (1095 operational qubits) in 2016. The 2000Q processor is now deployed at NASA Ames.

I have given significant thought to many of the aforementioned questions: What is quantum speedup? How would we recognize it if it was there? Most importantly, how can we think rigorously about the state of our knowledge about the performance of a quantum annealer, particularly when it is noisy or subject to a large space of nuisance parameters? The discussion will draw mainly from the research I and colleagues have done on quantum annealers, and I apologize in advance to the many others who have contributed to this enterprise for not doing their work justice. I expect that some of the lessons learned will inform studies of future classes of quantum computing devices with many qubits. My presentation of others' work aims to remain at a fairly high level, without giving a detailed technical account, for which I refer the reader to the original literature cited.

In this chapter (Chapter 1), I will review some of the theory and literature of quantum annealing, describe some of the solvers that are often used to solve the Ising-type problems that existing annealers are meant to solve, and review some of the work on characterizing these systems. This chapter was heavily based on sections of my publications, particularly Ref. [3].

In Chapter 2, I will present work on defining and detecting quantum speedup on a quantum annealer, and the initial application of that work toward benchmarking the D-Wave Two annealer using random Ising problems. This chapter was originally published as Ref. [4].

In Chapter 3, I will look in detail at the problem of estimating the performance of an algorithm (in particular a noisy quantum annealer). In particular, I'll discuss why one needs to think carefully about one's state of knowledge about the performance of an algorithm and how that differs from certain standard statistical techniques. That discussion also addresses some of the concerns with efficiency of benchmarking difficult problem classes, and proposes a mechanism to potentially significantly reduce the computational effort of that task.

In Chapter 4, we'll again return to the application of some of the insights from Chapters 2 and 3, and give a partial solution to yet another problem in benchmarking: how to compare algorithms on problems so difficult you cannot solve them via brute force. This is done through the introduction of planted solutions in Ising problems. We'll also address a theorem concerning the need to find an optimal annealing time for generic thermal or quantum annealing-type algorithms. This chapter was originally published as Ref. [5].

In Chapter 5, we apply all these insights again to a very new context: the use of quantum annealers in a machine learning context, namely binary classification. There we develop a model we call "quantum annealing for machine learning" or QAML and apply it to the problem of identifying Higgs boson decays in a sea of background events. This provides some additional motivation for considering how to benchmark algorithms where the algorithm in question is embedded into a complex analysis pipeline. This chapter was originally published as Ref. [6].

Finally, in Chapter 6, we review all we've learned about the task of benchmarking and summarize a set of principles to guide future efforts. This chapter is partly based on Ref. [3].

1.2 The Ising Hamiltonian and quantum annealing

Consider the problem of finding the ground state of an Ising spin glass model described by a "problem Hamiltonian"

  H_Ising = -Σ_{i ∈ V} h_i σ_i^z - Σ_{(i,j) ∈ E} J_ij σ_i^z σ_j^z,   (1.1)

with N binary variables σ_i^z = ±1. The local fields {h_i} and couplings {J_ij} are fixed and define a problem instance of the Ising model. The spins occupy the vertices V of a graph G = {V, E} with edge set E. For most of the work here, the graph is some variant of the Chimera graph, though Chapter 5 includes work on fully connected Ising models. For an extremely broad ensemble of graphs, the Ising problem is NP-hard [7].
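
To make the cost function in Eq. (1.1) concrete, the following short Python sketch (not from the thesis; the dictionary representation of h and J and the toy triangle instance are illustrative choices of mine) evaluates H_Ising for a given spin configuration and finds a ground state of a small instance by exhaustive search, the kind of brute-force verification that quickly becomes infeasible as N grows.

    import itertools

    def ising_energy(spins, h, J):
        # H_Ising = -sum_i h_i s_i - sum_(i,j) J_ij s_i s_j, with s_i = +/-1
        energy = -sum(h[i] * spins[i] for i in h)
        energy -= sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())
        return energy

    def brute_force_ground_state(h, J):
        # Exhaustive search over all 2^N configurations; feasible only for small N.
        vertices = sorted(h)
        best_spins, best_energy = None, float("inf")
        for assignment in itertools.product((-1, +1), repeat=len(vertices)):
            spins = dict(zip(vertices, assignment))
            energy = ising_energy(spins, h, J)
            if energy < best_energy:
                best_spins, best_energy = spins, energy
        return best_spins, best_energy

    # Toy instance (not a Chimera subgraph): an antiferromagnetic triangle,
    # which is frustrated, so no configuration satisfies all three couplings.
    h = {0: 0.0, 1: 0.0, 2: 0.0}
    J = {(0, 1): -1.0, (1, 2): -1.0, (0, 2): -1.0}
    print(brute_force_ground_state(h, J))  # ground state energy is -1.0

A quantum annealer is, in effect, competing against classical heuristics at exactly this task: returning low-energy configurations of Eq. (1.1) far faster than such enumeration allows.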
To perform quantum annealing one maps the Ising variables z_i to Pauli z-matrices and adds a transverse magnetic field in the x-direction to induce quantum fluctuations, thus obtaining the time-dependent quantum Hamiltonian
H(t) = −A(t) ∑_i σ_i^x + B(t) H_Ising ,   t ∈ [0, t_a] .   (1.2)
The annealing schedule starts at time t = 0 with just the transverse field term (i.e., B(0) = 0) and A(0) ≫ k_B T, where T is the temperature, which is kept constant. The system is then in a simple quantum state with (to an excellent approximation) all spins aligned in the x direction, corresponding to a uniform superposition over all 2^N computational basis states (products of eigenstates of the σ_i^z). During the annealing process the problem Hamiltonian magnitude B(t) is increased and the transverse field A(t) is decreased, ending with A(t_a) = 0, and couplings much larger than the temperature: B(t_a) max(max_{ij} |J_ij|, max_i |h_i|) ≫ k_B T. At this point the system will typically be trapped in a local minimum, and by repeating the process one may hope to find the global minimum. Quantum annealing can be viewed as a finite-temperature variant of the adiabatic quantum algorithm [8], typically thought of as being restricted to classical final Hamiltonians, as above.
The D-Wave devices [9, 10, 11, 12] are designed to be physical realizations of quantum annealing using superconducting flux qubits and programmable fields {h_i} and couplings {J_ij}, and have connectivity matching the Chimera graph. The Chimera graph of the DW2X used in the tests in chapter 5 is shown in Figure 1.1, with its annealing schedule in Figure 1.2. The ideal Chimera graph is defined as follows. Each unit cell is a balanced K_{4,4} bipartite graph. In the ideal Chimera graph the degree of each vertex is 6. The left-hand side of each unit cell has connections to corresponding qubits in unit cells above and below, while the right-hand side has connections to corresponding qubits in cells to the left and right. An N = 8L²-vertex Chimera graph comprises an L × L grid of K_{4,4} unit cells, and the (so-called TRIAD) construction of Ref. [13] can be used to embed the complete (4L+1)-vertex graph K_{4L+1}. The treewidth of such a graph is 4L + 1 = O(√N) [13]. Dynamic programming can always find the true ground state of the corresponding Ising model in a time that is exponential in the treewidth, i.e., that scales as exp(a√N).
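To illustrate Eq. (1.2) concretely, here is a small sketch that assembles H(s) as an explicit matrix for a toy three-qubit chain. The linear schedule A(s) = 1 − s, B(s) = s is only a stand-in for the measured device schedules of Figure 1.2, and the fields and couplings are arbitrary illustrative values of mine.

# Sketch: assemble H(s) = -A(s) * sum_i sigma_i^x + B(s) * H_Ising for a toy
# 3-qubit chain, with a linear schedule as a stand-in for the device schedules.
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)
I2 = np.eye(2)

def op_on(qubit_op, site, n):
    """Tensor a single-qubit operator into an n-qubit operator at position `site`."""
    ops = [qubit_op if k == site else I2 for k in range(n)]
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

n = 3
h = {0: 0.2, 1: 0.0, 2: -0.2}           # illustrative local fields
J = {(0, 1): 1.0, (1, 2): 1.0}          # illustrative ferromagnetic couplings

H_driver = -sum(op_on(sx, i, n) for i in range(n))
H_ising = (-sum(h[i] * op_on(sz, i, n) for i in range(n))
           - sum(Jij * op_on(sz, i, n) @ op_on(sz, j, n) for (i, j), Jij in J.items()))

def H(s):
    A, B = 1.0 - s, s                   # toy linear schedule, s in [0, 1]
    return A * H_driver + B * H_ising

# Ground-state energy along the anneal:
for s in (0.0, 0.5, 1.0):
    print(s, np.linalg.eigvalsh(H(s))[0])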
Figure 1.1: An 1152-qubit Chimera graph describing the D-Wave Two X processor at the University of Southern California's Information Sciences Institute. Inactive qubits are marked in red, active qubits (1098) are marked in green. Black lines denote active couplings (where J_ij is programmable to be in the range [−1, 1]) between qubits.
Figure 1.2: Annealing schedules for the D-Wave Two X processor described in Fig. 1.1.
1.3 Classical solvers
The choice of classical solvers against which to compare the quantum device involves a few considerations. It is important to perform an apples-to-apples comparison, in that if the device is probabilistic, it would be misleading to measure its performance against a deterministic algorithm [14, 15, 16]. For a quantum annealer, a performance comparison to known heuristic algorithms for sampling low-energy states from Ising models is natural, such as SA [17, 18], parallel tempering (PT) [19, 20, 21], and the Hamze-Freitas-Selby (HFS) algorithm (which searches all states on nodes that make up induced trees or small-treewidth subgraphs of the Ising model's connectivity graph) [22, 23]. One might also compare to approximations of QA itself, in particular simulated quantum annealing (SQA) [24, 25, 26], or the SVMC algorithm [27]. All of these can be said to be "solvers" for the same Ising problem as QA. But, to determine if the quantum device is truly useful in practice, it must also be compared to the best algorithm for solving the original (typically non-Ising) task. For example, when solving the graph isomorphism problem [28], job-shop scheduling [29], operational planning [30], or portfolio optimization [31], the original problem must first be mapped into an Ising problem [32] and then embedded using the existing hardware connectivity graph [33, 13, 34]; the performance of the quantum device must be compared to the best algorithm for solving the original problem, and the mapping plus embedding steps can severely reduce performance. Note also that determining what the truly optimal classical algorithm is can be a daunting, or even impossible, challenge. In many cases one settles for an educated guess: the standard and/or currently best known algorithm(s). Finally, it is important to remember that any tests run on a quantum device that does not enjoy a fault tolerance guarantee cannot be reliably extrapolated to arbitrarily large sizes.
I.e., in the absence of such a guarantee, a finite-size device provides evidence of what can be expected at larger sizes only provided that quantities such as the device temperature, coupling to the environment, and calibration and accuracy errors can be appropriately scaled down. With this in mind, let us turn to a discussion of much of the benchmarking work done so far and some of the considerations that go into using large, noisy quantum devices.
1.3.1 A brief description of some of the most relevant algorithms for solving Ising-type problems
HFS algorithm
The HFS algorithm is due to Hamze & Freitas [22] and Selby [23]. This is a tree-based optimization algorithm, which exploits the sparsity and local connections of the Chimera (or other) graph to construct very wide induced trees and repeatedly optimizes over such trees until no more improvement is likely. It may also be used to sample from the marginal Gibbs state of each tree.
Let's briefly discuss the tree construction of the HFS algorithm. It considers each part of the bipartite unit cell as a single 2^4-dimensional vertex instead of four distinct 2-dimensional vertices, resulting in a graph as depicted in Fig. 1.3. The tree represented by the dark vertices covers 78% of the graph, and such trees will cover 75% of the graph in the limit of infinitely large Chimera graphs. Finding the minimum-energy configuration of such a tree conditioned on the state of the rest of the graph can be done in O(N) time, where N is the number of vertices. Since the tree encodes so much of the graph, optimizing over such trees can quickly find a low-lying energy state. Since each sample takes a variable number of trees, we generally estimate the time to solution as the average number of trees multiplied by the number of operations per tree, with a time constant of 0.5 µs per operation (derived from experiment). More generally, one can also implement HFS in a graph-independent way by building trees dynamically; however, no studies in this work used this method.
Simulated Annealing
The simulated annealing algorithm [17, 18] typically uses a single-spin-flip Metropolis update method. In a single sweep, each spin is updated once according to the Metropolis rule: the spin is flipped and the change in energy ΔE is calculated. If the energy is lowered, the flip is accepted; if not, it is accepted with a probability given by the Metropolis probability:
P_Met = min(1, exp(−β ΔE)) .   (1.3)
A linear annealing schedule in the inverse temperature β is typically used: starting at β_i = 0.01, we increment β in steps of Δβ = (β_f − β_i)/(S − 1), where S is the number of sweeps, up to β_f.
Parallel Tempering
Parallel tempering (PT) [19, 20, 21] is very similar to SA, except that one encodes many copies of the system at different temperatures, performing Monte Carlo update sweeps on each copy and then performing a swap between neighboring systems in temperature space with another Metropolis update. Choosing the ensemble of temperatures is relatively difficult, but there are heuristics which typically yield good results and are efficient [21]. Advanced versions with clever update mechanisms specifically tailored for Ising models, such as the isoenergetic cluster move discussed in Ref. [35], have proven to be exceptionally powerful. However, parallel tempering will receive no further attention here, as it was not used in any of the studies I've published.
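A minimal sketch of the single-spin-flip Metropolis sweep with a linear schedule in β, as described above for SA, is given below. It is a simplified illustration of mine rather than the optimized code used in the actual benchmarks (e.g., Ref. [95]); it represents the couplings as an adjacency list, neighbors[i] = [(j, J_ij), ...], rather than the edge dictionary of the earlier sketch, and uses the sign convention of Eq. (1.1).

# Sketch: simulated annealing with single-spin-flip Metropolis updates and a
# linear schedule in the inverse temperature beta, following Eq. (1.3).
import math
import random

def simulated_annealing(h, neighbors, sweeps, beta_i=0.01, beta_f=3.0):
    # neighbors[i] is a list of (j, J_ij) pairs for every edge incident on i.
    spins = {i: random.choice([-1, 1]) for i in h}
    dbeta = (beta_f - beta_i) / (sweeps - 1)
    beta = beta_i
    for _ in range(sweeps):
        for i in spins:
            # Energy change of flipping spin i under the convention of Eq. (1.1):
            # dE = 2 * z_i * (h_i + sum_j J_ij * z_j)
            local = h[i] + sum(Jij * spins[j] for (j, Jij) in neighbors[i])
            dE = 2.0 * spins[i] * local
            if dE <= 0 or random.random() < math.exp(-beta * dE):
                spins[i] *= -1
        beta += dbeta
    return spins

The final value beta_f = 3.0 above is an illustrative choice; in practice the schedule endpoints are tuned to the coupling scale of the instance class.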
Figure 1.3: An example of the view of the Chimera graph in the HFS algorithm. Each vertex is one half of a unit cell, and is 2^4-dimensional. The algorithm repeatedly finds the minimum-energy configuration of the graph over trees (like the one highlighted) given the state of the remaining vertices (in gray).
SSSV/SVMC
The SSSV (Shin, Smith, Smolin & Vazirani) or SVMC (spin-vector Monte Carlo) model was first proposed [27] as a classical model that reproduced the success probabilities of the DW1 device studied in Ref. [15], although there is growing evidence that this model fails to capture the behavior of the device for specific instances [36, 37, 2]. The model can be understood as describing coherent single qubits interacting incoherently by replacing qubits by O(2) rotors; the Hamiltonian can be generated by replacing σ_i^x → sin θ_i and σ_i^z → cos θ_i. The system is then "evolved" by Monte Carlo updates on the angles θ_i ∈ [0, π]. Although SSSV is not designed to be a fast solver, we studied it in chapter 4 as a potential classical limit of the DW2 and checked what the scaling of such a classical limit would be.
Simulated Quantum Annealing/Path Integral Monte Carlo
SQA/PIMC [24, 25, 26] is an annealing algorithm based on discrete-time path-integral quantum Monte Carlo simulations of the transverse-field Ising model, but using Monte Carlo dynamics instead of the open-system evolution of a quantum system. This amounts to sampling the world-line configurations of the quantum Hamiltonian (1.2) while slowly changing the couplings. SQA has been shown to be consistent with the input/output behavior of the DW1 for random instances [15], and we accordingly used a discrete-time quantum annealing algorithm. Generally 64 Trotter slices are used and an inverse temperature of β = 10 (in dimensionless units, such that max(|J_ij|) = 1), with A(t) linearly decreasing and B(t) linearly increasing. Cluster updates are performed only along the imaginary-time direction. A single sweep amounts to the following: for each space-like slice, a random spin along the imaginary-time direction is picked. The neighbors of this spin are added to the cluster (assuming they are parallel) according to the Wolff algorithm [38] with probability 1 − e^{−2J_⊥}, where J_⊥ = −0.5 ln[tanh A(t)] is the spin-spin coupling along the imaginary-time direction. When the cluster construction terminates, the cluster is flipped according to the Metropolis probability using the change in energy along the space-like direction associated with flipping the cluster. Therefore a single sweep involves a single cluster update for each space-like slice.
1.4 A quick review of quantum validation testing
Perhaps the first question one might ask when offered a quantum computational device is whether or not it is, in fact, quantum. In the case of quantum computational devices based on the circuit model and/or gates for quantum computing, the task of validation can be reduced to a Clauser-Horne-Shimony-Holt (CHSH) test between two parts of the device that are treated as black boxes [39]. Alternatively, one may opt for quantum process tomography [40, 41] or quantum gate set tomography [42, 43], wherein one applies many small computations and measures the results, verifying that they match the predictions of quantum theory. These predictions are available because the quantum computations in question typically involve few qubits and are thus readily implementable [44, 45].
However, for other quantum computing paradigms, such as quantum anneal- ing (QA) [46, 47] and the broader eld inspired by adiabatic quantum comput- ing (AQC) [48, 49, 50, 51], quantum tomography is not currently available for 28 validation. This is for a variety of reasons. The key dierence is that gate- based computations are modular: they can be broken into discrete time-local and space-local operations, operating eectively on only one or two qubits at a time, with the others left essentially unaected, so the only requirement to val- idate even a long chain of computations is to validate those one- and two-qubit operations on individual qubits and pairs of qubits. For AQC-like platforms, the quantum computation is composed of a continuously time-varying Hamil- tonian with many computational operators acting on the system at the same time. They are non-modular in the sense that they cannot be easily broken down into discrete chunks which can be validated separately. Future versions of such platforms may be more exible and allow for approaches such as quantum tomography, but will still be unable to validate arbitrarily large computations due to the aforementioned nonmodularity of the computation. Meanwhile, par- tial alternatives such as tunneling spectroscopy have already been explored [52]. Of course, in the absence of error correction and fault tolerance neither the gate model nor AQC are guaranteed successful validation. Nevertheless, certain lessons can be ported over to non-gate-based approaches. One should, as in the circuit model, focus on small problems, with a small num- ber of qubits, and one may hope that by studying such problems applied to many such overlapping sets that one can at least partially validate the operation of the device. From here, two paths for validation become available, depending on whether one can \open the black box" and perform measurements during the anneal or use measurements beyond what may be considered \native" to the device, or whether one is only able to use the device's output at the end of complete runs for testing. 1.4.1 Types of validation: proof of quantumness, quan- tum supremacy, speedup-inferred quantumness, and classical model rejection In validating quantum annealers, one seeks to create an assignment to the h's and J's such that one can take some measurements which will conclusively demonstrate, for instance, quantum entanglement, in what might be called an experimental \proof of quantumness". A somewhat weaker and indirect type of validation is provided by \quantum supremacy" experiments [53], since they have the potential for complexity the- oretic guarantees. 2 More specically, quantum supremacy is a scenario where (part of) the polynomial hierarchy of complexity theory collapses if the quantum result could be replicated classically without slowdown [55, 56, 57, 58, 59, 60]. While weaker than a direct proof of quantumness, a demonstration of quantum supremacy would be considered strong evidence for quantum computational power of a device, which may be considered inherently more interesting than a 2 The term \supremacy" has generated considerable controversy [54]. While we would prefer the adoption of an alternative such as \hegemony" or \supremeness", we recognize that \supremacy" is likely here to stay due to its current widespread usage. 29 direct demonstration of, e.g., entanglement. 
\Speedup-inferred quantumness" is a related type of indirect validation based on a demonstration of quantum speedup [4]2 over the best classical solvers known for a task, which is often considered the holy grail of quantum infor- mation processing. Unlike quantum supremacy tests, speedup-inferred quan- tumness tests do not have complexity theoretic guarantees (an example in the circuit model would be Shor's algorithm [61]). It appears that an unqualied quantum speedup would necessarily have to invoke quantum properties, and this might happen even if these properties remain poorly understood or character- ized. Thus a certicate of quantumness might be assigned even in the absence of a direct demonstration of quantum properties such as entanglement. It should be recognized that this carries a certain element of risk. For example, suppose a new classical optimization is discovered that outperforms all other classical and quantum optimization algorithms known to date (this is in fact what hap- pened recently in a tug-of-war between quantum and classical optimization for the Max E3LIN2 problem [62]). This algorithm could be deceptively marketed as a quantum algorithm providing speedup-inferred quantumness by a shrewd company claiming to own quantum computers, that provides black-box access only to run the new optimization algorithm. Thus any claim of speedup-inferred quantumness should always be treated with a healthy degree of skepticism as related to its quantum underpinnings, until actual evidence of quantum eects driving the algorithm is presented. If none of prior three types (proof of quantumness, quantum supremacy, speedup-inferred quantumness) of validation are attainable, one may alterna- tively seek to show that on suciently small scale problems the results are only readily reproducible using a truly quantum model of the device, and cannot be replicated qualitatively using any existing classical model, in what might be called \classical model rejection". This type of validation experiment does not provide a certicate of quantumness, since one can always invent a new and better classical model. Instead, one can only hope to exclude all \physically reasonable" classical models for the device. Moreover, classical model rejection can only be performed as long as it is feasible to carry out quantum model sim- ulations, which limits system sizes to about 20 qubits for master equation type models, using the quantum trajectories method [63]. Extrapolations to larger sizes are, as always, risky in the absence of fault tolerance guarantees. One caveat regarding \proof of quantumness" experiments is noteworthy. While demonstrations of entanglement can be considered \proof of quantum- ness", they often require additional physical resources and measurement possi- bilities beyond those that may natively be embedded in a (commercial) quan- tum computational device or that are strictly required to implement the core algorithm, and thus may be impossible on certain platforms. Additionally, in practice, certain assumptions may be made in a \proof of quantumness" exper- iment which, when relaxed, render it eectively a \classical model rejection" experiment; we shall shortly see an example of this with the D-Wave quantum annealers. 30 1.4.2 Experimental implementations of quantum valida- tion tests The primary \proof of quantumness" experiment for quantum annealers was performed in Ref. [64], using an entanglement test on the D-Wave Two (DW2) generation of processors. 
Briefly, the work used quantum tunneling spectroscopy [52] to estimate the populations of the first and second excited states of a combined probe-system Hamiltonian. They also measured the energy spectrum and found it to be consistent with the Hamiltonian the device was designed to implement, which provided a justification for the assumption that the measured populations were those of the energy eigenstates of the Hamiltonian. This allowed for a reconstruction of the density matrix under the assumption that it is diagonal in the energy eigenbasis, enabling a computation of the negativity [65] for all possible bipartitions of the system, the geometric mean of which was taken as a measure of the entanglement of the system. As it was found to be nonzero, the system is entangled. Further, by exploiting the theory of entanglement witnesses [66], Ref. [64] was able to show that even if the diagonality assumption is relaxed, the entanglement remains. This was used to conclude that the DW2 system tested displays entanglement at least on the scale of a single 8-qubit unit cell.
It was noted in Ref. [67] that these tests depended on the assumption that the device was well-described by Eq. (1.1) for an appropriate (programmed) choice of local fields and couplers for which the ground state is entangled, and that this assumption is not directly demonstrable by the experiments in [64]. Without that assumption, one must revert to a "classical model rejection" experiment in which one compares results of direct quantum simulations of the device and available classical alternatives to demonstrate that only the quantum model is consistent with the experimental observations. Ref. [67] provides a detailed description of the experiments, but for our purposes the key takeaway is that only the quantum adiabatic master equation [68] can reproduce the output distribution from experiments, validating the approach in Ref. [64].
Another branch of validation experiments of the classical model rejection type are the so-called "quantum signature" Hamiltonians and the consistency tests derived therefrom, introduced in Ref. [1], critiqued in Ref. [69], and further explored in Refs. [70, 36, 71]. Unlike the aforementioned entanglement tests, these experiments do not require access to the system during the annealing process, and are appropriate for cases in which the quantum device is a "black box" in which one can only control the inputs and measure the outputs. An example of a quantum signature Hamiltonian is shown in Fig. 1.4. These Hamiltonians take the form of a ring of tightly bound qubits, each connected to a single outer qubit. The resulting Hamiltonian has the property that there is a large (2^{N/2}-dimensional) degenerate subspace of ground state configurations corresponding to arbitrary assignments to the outer qubits where all qubits in the inner ring are in the state 0 (forming a "cluster" connected by single spin flips applied to the outer ring). There is one additional ground state corresponding to flipping all the inner qubits to the state 1, dubbed the "isolated" state, since for a signature Hamiltonian with 2N qubits it is at least N spin flips away from all other ground states.
Figure 1.4: The eight-spin Ising quantum signature Hamiltonian introduced in Ref. [1]. The inner "core" spins (green circles) have local fields h_i = +1 while the outer spins (red circles) have h_i = −1. All couplings are ferromagnetic: J_ij = 1 (black lines).
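The instance of Fig. 1.4 is simple enough to write down explicitly. The sketch below constructs the eight-spin signature Hamiltonian (four core spins in a ferromagnetic ring, each attached to one outer spin, with core fields +1 and outer fields −1) in the same dictionary format used in the earlier energy-evaluation sketch; the function name and layout are my own illustrative choices, not code from Ref. [1].

# Sketch: the 8-spin quantum signature instance of Fig. 1.4 (4 core + 4 outer spins).
# Core spins 0..3 form a ferromagnetic ring; outer spin 4+k attaches to core spin k.
def signature_instance(n_core=4):
    h = {}
    J = {}
    for k in range(n_core):
        h[k] = +1.0                          # core local fields
        h[n_core + k] = -1.0                 # outer local fields
        J[(k, (k + 1) % n_core)] = 1.0       # ferromagnetic core ring
        J[(k, n_core + k)] = 1.0             # core-to-outer coupling
    return h, J

h, J = signature_instance()
print(sorted(h.items()))
print(sorted(J.items()))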
Thermal algorithms such as classical simulated annealing (SA) [17] will be weighted toward the isolated state, such that it will have the highest probability of occurrence of any ground state conguration, whereas an adiabatic quantum evolution will nd the isolated state to be suppressed relative to the cluster states. Extensive simulations and experiments on a 108 qubit D-Wave One (DW1) \Rainier" processor matched qualitatively with the adiabatic master equation across all the parameters and statistics of the output distributions tested, though noise due to cross-talk made it very dicult to nd quantitative agreement; at the same time all existing classical models failed to qualitatively match the DW1 in at least one of the tests [36]. A dierent approach to classical model rejection was taken in Ref. [15], which used random J ij =1 instances of the Ising Hamiltonian in Eq. (1.1) to test the hypothesis that quantum annealing correlates well with two classical models: SA and classical spin dynamics [69] (also known as the Landau-Lifshitz-Gilbert model). The hypothesis was tested using the same DW1 processor. This work showed that these two classical models failed to correlate with the results for the distribution of ground state probabilities generated by the DW1 device, while the DW1 correlated very well with simulated quantum annealing (SQA), implemented using quantum Monte Carlo [24]. This was taken as evidence for quantum annealing on the scale of more than 100 qubits, thus generalizing the conclusion of the earlier result [1] based on the 8-qubit \gadget" shown in Fig. 1.4. Shortly thereafter a new semiclassical Spin-Vector Monte Carlo (SVMC) model was introduced, also known as SSSV, the author initials of Ref. [27]. 3 In this model spins are treated as O(2) rotors (eectively as single qubits), evolved according to the annealing schedule given in Eq. (1.2), with 3 Both the spin-dynamics and SVMC models can be derived in a strong coupling limit from the anisotropic Langevin equation, starting from Keldysh eld theory [72]. 32 Monte Carlo angle updates. The SVMC model correlated well with both the DW1 and SQA data, suggesting that although the DW1 device's performance is consistent with quantum annealing, it operated in a temperature regime where, for most random Ising spin glass instances, a quantum annealer may have an eective semi-classical description. This conclusion was challenged in Ref. [37], which considered the excited state distribution rather than just the ground state distribution over random J ij =1 Ising instances, as well as the ground state degeneracy. This work presented evidence that for these new measures neither SQA nor SVMC, which are classically ecient algorithms, correlated well with the DW1 experiments. The close correlation SQA and the SVMC model was explained by showing that the SVMC model represents a semiclassical limit of the spin-coherent states path integral, which forms the foundation for the derivation of the SQA algorithm. The intense debate that arose around the original classical model rejec- tion tests presented in Refs. [1, 15], in particular the critique presented in Refs. [69, 27], illustrates the risks associated with such tests|risks that ma- terialize whenever a suciently clever new classical model is found that agrees with (some of) the data|as well as the fruitfulness of the classical model rejec- tion approach, which can lead to a healthy updating and sharpening of models and assumptions. 
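As a concrete illustration of such an update, here is a minimal sketch of one SVMC sweep, assuming the sign conventions of Eqs. (1.1) and (1.2): each spin i carries an angle θ_i ∈ [0, π], σ_i^x is replaced by sin θ_i and σ_i^z by cos θ_i, and proposed new angles are accepted with a Metropolis rule at inverse temperature β. This is my own simplified rendering, not the code of Ref. [27], and it represents the couplings as an adjacency list J_of[i] = [(j, J_ij), ...].

# Sketch: one Metropolis sweep of the spin-vector Monte Carlo (SVMC / SSSV) model.
# theta[i] in [0, pi]; sigma^x_i -> sin(theta_i), sigma^z_i -> cos(theta_i).
import math
import random

def svmc_sweep(theta, h, J_of, A, B, beta):
    """h: local fields; J_of[i]: list of (j, J_ij) pairs for edges touching i."""
    def local_energy(i, th_i):
        transverse = -A * math.sin(th_i)
        zfield = h[i] + sum(Jij * math.cos(theta[j]) for (j, Jij) in J_of[i])
        return transverse - B * math.cos(th_i) * zfield
    for i in theta:
        old, new = theta[i], random.uniform(0.0, math.pi)
        dE = local_energy(i, new) - local_energy(i, old)
        if dE <= 0 or random.random() < math.exp(-beta * dE):
            theta[i] = new
    return theta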
Black box classical model rejection tests such as the quantum signature Hamiltonians provide the basis for the testing of new putative quantum devices for which available controls and potential measurements are limited, and ulti- mately even the best experiments that seek to prove entanglement will depend on a series of such experiments to demonstrate that only quantum models can reproduce the experimental data from the device. Quantum supremacy tests are a type of limiting case of this, in which one can prove that should any clas- sical device be able to produce a particular output distribution in polynomial time then the computational complexity hierarchy will at least partially col- lapse. Since this is not expected to occur, building a device which can produce said distribution eciently will then immediately rule out all classical models for the device [55]. Another kind of black box classical model rejection test is based on the phe- nomenon of quantum tunneling, whereby a quantum state has sizable probabil- ity on either side of an energy barrier which the system could not move through classically, or at least will only be able to do so with reasonable probability at high temperature. The rst quantum annealing experiments involving tunable tunneling were carried out using the disordered ferromagnet LiHo x Y 1x F 4 in a transverse magnetic eld [73, 74], and served as inspiration for the design of programmable superconducting ux-qubit based quantum annealers. These experiments indicated that quantum annealing hastens convergence to the opti- mum state via tunneling, compared to simple thermal hopping models. The rst programmable quantum annealer experiment was reported in Ref. [75], in which it was demonstrated that an 8-qubit quantum annealing device was able to re- produce the domain wall tunneling predictions of quantum theory for a chain of superconducting ux qubits by modifying the time during the annealing process 33 at which a local eld is abruptly applied to the qubits. This contradicted the temperature dependence predictions of a classical thermal hopping model, thus serving as a classical model rejection experiment. More recently, Ref. [2] reported on a specially designed tunneling probe Hamiltonian for quantum annealing, illustrated in 1.5. The probe uses two unit cells of the D-Wave Chimera graph, binding each one together tightly so they each act like a single eective spin, or cluster. Opposite magnetic elds are applied to each unit cell, one weak and one one strong, so that the spins in the \strong" cluster align before the spins in the \weak" cluster. Initially, there is only a single minimum. A second minimum develops over the course of the anneal, and eventually becomes the global minimum of the nal Ising Hamiltonian. The only way to reach the global minimum is to overcome an energy barrier whose strength increases as the anneal progresses, a classic ex- ample of tunneling. Using the non-interacting blip approximation (NIBA) it was shown in Ref. [2] that the system eectively acts like a two-level system even in the open-system setting with a strongly coupled bath. NIBA-based predictions without free parameters for tests at dierent values of h L and dierent temper- atures demonstrated very good agreement with experiments involving a DW2 device, and were not reproducible using classical models for the device such as SVMC [27]. A variant of this experiment was reported on in Ref. 
[76], which introduced a new class of problem instances which couples the weak-strong clusters of the tunneling probe as sub-blocks of the Hamiltonian. This work can be interpreted as an attempt to go from classical model rejection to speedup-inferred quantumness, as it claimed a large tunneling-induced constant-factor speedup over classical simulated annealing and simulated quantum annealing for a DW2X device. However, this claim was critiqued in Ref. [77] on the basis of a comparison to classical algorithms with better performance. Moreover, as we discuss below, speedup-inferred quantumness requires a demonstration of an optimal annealing time (Ref. [4] and chapter 2), which was absent in the results reported in Ref. [76].
Validating non-gate-based quantum devices will continue to be a challenge as new such systems come online, but applying combinations of the techniques discussed above, from the construction of quantum signature Hamiltonians and tunneling probes to (in)direct proofs of entanglement via entanglement witnesses and direct computation of entanglement, should allow one to boost confidence that the system obeys the predictions of quantum theory over small scales. The challenge remains to extend these techniques so that they are able to demonstrate conclusively that a device with hundreds or thousands of qubits displays coherence and long-range entanglement. Due to decoherence this presents a challenge for gate-based quantum devices as well, even at a smaller scale [78, 79], and speedup-inferred quantumness tests may prove to be simpler to execute than direct quantumness tests even in the gate-model setting.
Figure 1.5: The 16-spin Ising Hamiltonian composed of two K_{4,4} unit cells introduced in Ref. [2]. All couplings are set to J = 1; all qubits in the left unit cell have a local field 0 < h_L < 0.5 applied to them, while all spins in the right unit cell have h_R = 1 applied to them. Two local minima form, one with the cells internally aligned but in opposite states from each other (a local minimum) and the other with all states aligned with h_R (the global minimum). By tightly binding each unit cell, they effectively act like single large spins.
1.5 Benchmarking
Assume we have at our disposal a device verified to be quantum, at least provisionally on the small scales covered by classical model rejection, and we would like to compare its performance to competing classical solvers. This is the task we refer to here as benchmarking, which belongs more generally to the field of experimental algorithmics [80]. Specifically, consider the problem of estimating the value of some function of merit (or "reward") R from the output of a given solver (e.g., our quantum device or some classical algorithm) for a given problem family P = {P}. Each problem instance P is parametrized by some set of parameters. In the case of quantum annealers, particularly studies of the D-Wave devices thus far, the goal has generally been to find the ground state of Ising Hamiltonians as defined in Eq. (1.1). In that context, typically the reward is taken to be the negative of the time to solution (TTS), defined as
TTS = t_f log(1 − p_d) / log(1 − p)
where p is the probability of finding the ground state in a single run, p_d (typically 0.99) is the desired probability of finding it at least once, and t_f is the annealing time (footnote 4). In the language above, R = −TTS (one would like to minimize the TTS), and the problem is parametrized by {h_i, J_ij}. Many similar metrics have been proposed, such as time-to-epsilon and time-to-target [81], which amount to mild generalizations of TTS.
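Because the TTS formula recurs throughout this dissertation, a small code sketch may help fix ideas. The function below simply evaluates TTS = t_f log(1 − p_d)/log(1 − p); the handling of the edge cases p = 0 and p ≥ p_d is my own choice, not something prescribed in the cited papers.

# Sketch: time to solution (TTS) from a single-run success probability p,
# an annealing time t_f, and a desired success probability p_d (default 0.99).
import math

def time_to_solution(p, t_f, p_d=0.99):
    if p <= 0.0:
        return math.inf                 # ground state never seen: TTS diverges
    if p >= p_d:
        return t_f                      # a single run already suffices
    repetitions = math.log(1.0 - p_d) / math.log(1.0 - p)
    return t_f * repetitions

print(time_to_solution(p=0.10, t_f=20e-6))   # ~8.7e-4 s for a 20 microsecond anneal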
A more elaborate notion of cost, based on optimal stopping theory, has also been considered and shown to recover the previous metrics as special cases [82]. We shall return to this below. 4 The probability of not nding the ground state even once after k independent runs of durationt f each is (1p) k , so the probability of nding it at least once is 1 (1p) k , which we set equal to p d . Solving for k and substituting into TTS = t f k gives the TTS formula. See, e.g., Ref. [4] for a more detailed derivation. 35 1.5.1 Prior work on benchmarking quantum annealers Our discussion here will focus exclusively on work that advanced the state of the art in benchmarking quantum annealers, and also primarily on those studies taken prior to or in the early phases of my PhD. The rst comprehensive study benchmarking QA devices was Ref. [15], us- ing a 108 qubit DW1 processor. This article introduced many of the concepts used in later studies in the eld, including the above denition of the TTS. It focused on the performance on the set of random Ising problems with binary 1 local elds and couplings, and introduced the use of SA and SQA as impor- tant comparison algorithms. It also noted the importance of comparing against parallelized versions of classical algorithms, as quantum annealers such as the D-Wave device consume linearly more computational hardware with increasing problem size, and in many cases SA and SQA can be eectively parallelized in much the same way. Another signicant contribution of Ref. [15] was the use of \gauge averag- ing" in benchmarking, a technique that was introduced in Ref. [1] (where it was called \spin inversions") and which has become so universal that it is now included natively in the D-Wave API for their processors, and which points toward a more general consideration for noisy quantum devices in the absence of quantum error correction. The need for gauge averaging arises from the observation that in QA, one may have per-qubit or per-edge random and sys- tematic biases from stray elds or interactions. In such cases, performance may be dramatically impacted by the choice of mapping from a logical Hamiltonian as dened in Eq. (1.1) to a physically implemented computation. In essence, a gauge transformation corresponds to swapping which physical spin state cor- responds to a computational 0 or 1. In an ideal annealer, this transformation commutes with (i.e., is a symmetry of) the total Hamiltonian and so has no dynamical eect. However in the presence of noise, this symmetry is broken and the choice of gauge does make a dierence, and indeed was found to have a signicant eect on the performance of the DW1 quantum annealer, to such a degree that the device did not even correlate with itself if one compares one gauge to another, or even one gauge with itself when run later (most likely the result of slow drift 1=f noise resulting in the eect that each time the annealer is programmed, a small random error term is added to the Hamiltonian). How- ever, when results for the same Hamiltonian were averaged across many gauges, the DW1 processor correlated quite well with itself [15]. Since then, applying many gauge transformations to the same Hamiltonian and averaging the results has become a standard practice in the QA community, and the idea behind it has been steadily generalized since then to include sampling over every known potentially broken symmetry of the Hamiltonian. For example, if one is solving a fully connected Ising problem, the Hamilto- nian has a permutation symmetry. 
Since every logical spin has an interaction with every other, one can relabel which spin is which without changing anything about the logical problem. However, when one goes to implement such a problem on an actual quantum annealer with limited connectivity, such as the DW2, one has to perform a minor embedding in which each logical spin is mapped to a chain of spins on the physical device [33, 13]. Those physical spins may have local field biases which vary from chain to chain, and thus the distribution over logical states will depend, in part, on the assignment of the logical spin variables to the physical chains, as shown in Ref. [83]. This work was the first case study of both minor embedding of fully connected problems as well as permutation embeddings for such problems, and demonstrated the importance of optimizing the strength of the coupling in minor embedding applications.
Finally, Ref. [16] demonstrated evidence for the easy-hard-easy phase transition for Max 2-SAT problems (wherein one wishes to find the maximal number of simultaneously satisfiable two-variable Boolean clauses over a set of variables from some ensemble of clauses) near a clause density of one, on the 108-qubit DW1 processor. It performed a rudimentary benchmarking comparison between the DW1 and an exact Max 2-SAT solver (akmaxsat) (see also Ref. [14]), and noted that there was no correlation between the two solvers over randomly selected instances of Max 2-SAT. This work also introduced the important idea of bootstrapping into the QA community, variants of which (such as the Bayesian bootstrap [84]) formed the backbone of error analyses for later studies, as a nonparametric method for approximating the distribution over the problem space and over the aforementioned broken computational symmetries, and will be discussed in much more detail in the next chapter, chapter 3.
Chapter 2
Defining and detecting quantum speedup
The development of small-scale quantum devices raises the question of how to fairly assess and detect quantum speedup. Here I show how to define and measure quantum speedup, and how to avoid pitfalls that might mask or fake such a speedup. I illustrate my discussion with data from tests run on a D-Wave Two device with up to 503 qubits. This study was done as a collaboration with my co-authors on the original paper [4]. Using random spin glass instances as a benchmark, we found no evidence of quantum speedup when the entire data set is considered, and obtain inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results do not rule out the possibility of speedup for other classes of problems and illustrate the subtle nature of the quantum speedup question.
Denoting the time used by a specific classical device or algorithm to solve a problem of size N by C(N) and the time used on the quantum device by Q(N), we define quantum speedup as the asymptotic behavior of the ratio
S(N) = C(N) / Q(N)   (2.1)
for N → ∞. Subtleties appear in the choice of classical algorithms, in defining C(N) and Q(N) if the runtime depends not just on the size N of a problem but also on the specific problem instance, and in extrapolating to the asymptotic limit.
Depending on our knowledge of classical algorithms for a given problem we may consider five different types of quantum speedup. The optimal scenario is one of a "provable quantum speedup," where there exists a proof that no classical algorithm can outperform a given quantum algorithm.
The best known example is Grover's search algorithm [85], which, in the query complexity setting, exhibits a provable quadratic speedup over the best possible classical algorithm [86]. A \strong quantum speedup" was dened in [87] by using the performance of the 38 best classical algorithm for C(N), whether such an algorithm is known or not. Unfortunately, the performance of the best classical algorithm is unknown for many interesting problems. In the case of factoring, for example, a proof of a classical super-polynomial lower-bound is not known. A less ambitious goal is therefore desirable, and thus one usually denes \quantum speedup" (without additional adjectives) by comparing to the best available classical algorithm instead of the best possible classical algorithm. A weaker scenario is one where a quantum algorithm is designed to make use of quantum eects, but it is not known whether these quantum eects pro- vide an advantage over classical algorithms or where a device is a putative or candidate quantum information processor. To capture this scenario, which is of central interest to us in this work, we dene \limited quantum speedup" as a speedup obtained when comparing specically with classical algorithms that `correspond' to the quantum algorithm in the sense that they implement the same algorithmic approach, but on classical hardware. A natural example is quantum annealing [46, 73] or adiabatic quantum optimization [8] implemented on a candidate physical quantum information processor vs corresponding classi- cal algorithms such as simulated annealing (SA) [88] (which performs annealing on a classical Monte Carlo simulation of the Ising spin glass and makes no use of quantum eects) or simulated quantum annealing (SQA) [89, 90] (a classi- cal algorithm mimicking quantum annealing in a path-integral quantum Monte Carlo simulation). In this comparison a limited quantum speedup would be a demonstration that quantum eects improve the annealing algorithm. The standard notion of quantum speedup depends on there being a consen- sus about the \best available" algorithm, and this consensus may be time- and community-dependent. For example, it may be the case, though it seems un- likely, that a classied polynomial-time factoring algorithm is available to parts of the intelligence community. In the absence of a consensus about what is the best classical algorithm, we dene \potential (quantum) speedup" as a speedup compared to a specic classical algorithm or a set of classical algorithms. An example is the simulation of the time evolution of a quantum system, where the propagation of the wave function on a quantum computer would be exponen- tially faster than a direct integration of Schr odinger's equation on a classical computer. A potential quantum speedup can of course be trivially attained by deliberately choosing a poor classical algorithm, so that here too one must make a genuine attempt to compare against the best classical algorithms known, and any potential quantum speedup might be short-lived if a better classical algo- rithm is found. Concerning the limited quantum speedup concept which is of central interest to us in the study focused on in this chapter, we only compare quantum anneal- ing to classical simulated annealing and simulated quantum annealing. 
Another example of a limited quantum speedup would be Shor's factoring algorithm running on a fully coherent quantum computer vs a classical computer where the period finding using a quantum circuit has been replaced by a classical period-finding algorithm.
2.1 Classical and quantum annealing of a spin glass
To illustrate the subtleties in detecting quantum speedup, even after a classical reference algorithm is chosen, we will compare the performance of an experimental 503-qubit D-Wave Two (DW2) device to classical algorithms and analyze the evidence for quantum speedup on the benchmark problem of random spin glass instances. We will consider the distributions of the time to solution over many random spin glass problems with integer weights and zero local fields on the 'Chimera graph' realized by the DW2 device. This problem is NP-hard [7], as stated in chapter 1, and all known classical algorithms scale super-polynomially not only for the hardest but also for typical instances. While quantum mechanics is not expected to reduce the super-polynomial scaling to polynomial, a quantum algorithm might still scale better with problem size N than any classical algorithm. The approach adopted here, of seeking evidence of a (limited) quantum speedup, directly addresses the crucial question of whether large-scale quantum effects create a potential for the devices to outperform classical algorithms. To test this possibility we compare the performance of a DW2 device to two 'corresponding' classical algorithms: SA and SQA.
2.2 Considerations when computing quantum speedup
Since quantum speedup concerns the asymptotic scaling of S(N), let's consider the subtleties of estimating it from small problem sizes N, and inefficiencies at small problem sizes that can fake or mask a speedup. In the context of annealing, the optimal choice of the annealing time t_a turns out to be crucial for estimating asymptotic scaling. To illustrate this we first consider the time to solution using SA and SQA run at different fixed annealing times t_a, independent of the problem size N. Figure 2.1A shows the scaling of the median total annealing time (over 1000 different random instances on the D-Wave Chimera graph; see section 2.5) for SQA to find a solution at least once with probability p = 0.99. Corresponding times for SA are shown in figure S10. We observe that at constant t_a, as long as t_a is long enough to find the ground state almost every time, the scaling of the total effort is at first relatively flat. The total effort then rises more rapidly, once one reaches problem sizes for which the chosen annealing time is too short, and the success probabilities are thus low, requiring many repetitions. Extrapolations to N → ∞ need to consider the lower envelope of all curves, which corresponds to choosing an optimal annealing time t^opt_a(N) for each N.
Figure 2.1B demonstrates that when using fixed annealing times no conclusion can be drawn from annealing (simulated or in a device) about the asymptotic scaling. The initial slow increase at constant t_a is misleading, and instead the optimal annealing time t^opt_a needs to be used for each problem size N. To illustrate this we show in Figure 2.1B the real "speedup" ratio of the scaling of SA and SQA (actually a slowdown), and a fake speedup due to a constant and excessively long annealing time t_a for SQA. Since SA outperforms SQA on our benchmark set, it is our algorithm of choice in the comparisons with the DW2 reported below.
2.2.1 Resource usage and speedup from parallelism
A related issue is the scaling of hardware resources (computational gates and memory) with problem size, which must be identical for the devices we compare. A device whose hardware resources scale as N can almost always achieve an intrinsic parallel speedup compared to a fixed-size device. Such is the case for the DW2, which uses N (out of 512) qubits and O(N) couplers and classical logical control gates to solve a spin glass instance with N spin variables in time T_DW(N). Considering quantum speedup for N → ∞ we need to compare a (hypothetical) larger DW2 device with the number of qubits and couplers growing as O(N) to a (hypothetical) classical device with O(N) gates or processing units. Since SA (and SQA) are perfectly parallelizable for the bipartite Chimera graphs realized by the DW2, we can relate the scaling of the time T_C(N) on such a device to the time T_SA(N) for SA on a fixed-size classical CPU by T_C(N) ≈ T_SA(N)/N and obtain
S(N) = T_C(N) / T_DW(N) ≈ [T_SA(N) / T_DW(N)] · (1/N) .   (2.2)
2.3 Performance of D-Wave Two versus SA and SQA
We finally address the question of how to measure time when the time to solution depends on the specific problem instance. When a device is used as a tool for solving computational problems, the question of interest is to determine which device is better for almost all possible problem instances. If instead the focus is on the underlying physics of a device then it might suffice to find a subclass of instances where a speedup is exhibited. These two questions lead to different quantities of interest.
2.3.1 Performance as an optimizer: comparing the scaling of hard problem instances
To illustrate these considerations we now turn to our results for this benchmarking study. Figure 2.2 shows the scaling of the time to find the ground state for various quantiles, from the easiest instances (1%) to the hardest (99%), comparing the DW2 and SA. We chose the values of the couplings J_ij from 2r discrete values {n/r}, with n ∈ {±1, ..., ±(r−1), ±r}, and call r the "range". Since we do
If instead we target a fraction of q% of the instances then we should consider the qth quantile in the scaling plots shown in Figure 2.2. The appropriate speedup quantity is then the ratio of these quantiles (\RofQ"). Denoting a quantile q of a random variable X by [X] q we dene this as S RofQ q (N) = [T SA (N)] q [T DW (N)] q 1 N ; (2.3) Plotting this quantity for the DW2 vs SA in Figure 2.3 (A and B) we nd no evidence for a limited quantum speedup in the interesting regime of largeN and largeq (almost all instances). That is, while for all quantiles and for both ranges the initial slope is positive, when N and q become large enough we observe a turnaround and eventually a negative slope. While we observe a positive slope for quantiles smaller than the median, this is of limited interest since we have not been able to identify a priori which instances will be easy. Taking into account that due to the xed suboptimal annealing times the speedup dened in Eq. (2.3) is an upper bound, we conclude that there is no evidence of a speedup over SA for this particular benchmark. 2.3.2 Instance-by-instance comparison S RofQ q (N) measures the speedup while comparing dierent sets of instances for DW and SA, each determined by the respective quantile. Now we consider instead whether there is a speedup for a (potentially small) subset of the same problem instances. To this end we study the scaling of the ratios of the time to solution for individual instances, and display in Figure 2.3 (C and D) the 42 scaling of various quantiles of the ratio (\QofR") S QofR q (N) = T SA (N) T DW (N) q 1 N : (2.4) For r = 7 all the quantiles bend down for suciently large N, so that there is no evidence of a limited quantum speedup. There does seem to be an indication of such a speedup compared to SA in the low quantiles for r = 1, i.e., for those instances whose speedup ratio was high. However, the instances contributing here are not run at the optimal annealing time, and more work is needed to establish that the potentialr = 1 speedup result persists for those instances for which one can be sure that the annealing time is optimal. Next we consider the distribution of solution times at a xed problem size. This does not address the speedup question since no scaling can be extracted, but illuminates instead the question of correlation between the performance of the DW2 and SA. To this end we perform individual comparisons for each instance and show in Figure 2.4A-B the time to solution for the same instances for the DW2 and SA. We observe a wide scatter (in agreement with the DW1 results of Ref. [15]) and nd that while the DW2 is sometimes up to 10 faster in pure annealing time, there are many cases where it is 100 slower. 2.4 Discussion It is not yet known whether a quantum annealer or even a perfectly coherent adiabatic quantum optimizer can exhibit (limited) quantum speedup at all, although there are promising indications from theory [92], simulation [90], and experiments on spin glass materials [73]. Experimental tests are thus important. There are several candidate explanations for the absence of a clear quantum speedup in the tests discussed in this chapter. Perhaps quantum annealing simply does not provide any advantages over simulated (quantum) annealing or other classical algorithms for the problem class we have studied [93]; or, perhaps, the noisy implementation in the DW2 cannot realize quantum speedup and is thus not better than classical devices. 
2.4 Discussion

It is not yet known whether a quantum annealer, or even a perfectly coherent adiabatic quantum optimizer, can exhibit (limited) quantum speedup at all, although there are promising indications from theory [92], simulation [90], and experiments on spin glass materials [73]. Experimental tests are thus important. There are several candidate explanations for the absence of a clear quantum speedup in the tests discussed in this chapter. Perhaps quantum annealing simply does not provide any advantages over simulated (quantum) annealing or other classical algorithms for the problem class we have studied [93]; or, perhaps, the noisy implementation in the DW2 cannot realize quantum speedup and is thus not better than classical devices. Alternatively, a speedup might be masked by calibration errors, improvements might arise from error correction [94], or other problem classes might exhibit a speedup. Studies subsequent to this one have probed these alternatives, and continue to aim to determine whether one can find a class of problem instances for which an unambiguous speedup over classical hardware can be observed.

While we used specific processors and algorithms for illustration, the considerations about a reliable determination of quantum speedup presented here are general. For any speedup analysis, using the same scaling of hardware resources for both quantum and classical devices is required to disentangle parallel and quantum speedup. And, for any quantum algorithm where the runtime must be determined experimentally, a careful extrapolation to large problem sizes is important to avoid mistaking inefficiencies at small problem sizes for signs of quantum speedup.

In Section 2.5 I provide additional plots and methods information which are not necessary to understand the main thrust of this chapter, namely the introduction of an ensemble of definitions for speedup, the initial attempts at benchmarking the performance of a quantum annealer, and the admonition to account for hardware scaling when attempting to compute TTS. I will, however, draw the reader's attention to Section 2.5.3 for some additional discussion of gauge averaging, particularly as it was performed in this study, and to Section 2.5.8 for the introduction of the idea of classes of problems with tunable hardness, which will come into play in chapter 4.

Speaking of gauge averaging, the next chapter will give serious thought to how to rigorously perform gauge averaging and to its consequences for the benchmarking task, as well as to a more efficient method of estimating TTS across a widely varying ensemble of instances than the brute-force method used in the study in this chapter. We'll also see the case for taking seriously what your statistical methods are actually telling you, and for thinking carefully about what question you're really trying to answer with benchmarking.

Acknowledgments
This chapter was originally published as [4]. The following are the acknowledgements for that paper. We thank N. Allen, M. Amin, E. Farhi, M. Mohseni, H. Neven, and C. McGeoch for useful discussions and comments. We are grateful to I. Zintchenko for providing the an_ss_ge_nf_bp simulated annealing code before publication of the code with Ref. [95]. This project was supported by the Swiss National Science Foundation through the National Competence Center in Research NCCR QSIT, the ARO MURI Grant No. W911NF-11-1-0268, ARO grant number W911NF-12-1-0523, the Lockheed Martin Corporation and Microsoft Research. Simulations were performed on clusters at Microsoft Research and ETH Zurich and on supercomputers of the Swiss Center for Scientific Computing CSCS. We acknowledge the hospitality of the Aspen Center for Physics, supported by NSF grant PHY-1066293.

Figure 2.1: Pitfalls when detecting speedup. A) The typical (median) time to find a ground state at least once with 99% probability for spin glasses with ±1 couplings using SQA at constant annealing time.
The lower envelope of the curves at constant t_a corresponds to the total effort at an optimal size-dependent annealing time t_a^opt(N) and can be used to infer the asymptotic scaling. The initial, relatively flat slope of each fixed-t_a curve is due to suboptimal performance at small problem sizes N, and should therefore not be interpreted as speedup. Annealing times are given in units of Monte Carlo steps (MCS), corresponding to one update per spin. B) The speedup of SQA over SA for two cases. If SQA is run suboptimally at small sizes by choosing a fixed large annealing time t_a = 10000 MCS (dashed line), a speedup is feigned. This is due to suboptimal performance on small sizes and is not indicative of the real asymptotic behavior when both codes are run optimally (solid line).

Figure 2.2: Scaling of time to solution for the ranges r = 1 (panel A) and r = 7 (panel B). Shown is the scaling of the pure annealing time to find the ground state at least once with a probability p = 0.99 for various quantiles of hardness, for simulated annealing (SA, dashed) and the DW2 (solid). The solid lines terminate for the highest quantiles because the DW2 did not solve the hardest instances for large problem sizes within the maximum number of repetitions (at least 32000) of the annealing we performed.

Figure 2.3: Speedup of the DW2 compared to SA. A) and B): the ratio of the quantiles (RofQ); C) and D): the quantiles of the ratio (QofR). For a random variable X with distribution p(x) and values x ∈ [0, ∞) we define, as usual, the qth quantile x_q by ∫_0^{x_q} p(x) dx = q/100, which we solve for x_q and plot as a function of √N. In the QofR case we use x_{100−q} so that high quantiles still correspond to instances that are hard for the DW2. We terminate the curves when the DW2 does not find the ground state for large N at high percentiles. In these plots we multiplied Eqs. (2.3) and (2.4) by 512, so that the speedup value at N = 512 directly compares one DW2 processor against one classical CPU. An overall positive slope suggests a possible limited quantum speedup, subject to the caveats discussed in the text. A negative slope indicates that SA outperforms the DW2.

Figure 2.4: Instance-by-instance comparison of annealing times. Shown is a scatter plot of the pure annealing time for the DW2 compared to SA, using an average over 16 gauges (see Section 2.5) on the DW2, for A) r = 1 and B) r = 7.
The color scale indicates the number of instances in each square. Instances below the diagonal red line are faster on the DW2, those above are faster using SA. Instances for which the DW2 did not find the solution with 10000 repetitions per gauge are shown at the top of the frame (no such instances were found for SA).

2.5 Supplementary Information

2.5.1 Annealing methods

Quantum annealing
The annealing schedules used in our work are shown in Figure 2.5.

Simulated annealing (SA)
During the annealing schedule we linearly increase the inverse temperature over time from an initial value of β = 0.1 to a final value of β = 3r. For the case of ±1 couplings (r = 1), and for r = 3, we use a highly optimized multispin-coded algorithm based on Refs. [96, 97]. This algorithm performs updates on 64 copies in parallel, updating all at once. For the r = 7 simulations we use a code optimized for bipartite lattices. Implementations of the simulated annealing codes are available in Ref. [95]. We used the code an_ms_r1_nf for r = 1, the code an_ms_r3_nf for r = 3, and the code an_ss_ge_nf_bp for r = 7.

Figure 2.5: Annealing schedules. A) The amplitudes of the transverse fields A_i(t) (decreasing, blue) and the longitudinal couplings B(t) (increasing, red) as a function of time. The device temperature of T = 18 mK is indicated by the black horizontal dashed line. B) The linear annealing schedule used in simulated quantum annealing.

Figure 2.6: Scaling of time-to-solution for SQA. Shown is the time-to-solution A) for range r = 1 and B) for range r = 7.

Simulated quantum annealing (SQA)
The algorithm we used here is similar to that of Ref. [89], but uses cluster updates along the imaginary time direction, typically with 64 time slices. Our annealing schedule is linear, as shown in Figure 2.5B): the Ising couplings are ramped up linearly while the transverse field is ramped down linearly over time. Our SQA results for ranges 1 and 7, complementing Figure 2 in the main text, are shown in Figure 2.6.

All three annealing methods mentioned above are heuristic. They are not guaranteed to find the global optimum in a single annealing run, but only find it with a certain instance-dependent success probability s ≤ 1. We determine the true ground state energy using an exact belief propagation algorithm [98].

Table 2.1: Repetitions of annealing runs used on the DW2. This table summarizes the total number of repetitions used to estimate the success probabilities on the DW2 for various system sizes.

    N ≤ 238         1000
    N = 284, 332    2000
    N = 385, 439    5000
    N = 503         10000

We then perform at least 1000 repetitions of the annealing for each instance, count how often the ground state has been found by comparing to the exact result, and use this to estimate the success probability s for each problem instance. See Table 2.1 for the exact number of repetitions performed on the DW2 device.
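As a minimal illustration of how a success probability estimated in this way translates into a time to solution, the sketch below computes s from a boolean record of annealing outcomes and the number of repetitions needed to reach a 99% total success probability (anticipating Eq. (2.5) below). The helper names and the example counts are mine; the 20 μs annealing time is the DW2 minimum used in this study.

import numpy as np

def repetitions_for_target(s, target=0.99):
    # Smallest R such that 1 - (1 - s)**R >= target.
    if s <= 0.0:
        return np.inf
    if s >= 1.0:
        return 1
    return int(np.ceil(np.log(1.0 - target) / np.log(1.0 - s)))

def time_to_solution(found_ground_state, t_anneal=20e-6, target=0.99):
    """found_ground_state: boolean array, one entry per annealing run of a single instance."""
    s = np.mean(found_ground_state)        # empirical success probability
    R = repetitions_for_target(s, target)  # repetitions needed for the target total success
    return R * t_anneal                    # pure annealing time to solution, in seconds

# Example: an instance solved in 37 of 10000 runs.
runs = np.zeros(10000, dtype=bool); runs[:37] = True
print(time_to_solution(runs))              # roughly 2.5e-2 s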
2.5.2 The D-Wave Two Vesuvius device

Figure 2.7: Qubits and couplers in the D-Wave Two device. The DW2 "Vesuvius" chip consists of an 8 × 8 two-dimensional square lattice of eight-qubit unit cells, with open boundary conditions. The qubits are each denoted by circles, connected by programmable inductive couplers as shown by the lines between the qubits. Of the 512 qubits of the device located at the University of Southern California used in this work, the 503 qubits marked in green and the couplers connecting them are functional.

The annealing schedules A_i(t) and B(t) used in the device are shown in Figure 2.5A).

Our problem class
The family of problem instances we use for our benchmarking tests employs couplings J_ij on all edges of N = 8LL′-vertex subgraphs of the Chimera graph of the DW2, comprising L × L′ unit cells, with L, L′ ∈ {1, ..., 8}. We set the fields h_i = 0, since nonzero values of the fields h_i destroy the spin glass phase that exists at zero field, thus making the instances easier [99]. We choose the values of the couplings J_ij from the 2r discrete values {n/r}, with n ∈ {−r, −(r−1), ..., −1, 1, ..., r−1, r}, and call r the "range". Thus when the range is r = 1 we only pick values J_ij = ±1. This choice is the least susceptible to calibration errors of the device, but the large degeneracy of the ground states in these cases makes finding a ground state somewhat easier. At the opposite end we consider r = 7, which is the upper limit given the four bits of accuracy of the couplings in the DW2.
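A minimal sketch of how such an instance can be drawn is given below. The edge list for the functional subgraph is assumed to be supplied (it is not constructed here), and the single-unit-cell example is only a toy; the function name is mine.

import numpy as np

def random_range_r_instance(edges, r, rng):
    """Draw couplings J_ij uniformly from the 2r values {±n/r, n = 1..r}; all fields h_i = 0.

    edges: list of (i, j) pairs for the functional couplers of the (sub)graph.
    """
    n = rng.integers(1, r + 1, size=len(edges))   # magnitudes 1..r
    sign = rng.choice([-1, 1], size=len(edges))   # random sign
    return {edge: s * k / r for edge, s, k in zip(edges, sign, n)}

rng = np.random.default_rng(42)
# Toy example: a single 8-qubit Chimera unit cell (complete bipartite K_{4,4}).
cell_edges = [(i, j) for i in range(4) for j in range(4, 8)]
print(random_range_r_instance(cell_edges, r=7, rng=rng))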
These problem instances are harder since there are fewer degenerate minima, but they also suffer more from calibration errors in the device. We also used an annealing time of 20 μs.

2.5.3 Gauge averaging

Calibration inaccuracies cause the couplings J_ij and h_i that are realized in the DW2 to be slightly different from the intended and programmed values (roughly 5% variation). These calibration errors can sometimes lead to the ground states of the model realized in the device being different from those of the perfect model. To overcome these problems it is advantageous to perform annealing on the device with multiple encodings of a problem instance into the couplers of the device [15]. To realize these different encodings we use a gauge freedom in realizing the Ising spin glass: for each qubit we can freely define which of the two qubit states corresponds to σ_i = +1 and σ_i = −1. More formally, this corresponds to a gauge transformation that changes spins σ^z_i → a_i σ^z_i, with a_i = ±1, and the couplings as J_ij → a_i a_j J_ij and h_i → a_i h_i. The simulations are invariant under such a gauge transformation, but (due to calibration errors which break the gauge symmetry) the results returned by the DW2 are not.

If the success probability of one annealing run is denoted by s, then the probability of failing to find the ground state after R independent repetitions (annealing runs), each having success probability s, is (1 − s)^R, and the total success probability of finding the ground state at least once in R repetitions is

    P = 1 − (1 − s)^R.    (2.5)

Thus the number of repetitions needed to find the ground state at least once with probability P is found by isolating R in Eq. (2.5). Following [15], after splitting these repetitions into R/G repetitions for each of G gauge choices with success probabilities s_g, the total success probability becomes

    P^(G) = 1 − ∏_{g=1}^{G} (1 − s_g)^{R/G}.    (2.6)

If we use the geometric mean of the failure probabilities of the individual gauges to define

    s̄ = 1 − ∏_{g=1}^{G} (1 − s_g)^{1/G},    (2.7)

then Eq. (2.6) can be written in the same form as Eq. (2.5):

    P^(G) = 1 − (1 − s̄)^R.    (2.8)

We thus use the geometric mean s̄ in our scaling analysis.

2.5.4 Annealing and wall-clock times

A complementary distinction is that between wall-clock time, denoting the full time-to-solution, and the pure annealing time. Wall-clock time is the total time to find a solution; it is the relevant quantity when one is interested in the performance of a device for applications and has been used in Ref. [14]. It includes the setup, cooling, annealing and readout times on the DW2, and the setup, annealing and measurement times for the classical annealing codes. The pure annealing time for R repetitions is straightforwardly defined as

    t_anneal = R · t_a,    (2.9)

where t_a is the time used for a single annealing run. It is the relevant quantity when one is interested in the intrinsic physics of the annealing processes and in scaling to larger problem sizes on future devices. In order to measure wall-clock times on the DW2 we have performed tests with varying numbers of repetitions R and performed a linear regression analysis to fit the total wall-clock time for each problem size to the form t_p(N) + R·t_r(N), where t_p(N) is the total preprocessing time and t_r(N) is the total run time per repetition for an N-spin problem. The values of t_p and t_r are summarized in Table 2.2.
With these numbers we obtain the total wall-clock time for R annealing runs split over G gauges (with R/G annealing runs each) as

    t_total(N) = G · t_p(N) + R · t_r(N).    (2.10)

To calculate pure annealing times for the simulated annealer we determine the total effort in units of Monte Carlo updates (attempted spin flips), and then convert to time by dividing by the number of updates that the codes can perform per second [95]. Our classical reference CPU is an 8-core Intel Xeon E5-2670 CPU, which was introduced around the same time as the DW2. To obtain wall-clock times we measure the actual time needed to perform a simulation on the same Intel Xeon E5-2670 CPU.

Figure 2.8: Comparing wall-clock times. A comparison of the wall-clock time to find the solution with probability p = 0.99 for SA running on a single CPU (dashed lines) compared to the DW2 (solid lines), using a single gauge choice in the left column and 16 gauges in the right column. A) for range r = 1, B) for range r = 3, C) for range r = 7. Shown are curves from the median (50th quantile) to the 99th quantile. The large constant programming overhead of the DW2 masks the exponential increase of time-to-solution that is obvious in the plots of pure annealing time.

Table 2.2: Wall-clock times on the DW2. Listed are measured programming times t_p and annealing plus readout times t_r (for a pure annealing time of 20 μs) on the DW2 for various problem sizes.

    N      t_p [ms]        t_r [μs]
    8      14.7 ± 0.3      51.0 ± 0.2
    16     14.8 ± 0.3      53.0 ± 0.2
    31     14.8 ± 0.3      57.9 ± 0.2
    47     14.9 ± 0.4      60.6 ± 0.2
    70     15.0 ± 0.4      64.5 ± 0.2
    94     15.2 ± 0.3      68.3 ± 0.2
    126    15.6 ± 0.2      73.1 ± 0.2
    158    15.5 ± 0.2      78.0 ± 0.2
    198    15.5 ± 0.2      80.8 ± 0.2
    238    15.7 ± 0.2      83.5 ± 0.2
    284    15.8 ± 0.2      83.6 ± 0.1
    332    16.0 ± 0.3      87.1 ± 0.2
    385    16.6 ± 1.0      87.1 ± 0.6
    439    16.6 ± 0.1      90.4 ± 0.1
    503    16.6 ± 0.2      90.5 ± 0.1

Since the multi-spin codes perform 64 repetitions in parallel, we always make at least 1024 repetitions when running 16 threads on 8 cores. This causes the initially flatter scaling in wall-clock times as compared to pure annealing times (Figure 2.8). The measured initialization time includes all preparations needed for the algorithm to run, and the spin flip rate was computed for the 99% quantile at 503 qubits. For smaller system sizes or lower quantiles, the spin flip rate is lower since the problems are not hard enough to benefit from parallelization over several cores.

While not as interesting from a complexity theory point of view, it is instructive to also compare wall-clock times for the above benchmarks, as we do in Figure 2.8. We observe that the DW2 performs similarly to SA run on a single classical CPU, for sufficiently large problem sizes and at high range values.
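For concreteness, the wall-clock model of Eq. (2.10) can be evaluated directly from Table 2.2. The sketch below hard-codes only the N = 503 row (t_p = 16.6 ms, t_r = 90.5 μs) and is purely illustrative; it makes plain how the programming overhead dominates unless many repetitions are taken per gauge.

def dw2_wall_clock(R, G, t_p=16.6e-3, t_r=90.5e-6):
    # Eq. (2.10): total wall-clock time (seconds) for R annealing runs split over G gauges,
    # using the N = 503 values from Table 2.2.
    return G * t_p + R * t_r

for R, G in [(1000, 1), (1000, 16), (32000, 16)]:
    t = dw2_wall_clock(R, G)
    print(f"R={R:6d}, G={G:2d}: {t:.3f} s "
          f"({100 * G * 16.6e-3 / t:.0f}% spent programming)")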
Note that the large constant programming overhead of the DW2 masks the exponential increase of time-to-solution that is obvious in the plots of pure annealing time. Considering the wall-clock times, the advantage of the DW2 seen in Figure 4 A-B (in the main text) for some instances tends to disappear, since it is penalized by the need to program the device with multiple different gauge choices. Figure 2.9A-F shows that for one gauge choice there are some instances, for r = 7, where the DW2 is faster, but many instances where it never finds a solution. Using 16 gauges the DW2 finds the solution in most cases, but is always slower than the classical annealer on a classical CPU for r = 1, as can be seen in Figure 2.9A and D. For r = 7 the DW2 is sometimes faster than a single classical CPU. Overall, the performance of the DW2 is better for r = 7 than for r = 1, and comparable to SA only when just the pure annealing time is considered. The difference compared to the results of Ref. [14] is due to the use of optimized classical codes using a full CPU in our comparison, as opposed to the use of generic optimization codes using only a single CPU core in Ref. [14].

2.5.5 Optimal annealing times

As discussed in the main text, we need to determine the optimal annealing time t_a^opt for every problem size N in order to make meaningful extrapolations of the time to find a solution. To determine t_a^opt we perform annealing runs at different annealing times t_a, determine the success probabilities s(t_a) of 1000 instances, and from them the required number of repetitions R(t_a) to find the ground state with a probability of 99%. That is, we solve Eq. (2.8) for R while setting P^(G) = 0.99:

    R(t_a) = log[1 − P^(G)] / log[1 − s̄(t_a)].    (2.11)

The total effort R(t_a)·t_a diverges for t_a → 0 and t_a → ∞ and has a minimum at an optimal annealing time t_a^opt. The reason is that for short t_a the success probability s(t_a) goes to zero, which leads to a diverging total effort, while for large t_a the time also grows, since one always needs to perform at least one annealing run and the total effort is thus bounded from below by t_a. In Figure 2.10 (left) we plot various quantiles of the total effort R(t_a)·t_a for the simulated annealer as a function of t_a to determine the optimal annealing time t_a^opt. For the DW2 we find, as shown in Figure 2.10 (right), that the minimal annealing time of 20 μs is always longer than the optimal time, and we thus always use the device in a suboptimal mode. As a consequence the scaling of time-to-solution is underestimated, as explained in detail in the main text. While one can in principle look for an optimal annealing time for each individual problem instance, our approach is to instead determine an averaged optimal annealing time t_a^opt(N) for each problem size N by annealing many instances at various annealing times t_a, and then use these for all future problems of that size.
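A sketch of this optimization follows. The success probabilities s̄(t_a) would come from runs at each candidate annealing time, as described above; the helper names and the saturation curve in the toy example are assumptions of mine, not measured data.

import numpy as np

def total_effort(t_a, s_bar, target=0.99):
    # R(t_a) * t_a, with R(t_a) from Eq. (2.11); s_bar is the (gauge-averaged)
    # success probability measured at annealing time t_a.
    R = np.log(1.0 - target) / np.log(1.0 - s_bar)
    return np.maximum(R, 1.0) * t_a          # at least one run is always needed

def optimal_annealing_time(t_candidates, s_bar_values):
    efforts = [total_effort(t, s) for t, s in zip(t_candidates, s_bar_values)]
    i = int(np.argmin(efforts))
    return t_candidates[i], efforts[i]

# Toy example in Monte Carlo sweeps (MCS): a made-up success probability that saturates with t_a.
t_candidates = np.array([5, 10, 50, 100, 200, 500, 1000, 2000], dtype=float)
s_bar_values = 1.0 - np.exp(-t_candidates / 300.0)
print(optimal_annealing_time(t_candidates, s_bar_values))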
Figure 2.9: Instance-by-instance comparison. Shown is a scatter plot of the total time for the DW2 device (DW) compared to a simulated classical annealer (SA), A and D) for r = 1, B and E) for r = 3, and C and F) for r = 7. A, B and C): wall-clock time using a single gauge on the DW2; D, E and F): wall-clock time using 16 gauges on the DW2. The color scale indicates the number of instances in each square. Instances below the diagonal red line are faster on the DW2, those above are faster classically. Instances for which the DW2 device did not find the solution are shown at the top. SA found a solution for every instance of this benchmark.

Figure 2.10: Optimal annealing times for the simulated annealer and for the D-Wave device. Shown is the total effort R(t_a)·t_a as a function of annealing time t_a for various quantiles of problems with r = 1 and r = 7. A) and B): SA, where the minimum of the total effort determines the optimal annealing time t_a^opt. C) and D): DW2, where we find a monotonically increasing total effort, meaning that the optimal time t_a^opt is always shorter than the minimal annealing time of 20 μs.

2.5.6 Resource usage and speedup from parallelism

In the main text we saw that in order to avoid mistaking a parallel speedup for a quantum speedup we need to scale hardware resources (computational gates and memory) in the same way for the devices we compare, and employ these resources optimally. These considerations are not universal but need to be carefully applied for each comparison of a quantum algorithm and device to a classical one. Here we provide several independent derivations that lead us to the same conclusion.

Recall that in the main text we argued that the quantum part of speedup is estimated by comparing the times required by two devices with the same hardware scaling, giving

    S(N) = T_C(N)/T_DW(N) ∝ [T_SA(N)/T_DW(N)] · (1/N),    (2.12)

which is Eq. (3) of the main text. The factor 1/N in the speedup calculation discounts for the intrinsic parallel speedup of the analog device whose hardware resources scale as N, compared to a fixed size classical device used for the timings.

Derivation assuming fixed computational resources
In the consideration of how to disentangle parallel and quantum speedup it may seem more natural to assume fixed computational resources for a given device. We will show that this leads to the same scaling as Eq. (2.12). We might be tempted to define the speedup in this case as S(N) = T_SA(N)/T_DW(N). However, in this manner only a fraction N/512 of the qubits are used, while the classical code uses the available CPU fully, independently of problem size. This suboptimal use of the DW2 may again be incorrectly interpreted as speedup. The same issue would appear when comparing a classical analog annealer against a classical simulated annealer. As in the discussion of optimal annealing times above, we need to ensure an optimal implementation to correctly assess speedup.
For the DW2 (or a similarly constructed classical analog annealer) this means that one should always attempt to make use of the entire device: we should perform as many annealing runs in parallel as possible. Let us denote the machine size by M (e.g., M = 512 in the DW2 case). With this we define a new, optimized, annealing time

    T_DW^opt(N) = T_DW(N) · (1/⌊M/N⌋),    (2.13)

and the speedup in our case is then

    S(N) = T_SA(N)/T_DW^opt(N) = [T_SA(N)/T_DW(N)] · ⌊M/N⌋.    (2.14)

Omitting the floor function ⌊·⌋, which only gives subdominant corrections in the limit M → ∞, we recover Eq. (2.12).

Derivation from the scaling of the annealing time
The conclusion that the speedup function includes a factor proportional to 1/N is validated from yet another perspective, which focuses on the annealing time. Instead of embedding C = ⌊M/N⌋ different instances in parallel, we can embed C replicas of a given instance. Each replica r (where r ∈ {1, ..., C}) results in a guess E_{r,i} of the ground state energy for the ith run, and we can take E_i = min_r E_{r,i} as the proposed solution for that run. If the replicas are independent and each has equal probability s of finding the ground state, then using C replicas the probability that at least one will find the ground state is s′ = 1 − (1 − s)^C, which is also the probability that E_i is the ground state energy for the ith run. Repeating the argument leading to Eq. (2.11), the number of repetitions required to find the ground state at least once with probability p is then

    R′ = ⌈log(1 − p)/log(1 − s′)⌉ = ⌈log(1 − p)/[C log(1 − s)]⌉,    (2.15)

while R = ⌈log(1 − p)/log(1 − s)⌉. It is easy to show that R′ − 1 ≤ R/C ≤ R′. Focusing on the pure annealing time we have T_DW^opt(N) = t_a R′ and T_DW(N) = t_a R, which yields Eq. (2.13) in the limit of large R.

2.5.7 Arguments for and against a speedup on the DW2

Let us consider in more detail the speedup results discussed in the main text and above. We have argued that the apparent limited quantum speedup seen in the r = 1 results of Figure 3 in the main text must be treated with care due to the suboptimal annealing time. It might then be tempting to argue that, strictly speaking, the comparison with suboptimal-time instances cannot be used for claiming a slowdown either, i.e., that we simply cannot infer how the DW2 will behave for optimal-time instances by basing the analysis on suboptimal times only. However, let us make the assumption that, along with the total time, the optimal annealing time t_a^opt(N) also grows with problem size N. This assumption is supported by the SA and SQA data shown in Figure 1 in the main text and in Figure 2.14, and is plausible as long as the growing annealing time does not become counterproductive due to coupling to the thermal bath [100]. By definition, T_DW(N, t_a^opt(N)) ≤ T_DW(N, t_a), where we have added the explicit dependence on the annealing time, and t_a is a fixed annealing time. Thus

    S(N) = [T_C(N)/T_DW(N, t_a)] · (1/N) ≤ [T_C(N)/T_DW(N, t_a^opt(N))] · (1/N) = S^opt(N).    (2.16)

Under our assumption, t_a^opt(N) < t_a for small N, but for sufficiently large N the optimal annealing time grows so that t_a^opt(N) ≥ t_a. Thus there must be a problem size N* at which t_a^opt(N*) = t_a, and hence at this special problem size we also have S(N*) = S^opt(N*). However, the minimal annealing time of 20 μs is longer than the optimal time for all problem sizes (see Figure 2.10), i.e., N* > 503 in our case.
Therefore, if S(N) is a decreasing function of N for sufficiently large N, as we indeed observe in all our RofQ results (recall Figure 3A and B in the main text), then since S^opt(N) ≥ S(N) and S(N*) = S^opt(N*), it follows that S^opt(N) too must be a decreasing function over a range of N values, at least until N*. This shows that the slowdown conclusion holds also for the case of optimal annealing times. For the instance-by-instance comparison (QofR), no such conclusion can be drawn for the subset of instances (at r = 1) corresponding to the high quantiles where S_q^QofR(N) is an increasing function of N. This limited quantum speedup may or may not persist for larger problem sizes or if optimal annealing times are used.

2.5.8 Additional Discussion

In this work we have discussed challenges in properly defining and assessing quantum speedup, and used comparisons between a DW2 and simulated classical and quantum annealing to illustrate these challenges. Strong or provable quantum speedup, implying speedup of a quantum algorithm or device over any classical algorithm, is an elusive goal in most cases, and one thus usually defines quantum speedup as a speedup compared to the best available classical algorithm. We have introduced the notion of limited quantum speedup, referring to a more restricted comparison to 'corresponding' classical algorithms solving the same task, such as a quantum annealer compared to a classical annealing algorithm. Quantum speedup is most easily defined and detected in the case of an exponential speedup, where the details of the quantum or classical hardware do not matter, since they only contribute subdominant polynomial factors. In the case of an unknown or a polynomial quantum speedup one must be careful to fairly compare the classical and quantum devices, and, in particular, to scale hardware resources in the same manner. Otherwise parallel speedup might be mistaken for (or hide) quantum speedup.

An experimental determination of quantum speedup suffers from the problem that all measurements are limited to finite problem sizes N, while we are most interested in the asymptotic behavior for large N. To arrive at a reliable extrapolation it is advantageous to focus the scaling analysis on the part of the execution time that becomes dominant for large problem sizes N, which in our example is the pure annealing time, and not the total wall-clock time. For each problem size we furthermore need to ensure that neither the quantum device nor the classical algorithm is run suboptimally, since this might hide or fake quantum speedup.

If the time to solution depends not only on the problem size N but also on the specific problem instance, then one needs to carefully choose the relevant quantity to benchmark. We argued that in order to judge the performance over many possible inputs of a randomized benchmark test, one needs to study the high quantiles, and define speedup by considering the ratio of the quantiles of time to solution. If, on the other hand, one is interested in finding out whether there is a speedup for some subset of problem instances, then one can instead perform an instance-by-instance comparison by focusing on the quantiles of the ratio of time to solution. We chose to focus here on the benchmark problem of random zero-field Ising problems parametrized by the range of couplings.
We did not find evidence of limited quantum speedup for the DW2 relative to simulated annealing in our particular benchmark set when we considered the ratio of quantiles of time to solution, which is the relevant quantity for the performance of a device as an optimizer. We note that random spin glass problems, while an interesting and important physics problem, may not be the most relevant benchmark for practical applications, for which other benchmarks may have to be studied.

When we focus on subsets of problem instances in an instance-by-instance comparison, we observe a possibility of a limited quantum speedup for a fraction of the instances. (The fact that some problems are solved faster on classical hardware and some on the DW2 raises the possibility of a hybrid approach that benefits by solving a problem instance using both, and then selecting the best solution found.) However, since the DW2 runs at a suboptimal annealing time for most of the corresponding problem instances, the observed speedup may be an artifact of attempting to solve the smaller problem sizes using an excessively long annealing time. This difficulty can only be overcome by fixing the issue of suboptimal annealing times, e.g., by finding problem classes for which the annealing time is demonstrably already optimal. Alternatively, other problem classes might exhibit a speedup. For example, it is possible that a randomized benchmark with a tunable "hardness knob", such as the clause density in the MAX-2SAT problem [16], will allow for a more fine-tuned exploration of the performance potential of a quantum annealer than a purely randomized benchmark such as we have used here. It was also proposed that embedding 3D spin-glass problems into the Chimera graph, with a finite critical temperature, might yield problem classes that are better suited as benchmarks for quantum annealing [93].

Figure 2.11: Scaling of time-to-solution for r = 3. Shown is the scaling of the time to find the ground state with a probability of 99% for various quantiles of hardness for A) the simulated annealer, B) the simulated quantum annealer, and C) the DW2 device.

Figure 2.12: Speedup for the DW2 device compared to SA for instances with r = 3. 16 gauges were used. Left: the ratio of quantiles (RofQ). Right: quantiles of the ratio (QofR). The results are intermediate between the r = 1 and r = 7 results, as discussed in the text.

2.5.9 Additional scaling data: range 3

Scaling plots for range 3 instances, requiring 3 bits of precision in the couplings, are shown in Figure 2.11, complementing Figure 2 in the main text.

Figure 2.13: Instance-by-instance comparison for r = 3.
Comparison between time-to-solution for SA and the DW2 using pure annealing times and 16 gauges.

Figure 2.12 (left) displays the ratio of quantiles and, like Figure 3 in the main text, does not exhibit a limited quantum speedup. Figure 2.12 (right) displays the results for the quantiles of the ratio of time-to-solution. The results are intermediate between those seen in Figure 3 in the main text for r = 1, 7: while for r = 1 there appears to be a limited quantum speedup (relative to SA) for the higher quantiles, this speedup disappears for r = 7; for r = 3 we observe a flattening of the speedup curves starting at the 10th percentile, while the higher percentiles bend down. The same suboptimality remarks discussed in this context in the main text apply. We have obtained similar results (not shown) for ranges r = 2, 4, 5, 6 and for instances including random longitudinal fields. Figure 2.13 shows the instance-by-instance comparisons for the pure annealing time. The conclusions are very similar to those obtained based on Figure 4 in the main text using the r = 1, 7 data.

Figure 2.14: Optimal annealing times for SA. Shown is the time-to-solution for various quantiles as a function of annealing time (Monte Carlo steps) for SA at range r = 1. This supplements Figure 1 in the main text.

Chapter 3
Benchmarking or: How I learned to stop worrying and love Bayesian nonparametrics and optional stopping

Benchmarking is a difficult task, as partly discussed at the end of chapter 1, and if one does it naively one can easily fool oneself into believing false things with high confidence. In the hopes of aligning data analysis and data collection procedures across our community, this chapter presents a basic overview of the effects of gauges, the modeling of benchmarking time to solution (TTS) for algorithms for Ising-type problems, and Bayesian nonparametric modeling, and applies the resulting insights to the benchmarking problem. The final procedure for data analysis is quite similar to previous methods, while the data collection procedure is quite different.

In general, the conclusion for quantum annealers is that one should collect 100 anneals for each programming cycle/gauge, unless there is a specific reason to change it, and continue running until the coefficient of variation of one's posterior distribution for the probability of success or TTS falls below a threshold of 0.2 (or 0.1). Estimates of the posterior should be performed by drawing samples from the distribution defined by first sampling a value from each gauge's posterior distribution for p_s, Beta(x, 100 − x) for x successes in that gauge, and then performing a weighted average with weights drawn from the Dir(1, 1, 1, ..., 1) distribution (a Bayesian bootstrap).

The development of the D-Wave line of quantum annealers has sparked a flurry of activity in the optimization and quantum computation community, and spurred a small cottage industry around benchmarking their performance. This chapter is not concerned with the results of any such analysis, but with the method of analysis itself. The goal is to lay out guidelines which will hopefully aid all future researchers in this field in benchmarking future generations of D-Wave devices, as well as competing optimization algorithms, in a consistent and standard way. This chapter is organized into three parts.
The first addresses the problem of data analysis alone, introducing the benchmarking problem formally again and discussing the ideal solution as well as realistic solutions forced upon us by practical necessities. It includes a brief introduction to the field of Bayesian nonparametrics and its bread and butter, the Dirichlet distribution and the associated Dirichlet process, and grounds the analysis procedure in this field, while exploiting a computationally simpler technique called the Bayesian bootstrap. It concludes with a comparison between the Bayesian and classical bootstraps, arguing that the classical bootstrap is conceptually flawed and computes a distribution we, as scientists, don't care about (the sampling distribution of a statistic), while the Bayesian bootstrap gives us the distribution we typically think the classical bootstrap gives us, namely a posterior distribution on the parameter or function of interest.

The second part addresses the question of data collection procedures, and outlines an approach based on optional stopping, where one uses the data itself to determine, on the fly, when one has enough data, instead of specifying sample sizes in advance. While this has the downside of resulting in an estimator with a small bias, this bias is controllable (it decreases linearly with the number of gauges sampled), small (a few percent at most), and acts effectively as an overall constant factor on the TTS rather than as an element with scaling. It thus has no bearing on scaling results, which involve fitting to an exponential. Moreover, the resulting posterior credible intervals have very good frequentist coverage properties. The chief benefit is a potentially enormous time savings when one seeks to learn the TTS for most instances in a collection.

The third section presents the conclusions of a simulation study using various artificial distributions meant to highlight various possible cases and to validate the collection procedure and recommendations given in the rest of the chapter.

A few notes on style before we begin. First, this chapter is written in a tone more similar to a presentation than a traditional paper. Second, I use "I" throughout to refer to the author, and "we" to denote "the reader and I". It is also rather opinionated. As this chapter has not appeared in any form in a publication, and as it was wholly original and written solely by the author of this dissertation, it will likely be the most idiosyncratic chapter.

3.1 Please lie down on the couch: Data analysis

Suppose that you have a problem H whose properties depend on some set of parameters which you control, call them α, as well as a set of nuisance parameters β which you do not control and/or do not care about. A solver takes in some problem H(α, β) and outputs a trial solution ~s under some set of solver parameters γ. We have some function which maps a solution ~s into a binary output s = 0, 1, for "failure" and "success" respectively. Note that this language may be misleading: any binary classifier on the samples ~s qualifies. Applying the map to all solutions sampled from the solver, we can represent a solver as a simple generator of a sequence of 0s and 1s. The problem of interest is to learn the probability, under some defined distribution over the nuisance parameters β, that a given solution will be classified as a "success" or 1.
In other words, we seek to estimate p_s(α, γ) = ⟨Pr(s = 1 | H(α, β), γ)⟩_β, and thus also the time to solution (related to p_s through the nonlinear transformation log(0.01)/log(1 − p_s), as defined in previous chapters). Here, α could be the list of all couplers specifying the Hamiltonian, but it could also include meta-parameters such as the range of random Ising problems, the loop density in frustrated-loop planted-solution problems, the chain strength in an embedded problem, etc.: any property of an instance which, contextually, we wish to account for and track explicitly. The β constitute variables to be averaged over, like the physical couplers of the programmed Hamiltonian (which are inaccessible and taken to be stochastic).

Certain properties may lie on either side of this boundary depending on context. If one is studying the correlation between particular embeddings of fully connected problems into Chimera, then embeddings would be in α. If one instead simply views different embeddings as different but effectively equal representations of the same underlying problem, then the embedding would be in β. Variables like annealing time, number of sweeps, initial and final temperature, number of replicas in parallel tempering, number of Trotter slices in PIMC, etc. constitute γ. The task of benchmarking performance on a particular instance is, to repeat in this language, to estimate p_s(α, γ), or, for fixed γ, simply p_s(α).

3.1.1 Paradiso: the ideal solution

If one resamples β for each trial according to its intrinsic distribution, then the expected probability of success at any trial is exactly p_s (this is, indeed, what it means to "marginalize" over a variable, i.e. to average over it). By resampling β, we effectively render the success or failure of each new solution a simple Bernoulli trial. We then seek to estimate the probability of success of this Bernoulli trial. To do so, the default procedure is to choose a conjugate prior to the Bernoulli distribution, namely a Beta distribution, and update via Bayes' rule on the evidence from repeated trials. Conjugate priors have the helpful property that the posterior distribution will be of the same family as the prior after updating on new evidence [101]. For the Beta distribution with a prior Beta(c_1, c_2) and a sequence of N trials with x successes, the posterior is simply Beta(x + c_1, N − x + c_2). c_1 can thus be considered the prior's number of pseudo-successes and c_2 the prior's number of pseudo-failures. This can be seen immediately, as the probability density of Beta(a, b) is ∝ p^(a−1) (1 − p)^(b−1), and the likelihood of observing x successes from a Binomial random variable in N trials is simply ∝ p^x (1 − p)^(N−x).
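The conjugate update is a one-liner. A minimal sketch follows (using scipy's Beta distribution; the counts are illustrative, not from any real run), showing the posterior Beta(x + c_1, N − x + c_2) and how little the choice among the common priors matters once x and N − x are both large. Since the Beta(0, 0) prior discussed next is improper, a tiny pseudocount is used as a numerical stand-in for it here.

from scipy.stats import beta

def beta_posterior(x, N, c1, c2):
    # Prior Beta(c1, c2) plus x successes in N Bernoulli trials gives posterior Beta(x+c1, N-x+c2).
    return beta(x + c1, N - x + c2)

x, N = 37, 10000                       # e.g. 37 successful anneals out of 10000
for c in (1.0, 0.5, 1e-9):             # uniform, Jeffreys, (numerical stand-in for) Haldane
    post = beta_posterior(x, N, c, c)
    print(f"c = {c}: posterior mean = {post.mean():.6f}, sd = {post.std():.6f}")
print("empirical rate x/N =", x / N)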
The Haldane distribution is the only one for which the mean of the posterior is equal to x/N; in general it is equal to (x + c_1)/(N + c_1 + c_2) for a Beta(c_1, c_2) prior. The standard deviation is [(x + c_1)(N − x + c_2) / ((N + c_1 + c_2)^2 (N + c_1 + c_2 + 1))]^(1/2). The coefficient of variation is σ/μ = sqrt[(N − x + c_2) / ((x + c_1)(N + c_1 + c_2 + 1))]. If c_1 and c_2 are small relative to N − x and N, this is approximately sqrt(1/x − 1/N), and for large N is just 1/√x. We immediately see an important lesson: the coefficient of variation of p_s is controlled by the number of successes that we see. Thus, if we wish to estimate to within a fixed variation, we require the number of successes to be approximately constant.

3.1.2 The realistic solution

While a method of operation like the above is available to us for some algorithms (those on classical hardware, for instance), to resample β at each trial on D-Wave hardware is untenable due to the long programming time. If run in such a mode, the D-Wave chip will be annealing only about 1% of the time. In a given time T, one can perform G = T/(τ_prog + τ_anneal R) gauges, where τ_prog ≈ 15 ms is the programming time per gauge and τ_anneal ≈ 150 μs is the total time per anneal (including readout). For R = 1 you can perform approximately 66 gauges per second with 66 samples. For R = 100 you can perform approximately 33 gauges per second with 3300 samples, and for R = 200 you can perform approximately 22 gauges with 4400 samples. One sees there is a shift, from a moderate decrease in the number of gauges causing enormous increases in the number of samples (and thus better probability estimates), to moderate tradeoffs in the number of gauges corresponding to only moderate increases in the number of samples. For R > 1000 this tradeoff becomes catastrophically large: moving from R = 1000 to R = 50000 increases the number of samples per second by less than 10% while decreasing the number of gauges per unit time by roughly a factor of 50. Since we have no interest in the individual probability per gauge except what it tells us about the mean over β, we should expect that restricting ourselves to R ≈ 100 will yield the most efficient inferences. Later simulation studies will show that quite generally R = 100, i.e. an R such that the quantum annealer spends approximately half of its time programming and half annealing, is indeed the optimal number of runs per gauge for our probability estimates to converge with the minimum wall-clock time, i.e. the minimum effort.

Thus, out of practical necessity, we must perform multiple trials at each value of β and then take an average to estimate p_s. For each set of trials (equivalent to a programming cycle on the D-Wave processor; I will call them "progcycles" or "gauges" interchangeably) we can use a Beta distribution to encode our knowledge about the probability of success for that progcycle, and then compute our belief about the average value over all progcycles. Now, the choice of prior is no longer practically irrelevant (unlike the ideal case). Since we know we cannot run an unlimited number of anneals with a single progcycle, we will inevitably reach a point where the overwhelming majority of our progcycles have zero successes. If our prior on the success probability for a gauge has any bias in its average value, then we'll have a bias of the same magnitude in our estimate for the average, p_s. Thus, for very difficult problems, with low success probabilities, our prior will swamp our data and we will learn nothing no matter how many anneals we perform.
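A quick numerical check of this prior-swamping effect, on synthetic data of my own construction (R = 100 runs per gauge at a true success probability of 10^-4, so most gauges see zero successes): averaging the per-gauge posterior means under a Beta(c, c) prior overestimates p_s by orders of magnitude unless c = 0.

import numpy as np

rng = np.random.default_rng(1)
p_true, R, n_gauges = 1e-4, 100, 1000
x = rng.binomial(R, p_true, size=n_gauges)     # successes per gauge; mostly zeros

for c in (1.0, 0.5, 0.0):                      # uniform, Jeffreys, Haldane pseudocounts
    per_gauge_mean = (x + c) / (R + 2 * c)     # posterior mean of each gauge's Beta distribution
    print(f"c = {c}: estimated p_s = {per_gauge_mean.mean():.2e}")
print(f"true p_s = {p_true:.1e}, empirical rate = {x.sum() / (R * n_gauges):.2e}")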
Thus, we are forced to choose the Haldane prior, Beta(0, 0), for the prior on the probability of success of each gauge, as it is the only Beta distribution prior whose mean is equal to the empirical success rate and is thus not biased by prior information.

3.1.3 I was blind but now I see: Bayesian nonparametrics

We can construct a Bayesian model for our problem: Y_i ∼ Bin(R, p_i) are the numbers of successes for each progcycle i, given some probability of success p_i ∼ F for some unknown distribution F. To complete a full Bayesian model specification, we must specify a prior on F. This may at first seem quite difficult, even impossible, as F can in principle be any probability distribution defined on the interval [0, 1]. Moreover, if the prior isn't conjugate it may be infeasible to compute posterior densities. There is, however, a density which is conjugate to iid sampling, and it is called the Dirichlet process.

The Dirichlet Distribution
Before discussing the Dirichlet process (DP) in more detail, allow me first to quickly introduce a related distribution: the Dirichlet distribution. Denoted by Dir(α⃗) for an arbitrary vector of positive real parameters α⃗, the Dirichlet distribution is conjugate to the categorical and multinomial distributions. Suppose you wish to estimate the probability of rolling each side of a six-sided die. Just as for the Beta distribution, there is a set of common choices of prior Dirichlet distribution. The one whose mean corresponds to the maximum likelihood estimate (MLE) is the one with all 0s in α⃗ [84]. The posterior with this prior, assuming one saw x_i instances of the number i after a total of Σ_i x_i = N rolls of the die, will be Dir(x_1, x_2, x_3, x_4, x_5, x_6). The marginal posterior density for the probability of face i is Beta(x_i, N − x_i), peaked at the empirical probability of success. The resulting posterior density will be a smooth version of the posterior derived from the classical bootstrap (which is discussed later, but is simply resampling values from your empirical distribution [102]).

An important property of the Dirichlet distribution is that one can repartition the space of outcomes and the result will still be Dirichlet. For instance, the distribution of "(1 OR 2), 3, 4, 5, 6" would be given by Dir(x_1 + x_2, x_3, x_4, x_5, x_6) (while the corresponding conditional distribution on just 1 and 2 would be Dir(x_1, x_2)). This ability to regroup elements in the space of outcomes at will and easily compute marginal distributions is one of the reasons why the Dirichlet distribution and its extension, the Dirichlet process, are so useful in Bayesian statistics.

The Dirichlet process was originally proposed in [103]. Since then, there have been a number of characterizations of it, but for our purposes here we need only deal with the first. A Dirichlet process DP(αF_0) is a probability distribution on the support of the "base distribution" F_0 that is almost surely discrete. It has the property that, for any partition of the support of F_0, call it (X_1, X_2, ..., X_N), realizations F of DP(αF_0) obey

    (F(X_1), F(X_2), ..., F(X_N)) ∼ Dir(αF_0(X_1), ..., αF_0(X_N)),

where A(X) denotes the total probability mass the distribution A assigns to the set X. This rule applies for every partition {X_i} simultaneously. While at first it is not obvious that such a process is possible, it is, and it is actually fairly trivial to sample realizations to arbitrary accuracy from a Dirichlet process. The Dirichlet process is conjugate to iid sampling.
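For concreteness, one generic way to draw a realization of DP(αF_0) to arbitrary accuracy is the standard stick-breaking construction, which is not spelled out in the text above; the sketch below is therefore only an assumed illustration, with a truncation level K chosen large enough that the leftover stick mass is negligible.

import numpy as np

def sample_dp_realization(alpha, base_sampler, K, rng):
    """Truncated stick-breaking draw from DP(alpha * F0).

    Returns K atom locations (drawn from F0 via base_sampler) and their weights.
    """
    betas = rng.beta(1.0, alpha, size=K)                   # stick-breaking proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining                            # w_k = beta_k * prod_{j<k}(1 - beta_j)
    atoms = base_sampler(K, rng)                           # atom locations drawn from F0
    return atoms, weights

rng = np.random.default_rng(7)
# F0 = uniform on [0, 1], a natural base measure for per-gauge success probabilities.
atoms, weights = sample_dp_realization(alpha=5.0,
                                       base_sampler=lambda k, r: r.uniform(size=k),
                                       K=500, rng=rng)
print(weights.sum())   # close to 1 for K large enough; the realization is discrete almost surely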
If one samples a set of N values {p_i}_{i=1}^N from F, one's posterior distribution for F is DP(αF_0 + Σ_i δ_{p_i}), where δ_p denotes an atom with weight 1 on the point p. Thanks to the above partitioning property, one can rewrite this (at least to arbitrary accuracy) as

    F = q F_1 + (1 − q) F_2,  with  F_1 ∼ DP(αF_0),  F_2 ∼ DP(Σ_i δ_{p_i}),  q ∼ Beta(α, N),

by separating the points {p_i}, together with arbitrarily small regions around each point, into one partition, and the rest of the interval (which is effectively the entire interval, minus N holes) into another. The distribution DP(Σ_i δ_{p_i}) is quite interesting, in that it effectively only has support on a finite set of points. Indeed, it is identical to a Dir(1, 1, ..., 1) distribution (N ones) on the collection of points p_i. We will return to this distribution shortly. (Note, to my knowledge this is actually an original representation of the posterior of a DP.)

What role do α and F_0 have? α acts as a strength parameter, essentially representing how confident we are that the true distribution is in the "vicinity" of F_0. If α is very large, it will take quite a lot of data to overwhelm the prior (as seen from the weights between the components of the posterior, namely Beta(α, N)). If α is very small, the prior has negligible influence on the posterior estimates even at small sizes. In the limit α → 0, the distribution becomes purely the above Dir(1, ..., 1) on the points {p_i}. This distribution has been given another name, the Bayesian bootstrap, introduced in Ref. [84], and will henceforth be denoted as BB(p⃗). The Bayesian bootstrap has no free parameters, and is wholly determined by the data vector p⃗. One can see that the Bayesian bootstrap also dominates the Dirichlet process in the limit N → ∞.

With all these preliminaries in mind, we can complete our Bayesian model specification as

    Y_i ∼ Bin(R, p_i)
    p_i ∼ F
    F ∼ DP(αF_0)

for some values of α and F_0 denoting our prior beliefs (which may, in principle, vary from person to person). This problem was actually investigated in [104], where an approximation to the true posterior distribution was computed. Ref. [105] demonstrated an importance sampling technique to approximate the posterior for conjugate priors F_0. In the last two decades, additional MCMC techniques have been developed for inference in the general case for Dirichlet processes [106]. These have several disadvantages, however. First and foremost, these MCMC approaches are generally quite slow to converge, and indeed can take minutes, hours, and even days when one has tens of thousands of data points (as would be the case for benchmarking fairly hard problems, with success probabilities in the 10^-5 range). This is simply impractical for our datasets. A number of approximation techniques have been proposed, such as [107]. Approximations don't overcome the other challenge, however, which is eliciting sensible prior values for α and F_0. Often researchers place hyperpriors on these and learn their values, too, from the data, but this only makes the convergence problem of the MCMC algorithms worse. Moreover, our community is not composed solely of Bayesians, and soliciting good priors is a notoriously difficult problem in any case. Rather than attempt to do so, it is my suggestion that we instead adopt the Bayesian bootstrap directly as our inference mechanism.

3.1.4 Why the Bayesian bootstrap?

This is justified in several ways. First and foremost, the Bayesian bootstrap is the only way the data enters into the posterior of the Dirichlet process for a given sample p⃗ from F.
Since our prior will obviously be weak (as we don't have any good idea of what F should look like a priori), and we will want to take a large amount of data in any case, the posterior will likely be very near the Bayesian bootstrap anyway. The limit α → 0 serves as a fairly natural noninformative prior for Dirichlet processes, acting in a manner similar to the Haldane and Dir(0⃗) priors for binomial and multinomial distributions.

Finally, this prior is extremely tractable and flexible. Since we know that each element of θ⃗ must be distinct (as they are simply the true probabilities of success for each gauge we ran, and the underlying distribution is almost surely continuous and nonatomic), we can treat each θ_i independently, deriving a posterior distribution for each and then performing a Monte Carlo average. The posterior for θ_i is Beta(x_i, R − x_i) given a Haldane prior, which also corresponds to taking the likelihood function L(x_i | θ_i) and interpreting it as a distribution on θ_i. We would then place a Dir(1⃗) distribution on the weight of each sample in our posterior estimation. That is:

θ_i ~ Beta(x_i, R − x_i),   w⃗ ~ Dir(1, 1, 1, ..., 1),   p_s = Σ_i w_i θ_i.

MC simulations of this type can be performed in seconds even with tens of thousands of data points, making them suitable for use in online learning of our parameters and their credible intervals. Indeed, due to the decomposition property of the Dirichlet distribution, one can even update an existing ensemble of points sampled from one's posterior into a sample from the new posterior upon the acquisition of new information, leading to effective sample times of well under a second for thousands of realizations.

Let us imagine that for some problem, x = R or x = 0 for all of N gauges run. If x = R, one's posterior for θ_i is a delta function at 1, while for x = 0 it is a delta at 0. As a result, one can regroup elements in the Dirichlet distribution into "deltas at 0" and "deltas at 1", yielding Dir(N − X, X), where X is the number of gauges with all successes. The sampled values of p_s will then be (1 − p) for p ~ Beta(N − X, X), i.e. p_s ~ Beta(X, N − X). Notice that if one sets R = 1, one is actually performing the ideal experiment, and the analysis procedure yields a posterior that is exactly the posterior for the ideal case (with a Haldane prior). This is an extremely attractive feature, and another source of justification for using the Bayesian bootstrap. How does the Bayesian bootstrap compare to the classical bootstrap?

3.1.5 To Bayes, or not to Bayes, that is the question: Bayesian v. classical bootstrap

While the Bayesian bootstrap has a natural connection to Bayesian nonparametrics, it isn't obvious what the payoff is over the classical bootstrap unless one is a committed Bayesian and frequentism makes one queasy. Why bother when we already have the classical bootstrap if they're so similar? To explain the differences, let's review the classical bootstrap and compare the assumptions each bootstrap makes.

In a classical bootstrap [102], one makes the following assumptions:

1. All values that may be sampled from the distribution have been. (One has seen every type in the population.)

2. All frequencies of the types in the population are exactly those in the empirical sample.

One then estimates the distribution for a parameter of interest under these assumptions. One does so by creating new pseudodata vectors by sampling with replacement N values from the original vector (of length N).
A key point is that the classical bootstrap mimics the sampling distribution of a parameter, not the distribution of the parameter in the population. In the case of benchmarking this distinction proves fatal for the classical bootstrap. Take, for instance, the case where one has 10 gauges, 9 of which have 0 successes and 1 of which has all successes, after R trials in each gauge. We know the Bayesian bootstrap will yield Beta(1, 9), as discussed above. For the classical bootstrap, we would be sampling from (1/n)·Bin(10, 0.1). A plot of this is provided in Fig. 3.1.

Figure 3.1: Results from 10000 frequentist bootstrap samples from a sample of 10 gauges, nine with zero successes and one with all successes. If one tried to interpret this as a distribution of one's uncertainty about the mean success probability, averaged over all gauges, then one would have significant weight on the mean success rate being 0, even though one actually observed successes, a clear contradiction.

As you can see, you have non-negligible support on p_s = 0. But of course, this isn't p_s, the gauge-marginalized mean success probability; this is our bootstrapped estimate of the sampling distribution for our statistic. The practitioner of the classical bootstrap is left with few options when attempting to move to sampling from the distribution for time to solution, which goes like 1/p_s and thus has weight on TTS = ∞. Any time there is a gauge with all zeros, the classical bootstrap will place support on infinite time to solution, whether we happen to have sampled those elements from the distribution or not. Thus, while we may be able to "get away with it", in the sense that using the classical bootstrap won't always fail in practice due to the presence of a single 0, the fact remains that we know analytically that the true mean of the bootstrap estimate for TTS is ∞.

This gets at a deeper issue: if the classical bootstrap is supposed to somehow represent my knowledge or information about p_s, then it clearly fails. We have seen successes, yet the frequentist bootstrap places nonzero weight on the belief that the average probability of success is 0, and thus that observing a success is impossible. But I've seen one. More than one, in fact! It strikes one as incredibly odd to use a procedure which seems to predict that there is a chance one couldn't possibly have observed a value one did, in fact, observe.

Fundamentally, we aren't interested in the sampling distribution of our parameter of interest; we're interested in learning about the parameter itself, i.e. a posterior distribution, which is quite different conceptually and, as one will see, in practice. Often, statisticians can get away with ignoring the difference, but one does so at one's own peril (such as by attempting to produce a scaling plot of TTS and having an error bar that goes to ∞). One may still counter, "So long as the weight on 0 is small enough, it won't matter for practical purposes."
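As a concrete illustration of this contrast, the sketch below (Python/NumPy; the success counts encode the hypothetical 10-gauge example above, with R chosen arbitrarily) draws the classical bootstrap distribution of the gauge-averaged success rate next to the Bayesian bootstrap posterior, which in this case reduces to Beta(1, 9).

```python
import numpy as np

rng = np.random.default_rng(1)
R = 1000                                    # anneals per gauge (hypothetical)
successes = np.array([R] + [0] * 9)         # 1 gauge all successes, 9 with none
p_hat = successes / R                       # empirical per-gauge success rates
G, B = len(p_hat), 100_000                  # number of gauges, bootstrap draws

# Classical bootstrap: resample the 10 gauges with replacement and average.
idx = rng.integers(0, G, size=(B, G))
classical = p_hat[idx].mean(axis=1)
print("classical: fraction of resamples with mean exactly 0 =",
      (classical == 0).mean())              # non-negligible weight on p_s = 0

# Bayesian bootstrap: Dirichlet(1,...,1) weights over gauges, with each
# gauge's success probability drawn from its Beta(x_i, R - x_i) posterior
# (a count of 0 or R gives a point mass at 0 or 1, as described in the text).
theta = np.where(successes == 0, 0.0,
                 np.where(successes == R, 1.0,
                          rng.beta(np.clip(successes, 1, None),
                                   np.clip(R - successes, 1, None), size=G)))
w = rng.dirichlet(np.ones(G), size=B)
bayesian = w @ theta                        # here equivalent to Beta(1, 9)
print("bayesian: smallest sampled p_s =", bayesian.min())   # strictly positive
```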
To couch the differences in a more pragmatic context, consider also the case where we have 100 gauges (normally considered plenty for problems with appreciable probability of success in our field, say p_s > 0.01), each with 2000 anneals, and a distribution of probability of success chosen from a mixture of Normal(0.05, 0.003) with weight 0.95 and Normal(0.4, 0.01) with weight 0.05, and observe what the classical bootstrap has to say about p_s in this case as well (Fig. 3.2).

Figure 3.2: Results from 1000 frequentist bootstrap samples from a set of 100 gauges whose success probability was sampled from a mixture distribution of Normal(0.05, 0.003) with weight 0.95 and Normal(0.4, 0.01) with weight 0.05. The multiple peaks arise from the discrete binomial distribution over samples from the high-probability component.

As you can see, the sampling distribution has a number of peaks which one would not expect from a posterior distribution for a mean. These arise due to the discrete nature of the classical bootstrap: in each sample, the bootstrap has drawn a certain discrete number of the large-mean gauges. The variation in the number of samples from Normal(0.4, 0.01) overwhelms the variation over the other samples, effectively creating a binomial distribution on the number of such gauges, with Gaussian noise from the other gauges centered on each discrete value. While this example can be "fixed" by simply collecting more data (in this example, 200 gauges would be approximately sufficient), the fundamental problem will remain: frequentist statistics estimate uncertainty due to sampling, not our uncertainty about parameters themselves.

Rubin [84] proposed the Bayesian bootstrap. It can be understood (entirely separately from the above derivation from the Dirichlet process) as based on two assumptions:

1. All values that may be sampled from the distribution have been. (One has seen every type in the population.)

2. The true population distribution is unknown. We merely have sampled G times and seen all G types (as per 1).

To encode (2) mathematically, we recall that the conjugate prior to the categorical and multinomial distributions is the Dirichlet distribution (whose equivalent of the Haldane prior is Dir(0, 0, 0, ..., 0)). Thus, if we take this standard prior for the G-dimensional Dirichlet distribution and update based on having seen each type once, we get the Dir(1, 1, 1, ..., 1) distribution (for convenience, we will refer to this distribution as Dir(1⃗) where the dimension is implicit), aka the Bayesian bootstrap as defined above from DPs. Rather than sampling a vector of finite length at each sample as in the classical bootstrap, one samples from the Dirichlet distribution to find the weights for each element in the population, obtaining a trial population on which to compute whatever statistic one desires.

The major practical benefit of the Bayesian bootstrap is that it yields continuous results for the mean for any continuous probability distribution, and has zero weight on all probability densities which have zero weight on an observation; that is, if you observed a success, the Bayesian bootstrap will never sample p_s = 0. It also rests on a more reasonable assumption than the classical bootstrap: rather than asserting certainty about the distribution in the population, it assumes that we are ignorant of this distribution and uses only the knowledge we have.
Finally, rather than representing some sampling distribution for a statistic, we are computing a genuine posterior distribution for our information about the statistic.

Let's examine the cases where the classical bootstrap broke down again, to see the advantages of the Bayesian bootstrap. In the case of 10 gauges, with 9 having no successes and one having only successes, as discussed above, the posterior will be Beta(1, 9). Beta(1, 9) has no support on 0, i.e. there is no weight on TTS = ∞. The Bayesian bootstrap never claims it may be impossible to observe observed values. In the second case, we have the result in Figure 3.3.

Figure 3.3: Results from 10000 Bayesian bootstrap samples from a set of 100 gauges whose success probability was sampled from a mixture distribution of Normal(0.05, 0.003) with weight 0.95 and Normal(0.4, 0.01) with weight 0.05. It is now a smooth density, unlike in the frequentist case.

As one can see, this is smooth and continuous, due to the continuous nature of the Bayesian bootstrap, and is also a far more reasonable posterior distribution for a mean.

The Bayesian bootstrap can be nested, just like the classical bootstrap. For instance, one may desire to compute the posterior for the median over a class of instances, such as range-1 random Ising problems on C_8 Chimera graphs. To do so, for each instance sample a value from the Bayesian bootstrap distribution for p_s. Then, compute the median over a population with those values of p_s, with population weights sampled from Dir(1⃗) (this may be very different from the median over the vector of sampled p_s alone), to extract a single sample of the median, and repeat many times to construct a Monte Carlo estimate of the posterior for the median. In general, posteriors estimated in this way will be significantly smoother than estimates generated from the classical bootstrap, though they may still be multimodal due to the essentially discrete nature of order statistics on finite populations.

So long as the practitioner doesn't want weight on the problem being impossible after having solved it, or expects posterior estimates of means to be smooth and unimodal (i.e. everyone), then the practitioner should use the Bayesian bootstrap rather than the classical bootstrap.

Now that we know how to estimate p_s, it is time to draw this discussion to a close. Given an arbitrary distribution, we have a technique to estimate p_s, and the benchmarking problem, at least as stated above, is solved. But a way to analyze data you already have isn't really a full solution to the real benchmarking problem. We know what to do if someone hands us a collection of data. But our goal is to perform an investigation ourselves. How should one go about collecting the data in the first place?

3.2 Goldilocks and the three sample sizes: why there is no universal "right" sample size

Traditionally in science, particularly in frequentist circles, scientists are admonished to design an experimental protocol in advance, including selecting sample sizes. This is due to concerns over the possibility of induced bias in estimates under conditions of "optional stopping", i.e. determining sample sizes by peeking at the data as you go along. Indeed, the debate about optional stopping has raged for over fifty years. Why are sample sizes usually determined in advance?
A simple stopping rule can point out how optional stopping could lead to incorrect inferences: continue sampling until such time as you can reject your null hypothesis with a p-value of 0.05. Obviously, this rule will cause you to always reject the null hypothesis, even if it's true, as for any distribution there is a remote possibility of sampling a chain which would satisfy it. This proposal was given by Armitage at the Savage Forum in 1959 [108], and has been the topic of quite a bit of debate since.

Nevertheless, despite the potential dangers of optional stopping, it has one dramatic advantage for the cause of benchmarking: it allows us to run relatively few samples for easy problems, and only run many samples for problems which are very hard. This could, in principle, enormously reduce the time to gather sufficient statistics. To gauge how significant the time savings may be, recall that we expect the number of observed successes to control the accuracy of our inference; namely, we suspect that the number of gauges scales as G ∝ 1/p_s. We can examine the distribution over p_s for problems of interest, and its scaling as a function of system size, to approximately determine the time savings. This can be seen in Figure 3.4, which shows approximately how many gauges would be necessary to complete gathering all data for various sizes of range-1 Ising problems tested in chapter 2, assuming we wish to solve at least 95% of them (which seems a reasonable number). As one can see, optional stopping would reduce the time to gather all data by roughly an order of magnitude for the largest system size. These benefits should be expected to compound as we go to larger systems, with a broader distribution for p_s across instances. Given that problems are also going to get exponentially harder, requiring longer data collection times, this could serve as an enormous practical reduction in difficulty.

Figure 3.4: A comparison of the approximate number of gauges required to gather sufficient data to know TTS for the easiest 95% of instances for various sizes of range-1 Ising problems. Optional stopping consumes more than an order of magnitude less time for the largest size, and we should expect this advantage to continue to grow as we go to larger systems with broader distributions for p_s.

Given that we desire to use some optional stopping rule, how might we go about selecting one? First, it seems natural to perform inference in TTS space, as that is the space that (a) correlates with how long it takes to gather the data and (b) is the space in which costs are directly proportional, rather than connected by a nonlinear function as they are in p_s space. Thus, we convert our posterior densities on p_s to TTS before applying whatever stopping rule we wish. Since we want to have a large amount of information about the value of TTS, we will place a threshold on the posterior coefficient of variation (equal to σ/μ for standard deviation σ and mean μ), such that we stop gathering data once our estimated posterior σ/μ < c for small c. This stopping rule corresponds directly to placing a threshold on the number of successes in the ideal R = 1 case.
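A minimal sketch of this stopping rule is given below (Python/NumPy). It assumes a hypothetical run_gauge(R) callable that programs one random gauge, performs R anneals, and returns the number of observed ground states; the Bayesian bootstrap posterior is formed in TTS space, and data collection halts once its coefficient of variation drops below the threshold c.

```python
import numpy as np

rng = np.random.default_rng(2)

def tts_posterior(successes, R, t_anneal=20e-6, n_boot=2000):
    """Bayesian bootstrap posterior samples for the mean time-to-solution."""
    x = np.asarray(successes, dtype=float)
    G = len(x)
    # Per-gauge success probability: Beta(x_i, R - x_i) posterior (Haldane prior),
    # with point masses at 0 or 1 for gauges with no or only successes.
    theta = np.where(x == 0, 0.0,
                     np.where(x == R, 1.0,
                              rng.beta(np.clip(x, 1, None),
                                       np.clip(R - x, 1, None), size=G)))
    w = rng.dirichlet(np.ones(G), size=n_boot)         # Dir(1,...,1) gauge weights
    p_s = np.clip(w @ theta, 1e-15, 1.0 - 1e-12)       # numerical guard only
    return t_anneal * np.log(1.0 - 0.99) / np.log(1.0 - p_s)

def collect(run_gauge, R=100, c=0.2, batch=100, max_gauges=100_000):
    """Optional stopping: add gauges in batches until std/mean of TTS < c."""
    successes, tts = [], None
    while len(successes) < max_gauges:
        successes.extend(run_gauge(R) for _ in range(batch))
        if sum(successes) == 0:          # no ground states seen yet: keep going
            continue
        tts = tts_posterior(successes, R)
        if tts.std() / tts.mean() < c:
            break
    return np.array(successes), tts
```

With R = 100 and c = 0.2 (or 0.1) this matches the convention argued for later in this chapter; the 20 µs annealing time is the DW2 value used elsewhere in this thesis.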
Such a stopping rule was analyzed in [109] and found to be virtually unbiased (with a bias upper bounded by 1/N, where N is the number of successes; this corresponds to a threshold c = 1/√N in the ideal case), with guarantees on the confidence level (essentially resulting from c). While I was unable to make significant progress in analyzing the non-ideal R > 1 case analytically, simulation studies detailed below will demonstrate that this proposed stopping rule is also virtually unbiased and has extremely good frequentist coverage properties (indeed, the credible intervals are often overly conservative, which could be a good or a bad thing depending on your point of view: it's either somewhat wasteful or good protection against being overconfident).

Before we see those results, what properties do we expect of this stopping rule? By the law of large numbers we expect the posterior standard deviation to decrease like 1/√G, and thus for the number of gauges to be roughly inversely proportional to the square of the threshold. Thus, the bias term decreases quadratically with the threshold (as c²), while the number of gauges grows as 1/c², as would be expected. In general, we expect G to also grow as the square of the coefficient of variation of the underlying distribution.

3.3 The proof is in the pudding: a simulation study

To study the entire above procedure's performance, we perform a simulation study. The data collection algorithm is as follows:

1. Sample a random p from the test distribution.
2. Sample a number of successes from Bin(R, p).
3. Every hundred gauges (or roughly every 3 seconds of wall-time-equivalent on D-Wave) perform a brief Bayesian bootstrap to check whether σ/μ < c; if so, exit and report the results of the trial. Else, repeat.

We test 12 distributions, across various orders of magnitude of the probability of success. I tested each at several different values of R = 50, 100, 150, and 200, and thresholds c = 0.05, 0.1, and 0.2. The distributions are:

1. Beta(1, 99)
2. Beta(1, 999)
3. Beta(1, 9999)
4. δ(0.01)
5. δ(0.001)
6. δ(0.0001)
7. Uniform(0, 1)
8. Beta(1/2, 100)
9. MixtureModel([Beta(10, 1000), Beta(1, 1), Beta(1/2, 100)], [0.2, 0.01, 0.79])
10. MixtureModel([Beta(33, 7800), Beta(12, 214)], [0.3, 0.7])
11. MixtureModel([Beta(5, 10^4), Beta(2, 2)], [0.998, 0.002])
12. MixtureModel([Beta(3, 1000), δ(0.9)], [0.95, 0.05])

The first 6 are relatively simple and fairly peaked, across several orders of magnitude; the δ's denote point masses. The Uniform distribution is also quite simple. Beta(1/2, 100) is unbounded near 0 with an average near 1/200. The mixture models are written as a list of probability densities and their associated weights in the model. The first mixture model is multimodal with a rare uniform component, which I would anticipate will yield poor performance. The final 3 are what I call "Nixon" distributions, in that they lie about their true nature, potentially tricking an experimenter into believing the average of the distribution is quite different from its true value.

First, let us address wallclock time. How long will it take to gather sufficient data, for each threshold and number of runs? In Figure 3.5, we can see that there is an essentially linear relationship between wallclock time and the inverse mean for the various thresholds. In Figure 3.6, we see that the number of runs makes relatively little difference in the limited range I ran in these simulations.
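The choice of R discussed next is largely duty-cycle arithmetic. The sketch below makes the trade-off explicit, assuming a 15 ms programming time (as in the wall-clock estimates of this section) and a per-anneal wall time of roughly 0.15 ms; the latter is an assumption chosen to be consistent with the throughput figures quoted in the text, not a measured device parameter.

```python
# Rough throughput model for one programming cycle (gauge).
T_PROGRAM = 15e-3   # s per programming cycle (assumed, as in the figures below)
T_ANNEAL = 0.15e-3  # s per anneal incl. readout (assumed, order-of-magnitude)

def throughput(R):
    """Return (gauges per second, anneals per second) for R anneals per gauge."""
    t_gauge = T_PROGRAM + R * T_ANNEAL
    return 1.0 / t_gauge, R / t_gauge

for R in (1, 100, 200, 1000, 50000):
    gauges_per_s, anneals_per_s = throughput(R)
    print(f"R = {R:6d}: {gauges_per_s:8.3f} gauges/s, {anneals_per_s:10.1f} anneals/s")
```

Under these assumed timings, R = 1 yields about 66 gauges (and anneals) per second while R = 100 yields about 33 gauges and 3300 anneals per second, which is the trade-off described in the following discussion.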
Nevertheless, both in these and in additional distributions not shown in these figures, it seems that approximately 100 anneals per programming cycle was either optimal or near-optimal for the vast majority of distributions tested. For a threshold of 0.2, R = 100 is optimal across essentially the entire range of distributions tested. Additional checks were made at much larger (and more traditional) values of R, such as 1000 and 10000; however, those were universally dramatically worse. Indeed, I was unable to build a distribution for p_s such that even R = 1000 could be remotely competitive with R = 100 in wallclock time.

This finding is fairly reasonable, and indeed was what I expected. For R = 1, the chip would be programming 99% of the time, yielding just 66 samples per second with 66 gauges. At R = 100, the chip is programming approximately half the time, yielding approximately 3300 samples per second with 33 gauges. A mere factor of 2 in sampling over gauges yields a fifty-fold increase in the number of samples per second. Beyond R = 100, one achieves only moderate increases in the number of samples per second in exchange for a reduced number of gauges. By the time one moves much beyond R = 200, purchasing additional samples comes at the cost of a great number of gauges. R = 50000 yields only 10% more samples per second than R = 1000 at a fifty-fold decrease in the number of gauges. Notice, also, that at R = 100, one is only sacrificing half of the maximum number of anneals possible, in exchange for 500 times as many gauges per second.

Figure 3.5: Wall clock time, estimated based on 15 ms programming time and 180 µs per anneal, vs inverse mean for the various distributions at R = 100. The vertical variation along a single mean line is caused by variations in the spread among instances at a particular mean. It is an approximately linear relationship, meaning it will take linearly more time to gather sufficient statistics as the probability decreases.

Figure 3.6: Wall clock time, estimated based on 15 ms programming time and 180 µs per anneal, vs inverse mean for the various distributions for a threshold of 0.1. The vertical variation along a single mean line is caused by variations in the spread among instances at a particular mean. We notice that R = 100 is optimal or near-optimal across the distributions.

We know optional stopping may lead to biased estimates, but how biased, on average? Figure 3.7 shows histograms for different values of R at different thresholds. Clearly, c = 0.2 is prone to error, even quite large errors, across all values of the number of runs. However, since the time to gather data increases by a factor of four between c = 0.2 and c = 0.1, and again between c = 0.1 and c = 0.05, it may be in our interests to accept being more easily misled on a particular instance in exchange for running four times as many experiments. That is a question that must be decided through careful deliberation and may vary from experiment to experiment (or change in the middle). It should also be noted that these biases do not really exhibit any scaling and thus should not pose any issue for asymptotic scaling or speedup analyses.
Finally, let us review the credible intervals of our estimates. A credible interval is very similar to a confidence interval; however, it is the product of selecting a region of high posterior density rather than the output of some frequentist procedure. For ease of computation, we will here use equal-tailed credible intervals, rather than the strict highest-posterior-density (HPD) interval, as the true HPD interval is not invariant, i.e. our credible intervals for time to solution and p_s would not be related by the TTS equation. Moreover, HPD intervals are relatively difficult to compute, while equal-tailed intervals are rather trivially estimated using order statistics (the α% equal-tailed credible interval is simply the middle α% of the posterior distribution).

Figures 3.8, 3.9, and 3.10 display how many, out of 100, simulated collections of data yielded the mean inside of the 67, 95, and 99 percent credible intervals derived via the Bayesian bootstrap. As you can see, essentially across all thresholds, the performance of the estimators is quite good from a frequentist perspective, and indeed they are often overly conservative. It should also be noted that for 100% of the simulations for all tested distributions, the true mean was within one standard deviation of the estimated mean. Thus, we can be virtually certain that the true mean will be within a factor of (1 ± c) of our estimated mean for threshold c.

Given all of the above results and arguments, it is my opinion that we should adopt R = 100 as a convention, and a threshold on σ/μ of 0.2 or, if it is desired to be more conservative, 0.1. Obviously, some upper time limit will have to be imposed as well, but this is essentially a question of convenience. Checks of the exit condition can be made after every gauge if one uses the efficient update scheme mentioned in the first section.

Figure 3.7: Histogram of the average relative bias of the optional stopping estimator for 100 simulations across all the listed distributions, for different thresholds and numbers of runs. We see that a threshold of 0.2 can lead one fairly far astray, while the number of runs per gauge doesn't make a very significant difference. However, given that the wallclock time is inversely proportional to the square of the threshold, it may be deemed worth the trade-off to accept a slightly greater degree of bias in exchange for a significant reduction in cost.

Figure 3.8: Histogram of the performance of the 67 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number, out of 100 simulations, for which the mean was inside the interval. The histogram is over the 12 distributions presented in this chapter.

Figure 3.9: Histogram of the performance of the 95 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number, out of 100 simulations, for which the mean was inside the interval. The histogram is over the 12 distributions presented in this chapter.
Figure 3.10: Histogram of the performance of the 99 percent credible interval out of 100 simulations for each distribution, across thresholds and runs. The x-axis in each subplot denotes the number, out of 100 simulations, for which the mean was inside the interval. The histogram is over the 12 distributions presented in this chapter.

3.4 Conclusion

To review, we have shown that the Bayesian bootstrap is significantly better, both theoretically and practically, than the classical bootstrap. Theoretically, because it never claims it may be impossible to see things one has seen, and thus never puts weight on TTS = ∞ unless you have seen no successes, and because it is the natural outcome of a Bayesian nonparametric analysis in the face of very weak prior information. Practically, because it yields more reasonable posteriors for the mean (unimodal, first and foremost). We have also seen how optional stopping, while being slightly biased, can save an order of magnitude or more in time, a savings that can only reasonably be expected to grow as we seek to benchmark larger and larger annealers, and still yields high-quality credible intervals in a variety of circumstances. By taking seriously the question of how to represent our state of knowledge at each stage of a benchmarking experiment, we can ensure that we arrive at sensible answers at every moment in time and save significant amounts of time and effort when studying problems of widely varying hardness.

Generically, then, to effectively keep track of one's knowledge, the community should adopt a Bayesian bootstrap over gauges and a cutoff on the coefficient of variation of the posterior density for the time to solution.

Moreover, much of the reasoning above also applies in many other potential contexts that our community may encounter as we move forward. Similar procedures may be used when trying to estimate the performance of quantum annealers for machine learning tasks, such as QAML (see chapter 5). Moreover, experiments on new circuit-based devices may also benefit from more rigorous analyses of the state of knowledge. Very generally, the most important task for the researcher isn't running the experiment, but accurately representing the knowledge gained from the experiment, and for all the reasons listed above, a Bayesian bootstrap procedure with a coefficient-of-variation cutoff is likely to be the simplest, most honest, and most efficient mechanism for completing that task in a wide variety of circumstances.

Additionally, for the interested reader, I've included an appendix chapter A on treating the problem of gauge selection as a many-armed bandit problem, i.e. abandoning the marginalization approach and instead attempting to exploit the variation over gauges to the best of one's abilities. It is in the appendix as it does not really fit in the flow of this thesis, but is certainly related.

In the following two chapters, we'll see some partial applications of the techniques discussed here: partial because in the first case, chapter 4, I was still developing these ideas, and in the second, chapter 5, the parameters of interest were not time to solution or probability of success, but classifier accuracy.
How- ever, our data analysis in that case was informed by the basic principles outlined here | namely, taking seriously our state of knowledge and the properties of our parameter of interest, and sampling from meaningful posterior densities in order to produce accurate results. 86 Chapter 4 Probing for quantum speedup in spin glass problems with planted solutions 4.1 Introduction In this chapter, we go beyond earlier studies of random Ising problems, and in- troduce a method to construct a set of frustrated Ising-model optimization prob- lems with tunable hardness. We study the performance of a D-Wave Two device (DW2) with up to 503 qubits on these problems and compare it to a suite of classical algorithms, including a highly optimized algorithm designed to compete directly with the DW2 (namely, HFS). The problems are generated around pre- determined ground-state congurations, called planted solutions, which makes them particularly suitable for benchmarking purposes. The problem set exhibits properties familiar from constraint satisfaction (SAT) problems, such as a peak in the typical hardness of the problems, determined by a tunable clause density parameter. We bound the hardness regime where the DW2 device either does not or might exhibit a quantum speedup for this problem set. While we do not nd evidence for a speedup for the hardest and most frustrated problems in our problem set, we cannot rule out that a speedup might exist for some of the easier, less frustrated problems. Our empirical ndings pertain to the specic D-Wave processor and problem set we studied and leave open the possibility that future processors might exhibit a quantum speedup on the same problem set. This chapter was originally published as Ref [5], with minor modications here. Here we report on experimental results that probe the possibility that a pu- tative quantum annealer may be capable of speeding up the solution of certain 87 carefully designed optimization problems. We refer to this either as a limited or as a potential quantum speedup, since we study the possibility of an advan- tage relative to a portfolio of classical algorithms that either \correspond" to the quantum annealer (in the sense that they implement a similar algorithmic approach running on classical hardware), or implement a specic, specialized classical algorithm, in accordance with Ref [4] and chapter 2. In addition, for technical reasons detailed below we must operate the putative quantum annealer in a suboptimal regime. We achieve this by designing Ising model problems that exhibit frustration, a well-known feature of classically-hard optimization prob- lems [110, 111]. The putative quantum annealer used in this study is the D-Wave Two (DW2) device [112], the same as in chapter 2. A demonstration of speedup on a meaningful problem has so far been an elusive goal, possibly because the random Ising problems chosen in previous benchmarking tests [15, 4], up to the date of the study discussed here, were too easy to solve for the classical algorithms against which the D-Wave devices were compared; namely such problems exhibit a spin-glass phase only at zero tem- perature [93]. The Sherrington-Kirkpatrick model with random1 couplings, exhibiting a positive spin-glass critical temperature, was tested on a DW2 de- vice, but the problem sizes considered were too small (due to the need to embed a complete graph into the Chimera graph) to test for a speedup [83]. 
The approach we outline next allows us to directly probe for a quantum speedup using frustrated Ising spin glass problems with a tunable degree of hardness, though we do not know whether these problems exhibit a positive-temperature spin-glass phase.

Figure 4.1: Examples of randomly generated loops and couplings on the DW2 Chimera graph. Top (a): qubits and couplings participating in the loops are highlighted in green and purple, respectively. Only even-length loops are embeddable on the Chimera graph. Bottom (b): distribution of J values for a sample problem instance with N = 126 spins and edges, and 101 loops. It is virtually impossible to recover the loop-Hamiltonians H_j from a given H_Ising. The couplings are all eventually rescaled to lie in [−1, 1]. We always set the local fields h_i to zero, as non-zero fields tended to make the problems easier.

4.2 Frustrated Ising problems with planted solutions

In this section we introduce a method for generating families of benchmark problems that have a certain degree of "tunable hardness", achieved by adjusting the amount of frustration, a well-known concept from spin-glass theory [113]. In frustrated optimization problems no configuration of the variables simultaneously minimizes all terms in the cost function, often causing classical algorithms to get stuck in local minima, since the global minimum of the problem satisfies only a fraction of the Ising couplings and/or local fields. We construct our problems around "planted solutions", an idea borrowed from constraint satisfaction (SAT) problems [114, 115]. The planted solution represents a ground state configuration of Eq. (1.1) that minimizes the energy and is known in advance. This knowledge circumvents the need to verify the ground state energy using exact (provable) solvers, which rapidly become too expensive computationally as the number of variables grows, and which were employed in earlier benchmarking studies ([15, 4], and chapter 2).

To create Ising-type optimization problems with planted solutions, we make use of frustrated "local Ising Hamiltonians" H_j, i.e., Ising problem instances defined on sub-graphs of the Chimera graph, in lieu of the clauses appearing in SAT formulas. The total Hamiltonian for each problem is then of the form H_Ising = Σ_{j=1}^M H_j, where the sum is over the (possibly partially overlapping) local Hamiltonians. Similarly to SAT problems, the size of these local Hamiltonians, or clauses, does not scale with the size of the problem. Moreover, to ensure that the planted solution is a ground state of the total Hamiltonian, we construct the clauses so that each is minimized by that portion of the planted solution that has support over the corresponding subgraph (footnote 1). The planted solution is therefore determined prior to constructing the local Hamiltonians, by assigning random values to the bits on the nodes of the graph. The above process generates a Hamiltonian with the property that the planted solution is a simultaneous ground state of all the frustrated local Hamiltonians (footnote 2).

The various clauses H_j can be generated in many different ways. This freedom allows for the generation of many different types of instances, and here we present one method. An N-qubit, M-clause instance is generated as follows.

1. A random configuration of N bits corresponding to the participating spins of the Chimera graph is generated. This configuration constitutes the planted solution of the instance.

2.
M random loops are constructed along the edges of the Chimera graph. 1 To see that the planted solution minimizes the total Hamiltonian, assume that a cong- uration x is a minimum of f j (x) for all j, whereff j (x)g is a set of arbitrary real-valued functions. Then, by denition, for each j, f j (x) f j (x) for all possible congurations x. Let us now dene f(x) P j f j (x). It follows then that f(x) P j f j (x). Since also f(x) = P j f j (x), x is a minimizing conguration of f(x). 2 Somewhat confusingly from our perspective of utilizing frustration, such Hamiltonians are sometimes called \frustration-free" [116]. 90 The loops are constructed by placing random walkers on random nodes of the Chimera graph, where each edge is determined at random from the set of all edges connected to the last edge. The random walk is terminated once the random walker crosses its path, i.e., when a node that has already been visited is encountered again. If this node is not the origin of the loop the \tail" of the path is discarded. Examples of such loops are given in Fig. 4.1. We distinguish between \short loops" of length ` = 4; 6, and \long loops" of length ` 8, as these give rise to peaks in hardness at dierent loop densities. Here we focus on long loops; results for short loops, which tend to generate signicantly harder problem instances, will be presented elsewhere. 3. On each loop, a clause H j is dened by assigning J ij =1 couplings to the edges of the loop in such a way that the planted solution minimizes H j . As a rst step, the J ij 's are set to the ferromagnetic J ij =s i s j , where the s i are the planted solution values. One of the couplings in the loop is then chosen at random and its sign is ipped. This ensures that no spin conguration can satisfy every edge in that loop, and the planted solution remains a global minimum of the loop, but is now a frustrated ground state. 3 4. The total Hamiltonian is then formed by adding up theM-loop clausesH j . Note that loops can partially overlap, thereby also potentially \canceling out" each other's frustration, a useful feature that will give rise to an easy- hard-easy pattern we discuss below. Since the planted solution is a ground state of each of theH j 's, it is also a ground state of the total Hamiltonian H Ising . Ising-type optimization problems with planted solutions, such as those we have generated, have several attractive properties that we utilize later on: i) Having a ground-state conguration allows us to readily precompute a measure of frustration, e.g., the fraction of frustrated couplings of the planted solution. We shall show that this type of measure correlates well with the hardness of the problem, as dened in terms of the success probability of nding the ground state or the scaling of the time-to-solution. ii) By changing the number of clauses M we can create dierent classes of problems, each with a \clause density" = M=N, analogous to problem generation in SAT. 4 The clause density can be used to tune through a SAT-type phase transition [117], i.e., it may be used to control the hardness of the generated problems. Here too, we shall see that the clause density plays an important role in setting the hardness of the 3 To see that there can be no spin conguration with an energy lower than that of the planted solution, consider a given loop and a given spin in that loop; note that every spin participates in two couplings. Either both couplings are satised after the sign ip, or one is satised and the other is not. 
Correspondingly, ipping that spin will thus either raise the energy or leave it unchanged. This is true for all spins in the loop. 4 Note that for small values ofM, the number of spins actually participating in an instance will be smaller than N, the number of spins on the graph from which the clauses are chosen. 91 problems. Note that when the energy is unchanged under a spin- ip the solution is degenerate, so that our planted solution need not be unique. 4.3 Algorithms and scaling A judicious choice of classical algorithms is required to ensure that our test of a limited or potential quantum speedup is meaningful, as discussed in previous chapters. We considered (i) simulated annealing (SA), (ii) simulated quantum annealing (SQA), (iii) the Shin-Smolin-Smith-Vazirani (SSSV) thermal rotor model (also called spin-vector Monte Carlo or SVMC), and (iv) the Hamze- Freitas-Selby (HFS) algorithm [22, 23], an algorithm that is ne-tuned for the Chimera graph and appears to be the most competitive at this time. Of these, the HFS algorithm is the only one that is designed to exploit the scaling of the treewidth of the Chimera graph (see 4.6) , which renders it particularly ecient in a comparison against the D-Wave devices [118]. These are the same algorithms given an overview back in the rst chapter, 1. The D-Wave devices and all the algorithms we considered are probabilistic, and return the correct ground state with some probability of success. We thus perform many runs of the same duration for a given problem instance, and estimate the success probability empirically as the number of successes divided by the number of runs. This is repeated for many instances at a given clause density and number of variables N, and generates a distribution of success probabilities. Letp() denote the success probability for a given set of parame- ters =fN;;qg, where q denotes the qth percentile of this distribution; e.g., half the instances for givenN and have a higher empirical success probability than the median p(N;; 0:5). The number of runs required to nd the ground state at least once with a desired probability p d , once again, is [15, 4] 5 r() = log(1p d ) log(1p()) ; (4.1) and henceforth we set p d = 0:99 as in previous chapters. Correspondingly, the time-to-solution is TTS() = r(), where for the D-Wave device signies the annealing time t a (at least 20s for the DW2), while for SA, SQA, and SSSV is the number of Monte Carlo sweepss (a complete update of all spins) multiplied by the time per sweep X for algorithmX, again as in chapter 2. 6 For SA, we further distinguish between using it as a solver (SAS) or as an annealer (SAA): in SAS mode we keep track of the energies found along the annealing schedule (which we take to be linear in the inverse temperature ) and take the lowest, while in SAA mode we always use the nal energy found. Thus SAA can never be better than SAS, but is a more faithful model of an analog annealing device. A similar distinction can be made for SQA (i.e., SQAA and SQAS), but we primarily consider the annealer version since it too more closely mimics 5 We prefer to dener in this manner, rather than rounding it as in [15, 4], as this simplies the extraction of scaling coecients. 6 In our simulations SA = 3:54s, SQA = 9:92s, and SSSV = 10:34s. 92 the operation of DW2 (note that SAA and SQAA were also the modes used in Refs. [15, 4]; SQAS results are shown in Section 4.7). 
For the HFS algorithm, TTS() is calculated directly from the distribution of runtimes obtained from 10 5 identical, independent executions of the algorithm. Further timing details are given in 4.6. We next brie y discuss how to properly address resource scaling, as a re- fresher from chapter 2. The D-Wave devices use N qubits and O(N) couplers to compute the solution of the input problem. Thus, it uses resources which scale linearly in the size of the problem, and we should give our classical solvers the same opportunity. Namely, we should allow for the use of linearly more cores and memory for our classical algorithms as we scale the problem size. For annealers such as SA, SQA, and SSSV, this is trivial as we can exploit the bi- partite nature of the Chimera graph to perform spin updates in parallel so that each sweep takes constant time and a linear number of cores and memory. The HFS algorithm is not perfectly parallelizable but as I explain in more detail for interested readers in 4.6, we take this into account as well. Finally, note that dynamic programming can always nd the true ground state in a time that is exponential in the Chimera graph treewidth, i.e., that scales as exp c p N [13]. The natural size scale in our study is the square Chimera subgraph C L comprising L 2 unit cells of 8 qubits each, i.e., N = 8L 2 . Therefore, henceforth we replace N by L so that =fL;;qg. This is now standard practice for benchmarking studies of Chimera-structured annealers, but was not as of the time of the study in chapter 2. 93 �������������� �� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� � � �� � �� � �� � �� � ������� �������������� (a) �������������� �� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� � � �� � �� � �� � �� � �� � ���� ����������������� (b) Figure 4.2: Time to solution as a function of clause density. Shown is TTS(L;; 0:5) (log scale) for (a) DW2 and (b) HFS, as a function of the clause density. The dierent colors represent the dierent Chimera subgraph sizes, which continue to L = 12 in the HFS case. In both cases there is a clear peak. From the HFS results we can identify the peak position as being at = 0:17 0:01, which is consistent with the peak position in the DW2 results. Error bars represent 2 condence intervals. 94 ���������������� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� � � � � � � � � ���� ���� ���� ���� ���� �������������������������������� Figure 4.3: Frustration fraction. Shown is the fraction of frustrated couplings (the number of frustrated couplings divided by the total number of couplings, where a frustrated coupling is dened with respect to the planted solution) as a function of clause density for dierent Chimera subgraphs C L , in the case of loops of length 8, averaged over the 100 instances for each given and N. There is a broad peak at 0:25. This is the clause density at which there is the largest fraction of frustrated couplings, and is near where we expect the hardest instances to occur, in good agreement with Fig. 4.2. 95 ln(TTS) Chimera graph size CL Figure 4.4: Sketch of the relation between the log(TTS) curves for optimal and suboptimal annealing times. The annealing time needs to be optimized for each problem size. Blue represents the TTS with a size- independent annealing time t a . Red represents the optimal TTS correspond- ing to having an optimal size-dependent annealing time t opt a (L), i.e. the lower envelope of the full series of xed annealing time TTS curves. 
This curve need not be linear as depicted, though we expect it to be linear for NP- hard problems. The blue line upper bounds the red line since by denition TTS DW2 (;t opt a (L)) TTS DW2 (;t a ). The vertical dotted line represents the problem size L at which t a = t opt a (L ). To the left of this line t a > t opt a and the slope of the xed-t a TTS curve lower-bounds the slope of the optimal TTS curve, since for very small problem sizes a large t a results in insensitivity to problem size, and the success probability is essentially constant. The oppo- site happens to the right of this line, where t a < t opt a , and where the success probability rapidly drops with L at xed t a . 96 �������������������� � ��� ��� ��� ��� ��� ��� ��� � ���� ��� ��� ��� ������ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � �� � �� � �� � �� � �� � ��������������������� Figure 4.5: Median time-to-solution over instances. Plotted is TTS(L;; 0:5) (log scale) as a function of the Chimera subgraph size C L , for a range of clause densities and for all solvers we tested. Note that only the scaling matters and not the actual TTS, since it is determined by constant factors that vary from processor to processor, compiler options, etc. All algorithms' timing re ects the result after accounting for parallelism, as described in 4.6. Error bars represent 2 condence intervals. The DW2 annealing time is t a = 20s. ������� ������������� � ��� ��� ��� ��� ��� ��� ��� � ��� ���� ��� ��� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� �� �� �� �� � �� � �� � �� � ������� Figure 4.6: Speedup ratio. Plotted is the median speedup ratio S X (L;; 0:5) (log scale) as dened in Eq. (4.2) for all algorithms tested. A negative slope indicates a denite slowdown for the DW2. A positive slope indicates the possi- bility of an advantage for the DW2 over the corresponding classical algorithm. This is observed for > 0:4 in the comparison to SAS, SQA, SSSV, and HFS (see Fig. 4.10 for a more detailed analysis). Error bars represent 2 condence intervals. 97 4.4 Probing for a quantum speedup We now come to our main goal for this study, which is to probe for a limited or potential quantum speedup on our frustrated Ising problem set. We reserve the term \limited speedup" for our comparisons with SA, SQA, and SSSV, while the term \potential speedup" refers to the comparison with the HFS algorithm, which unlike the other three algorithms, does not implement a similar algorithmic approach to a quantum annealer. 4.4.1 Dependence on clause density We rst analyze the eect of the clause density. This is shown in Fig. 4.2, where we plot TTS(L;; 0:5) for the DW2 and the HFS algorithm. 7 We note that the worst-case TTS of 10ms is smaller than that observed for random Ising problems in Ref. [4] [100ms for range 7, C 8 , and q = 0:5 (median)]. However, as we shall demonstrate, the classical algorithms against which the DW2 was benchmarked in Ref. [4] (SA and SQA) scale signicantly less favorably in the present case, i.e., whereas in Ref. [4] no possibility of a limited speedup against SA was observed for the median, here we will nd that such a possibility remains. In this sense, the problem instances considered here are relatively harder for the classical solvers than those of Ref. [4]. 
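As a small worked example of how the quantities being compared here are computed, the sketch below evaluates the number of runs of Eq. (4.1) and the corresponding TTS for a hypothetical success probability; the 20 µs value is the DW2 annealing time used throughout, and the median probability shown is illustrative only.

```python
import numpy as np

def num_runs(p_success, p_d=0.99):
    """Number of repetitions r from Eq. (4.1): r = log(1 - p_d) / log(1 - p)."""
    p = np.clip(np.asarray(p_success, dtype=float), 1e-12, 1.0 - 1e-12)  # guard
    return np.log(1.0 - p_d) / np.log(1.0 - p)

def time_to_solution(p_success, tau):
    """TTS = tau * r, where tau is the annealing time (or sweep time x sweeps)."""
    return tau * num_runs(p_success)

# Hypothetical median success probability for one (L, alpha) setting:
p_median = 0.004
print(time_to_solution(p_median, tau=20e-6))  # DW2 at t_a = 20 microseconds
```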
For our choice of random loop characteristics, the time-to-solution peaks at a clause density 0:17, re ecting the hardness of the problems in that regime. To correlate the hardness of the instances with their degree of frustration, we plot the frustration fraction, dened as the ratio of the number of unsatised edges with respect to the planted solution to the total number of edges on the graph, as a function of clause density. The frustration fraction curve, shown in Fig. 4.3, has a peak at 0:25, conrming that frustration and hardness are indeed correlated. 8 The hardness peak is reminiscent of the analogous situation in SAT, where the clause density can be tuned through a phase transition be- tween satisable and unsatisable phases [117]. The peak we observe may be interpreted as a nite-size precursor of a phase transition. This interpretation is corroborated below by time-to-solution results of all other tested algorithms which will also nd problems near the critical point the hardest. Indeed, all the algorithms we studied exhibit qualitatively similar behavior to that seen in Fig. 4.2 (see Fig. 4.16 in Section 4.7), with an easy-hard-easy pattern separated at 0:2. This is in agreement with previous studies, e.g., for MAX 2-SAT problems on the DW1 [16], and for k-SAT with k > 2, where a similar pattern is found for backtracking solvers [119]. It is important to note that we do not claim that this easy-hard-easy transition coincides with a spin-glass phase tran- sition; we have not studied which phases actually appear in our problem set as 7 In this study we focus mostly on the median since with 100 instances per setting, higher quantiles tend be noise-dominated. 8 Note that in our denition of frustration fraction, frustration is measured with respect to all edges of the graph from which clauses are chosen, similarly to the way clause density is dened in SAT problems. 98 we tune the clause density. A qualitative explanation for the easy-hard-easy pattern is that when the number of loops (and hence ) is small they do not overlap and thus each loop becomes an easy optimization problem. In the opposite limit many loops pass through each edge, thus tending to reduce frustration, since each loop con- tributes either a \frustrated" edge with small probability 1=` (where ` is the loop length) or an \unfrustrated" edge with probability 11=`. The hard prob- lems thus lie in between these two limits, where a constant fraction (bounded away from 0 and 1) of loops overlap. 4.4.2 General considerations concerning scaling & speedup Of central interest is the question of whether there is any scaling advantage (withL) in using a quantum device to solve our problems. Therefore, we dene a quantum speedup ratio for the DW2 relative to a given algorithm X as in chapter 2. S X (;t a ) = TTS X () TTS DW2 (;t a ) : (4.2) Since this quantity is specic to the DW2 we refer to it simply as the empirical \DW2 speedup ratio" from now on, though of course it generalizes straight- forwardly for any other putative quantum annealer or processor against which algorithm X is compared. We must be careful in using S X (;t a ) in assessing a speedup, since the annealing time must be optimized for each problem size L in order to avoid the pitfall of a fake speedup [4]. Let us denote the (unknown) optimal annealing time by t opt a (L). By denition, TTS DW2 (;t opt a (L)) TTS DW2 (;t a ), where t a is a xed annealing time. 
We now dene the optimized speedup ratio as S opt X (;t opt a (L)) = TTS X () TTS DW2 (;t opt a (L)) (4.3) and clearlyS X (;t a )S opt X (;t opt a (L)), i.e., the speedup ratios computed using Eq. (4.2) are lower bounds on the optimized speedup. However, what matters for a speedup is the scaling of the speedup ratio with problem size. Thus, we are interested not in the numerical value of the speedup ratio but rather in the slope dS=dL (recognizing that this is a formal derivative since L is a discrete variable). A positive slope would indicate a DW2 speedup, while a negative speedup slope would indicate a slowdown. We thus dene the DW2 speedup regime as the set of problem sizesL + where d dL S X (;t opt a (L)) > 0 for all L2L + . Likewise, the DW2 slowdown regime is the set of problem sizesL where d dL S X (;t opt a (L))< 0 for all L2L . From a computational complexity perspective one is ultimately interested in the asymptotic performance, i.e., the regime where L becomes arbitrarily large. In this sense a true speedup would correspond to the observation that L + = [L + min ;L + max ], with L + min a positive constant and L + max !1. Of course, such a denition is meaningless for a physical device such as the DW2, for which 99 L + max is necessarily nite. Thus the best we can hope for is an observation that L + is as large as is consistent with the device itself, which in our case would imply thatL + = [1; 8]. However, as we shall argue, we can in fact only rule out a speedup, while we are unable to conrm one. I.e., we are able to identify L = [L min ;L max ], but notL + . The culprit, as in earlier benchmarking work [4] and as we establish below, is the fact that the DW2 minimum annealing time oft a = 20s is too long (see also Section 4.7). This means that the smaller the problem size the longer it takes to solve the corresponding instances compared to the case with an optimized annealing time, and hence the observed slope of the DW2 speedup ratio should be interpreted as a lower bound for the optimal scaling. This is illustrated in Fig. 4.4. Without the ability to identify t opt a (L) we do not know of a way to infer, or even estimateL + . However, as we now demonstrate, under a certain reasonable assumption we can still boundL . The assumption is that if t a >t opt a then d dL TTS DW2 (;t a ) d dL TTS DW2 (;t opt a (L)) for all L < L , the problem size for which t a = t opt a (L ). This assumption is essentially a statement that the TTS is monotonic in L 9 , as illustrated in Fig. 4.4. Next we consider the (formal) derivatives of Eqs. (4.2) and (4.3): d dL S X (;t a ) S X (;t a ) = @ L TTS X () TTS X () @ L TTS DW2 (;t a ) TTS DW2 (;t a ) (4.4a) d dL S X (;t opt a (L)) S X (;t opt a (L)) = @ L TTS X () TTS X () d dL TTS DW2 (;t opt a (L)) TTS DW2 (;t opt a (L)) : (4.4b) Collecting these results we have S X (;t a ) S X (;t opt a ) d dL S X (;t opt a )< d dL S X (;t a ) if t a >t opt a (L): Therefore, if we nd that d dL S X (;t a ) < 0 in the suboptimal regime where t a >t opt a (L), then it follows that d dL S X (;t opt a (L))< 0. In other words, a DW2 speedup is ruled out if we observe a slowdown using a suboptimal annealing time. 4.4.3 Scaling and speedup ratio results In Fig. 4.5 we show the scaling of the median time-to-solution for all algorithms studied, for a representative set of clause densities. 
All curves appear to match the general dynamic programming scaling for L ≳ 4, i.e., TTS(α) ∝ exp[b(α)L], but the scaling coefficient b(α) clearly varies from solver to solver. This scaling is similar to that observed in previous benchmarking studies of random Ising instances [15, 4].

In Fig. 4.6 we show the median scaling of S_X for the same set of clause densities as shown in Fig. 4.5. We observe that in all cases there is a strong dependence on the clause density α, with a negative slope of the DW2 speedup ratio for the lower clause densities, corresponding to the harder, more frustrated problems. In this regime the DW2 exhibits a scaling that is worse than that of the classical algorithms, and by the inequality derived above there is no speedup. The possibility of a DW2 speedup remains open for the higher clause densities, where a positive slope is observed, i.e., the DW2 appears to find the easier, less frustrated problems easier than the classical solvers do. This apparent advantage is most pronounced for α ≳ 0.4, where we observe the possibility of a potential speedup even against the highly fine-tuned HFS algorithm (this is seen more clearly in Fig. 4.10). Moreover, the DW2 speedup ratio against HFS improves slightly at the higher percentiles (see Fig. 4.20 in Section 4.7), which is encouraging from the perspective of a potential quantum speedup.

4.4.4 Scaling coefficient results

To test the dependence on t_a, we repeated our DW2 experiments for t_a ∈ [20, 40] μs, in intervals of 2 μs. Figure 4.7a is a success probability correlation plot between t_a = 20 μs and t_a = 40 μs, at α = 0.35. The correlation appears strong, suggesting that the device might already have approached the asymptotic regime where increasing t_a does not modify the success probabilities. To check this more carefully, let us first define the normalized Euclidean distance between two length-M vectors of probabilities p⃗_1 and p⃗_2 as

D(p⃗_1, p⃗_2) := (1/√M) ‖p⃗_1 − p⃗_2‖   (4.5)

(where ‖p⃗‖ = √(p⃗ · p⃗)); clearly, 0 ≤ D(p⃗_1, p⃗_2) ≤ 1. The result for the DW2 data, with p⃗_1 and p⃗_2 being the ordered sets of success probabilities for all instances at given α at t_a = 20 μs and t_a = 40 μs respectively, is shown as the blue circles in Fig. 4.8. The small distance for all α suggests that for t_a ≥ 20 μs the distribution of ground state probabilities has indeed nearly reached its asymptotic value. This result means that the number of runs r(α) [Eq. (4.1)] does not depend strongly on t_a either. To demonstrate this we first fit the number of runs to

r(L, α; 0.5) = e^{a(α) + b(α)L},   (4.6)

in accordance with the above-mentioned expectation of the scaling with the treewidth, and find a good fit across the entire range of clause densities (see Fig. 4.25a in Section 4.7). The corresponding scaling coefficient b(α) is plotted in Fig. 4.9 for all annealing times (the constant a(α) is shown in Fig. 4.25b in Section 4.7 and is well-behaved); the data collapses nicely, showing that the scaling coefficient b(α) has already nearly reached its asymptotic value. Also plotted in Fig. 4.9 is the scaling coefficient b(α) for HFS and for SAS with an optimized number of sweeps at each α and L, as extracted from the data shown in Fig. 4.5. By the same argument as above, where b_DW2(α) ≥ b_X(α) there is no DW2 speedup (L_- = [4, 8]), whereas where b_DW2(α) < b_X(α), a DW2 speedup over algorithm X is still possible. We thus plot the difference in the scaling coefficients in Fig. 4.10.
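For reference, here is a minimal sketch of the distance measure of Eq. (4.5), with the 1/√M normalization chosen so that 0 ≤ D ≤ 1 and two random probability vectors give D ≈ 0.4, matching the benchmark value quoted in the caption of Fig. 4.8. The data below is synthetic; this is not the analysis code used in the study.

```python
import numpy as np

def normalized_distance(p1, p2):
    """Normalized Euclidean distance of Eq. (4.5): D = ||p1 - p2|| / sqrt(M),
    for two length-M vectors of probabilities, so that 0 <= D <= 1."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return np.linalg.norm(p1 - p2) / np.sqrt(len(p1))

# Sanity check: two uniformly random vectors give D near sqrt(1/6) ~ 0.41.
rng = np.random.default_rng(0)
x, y = rng.random(10_000), rng.random(10_000)
print(normalized_distance(x, y))
```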
Figures 4.9 and 4.10 do allow for the possibility of a speedup against both HFS and SAS, at sufficiently high clause densities. However, we stress once more that the smaller DW2 scaling coefficient may be a consequence of t_a = 20 μs being excessively long, and that we cannot rule out that the observed regime of a possible speedup would have disappeared had we been able to optimize t_a by exploring annealing times shorter than 20 μs. Note that an optimization of the SAS annealing schedule might further improve its scaling, but the same cannot be said of the HFS algorithm, and it seems unlikely that HFS could be outperformed even by the fully optimized version of SAS.

4.4.5 DW2 vs SAA

Earlier work has ruled out SAA as a model of the D-Wave devices for random Ising model problems [15], as well as for certain Hamiltonians with suppressed and enhanced ground states for which quantum annealing and SAA give opposite predictions [1, 36]. The observation made above, that the DW2 success probabilities have nearly reached their asymptotic values, suggests that for the problems studied here the DW2 device may perhaps be described as a thermal annealer. While we cannot conclude on the basis of ground state probabilities alone that the DW2 state distribution has reached the Gibbs state, we can compare the ground state distribution to that of SAA in the regime of an asymptotic number of sweeps, and attempt to identify an effective final temperature for the classical annealer that allows it to closely describe the DW2 distribution. We empirically determine β_f = 5 to be the final inverse temperature for our SAA simulations that gives the closest match, and S = 50,000 sweeps (corresponding to ≈ 150 ms, much larger than the DW2's t_a = 20 μs) to be large enough for the SAA distribution to have become stationary (see Fig. 4.26a in Section 4.7 for results with additional sweep numbers confirming this). The result for the Euclidean distance measure is shown in Fig. 4.8 for the two extremal annealing times, and the distance is indeed small. To more closely assess the quality of the correlation we select α = 0.35, the value that minimizes the Euclidean distance as seen in Fig. 4.8, and present the correlation plot in Fig. 4.7b. Considering the results for each size L separately, it is apparent that the correlation is good but also systematically skewed, i.e., for each problem size the data points approximately lie on a line that is not the diagonal. Additional correlation plots are shown in Fig. 4.22 of Section 4.7.

As a final comparison we also extract the scaling coefficients b(α) and compare DW2 to SAA in Fig. 4.9. It can be seen that in terms of the scaling coefficients the DW2 and SAA results are essentially statistically indistinguishable. However, we stress that since the SAA number of sweeps is not optimized, one should refrain from concluding that the equal scaling observed for the DW2 and SAA rules out a DW2 speedup.
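As a small illustration of how a scaling coefficient such as b(α) is extracted, the sketch below performs a least-squares fit of log r against L, in the spirit of Eq. (4.6). The sizes and run counts are illustrative placeholders, and in practice the fit would be repeated for each bootstrap sample to obtain the 2σ error bars shown in Fig. 4.9.

```python
import numpy as np

def scaling_coefficient(sizes, median_runs):
    """Least-squares fit of log r(L, alpha; 0.5) = a(alpha) + b(alpha) * L.

    `median_runs[i]` is the median number of repetitions needed at linear
    size `sizes[i]`.  Returns the intercept a and slope b."""
    L = np.asarray(sizes, dtype=float)
    log_r = np.log(np.asarray(median_runs, dtype=float))
    b, a = np.polyfit(L, log_r, 1)   # polyfit returns (slope, intercept) for degree 1
    return a, b

# Illustrative numbers only: median runs at L = 4..8 for one clause density.
a, b = scaling_coefficient([4, 5, 6, 7, 8], [30, 80, 210, 560, 1500])
print(f"a = {a:.2f}, b = {b:.2f}")   # b plays the role of the coefficient in Fig. 4.9
```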
Figure 4.7: Success probability correlations. The results for all instances at α = 0.35 are shown. Each data point is the success probability for the same instance, (a) for DW2 at t_a = 20 μs and t_a = 40 μs, (b) for DW2 at t_a = 20 μs and SAA at 50,000 sweeps and β_f = 5 [in dimensionless units, such that max(J_ij) = 1]. Perfect correlation would mean that all data points fall on the diagonal, and a strong correlation is observed in both cases.
The data is colored by the problem size L and shows a clear progression from high success probabilities at small L to low success probabilities at large L. Qualitatively similar results are seen for t_a = 20 μs vs t_a = 40 μs at all α values, and for DW2 vs SAA at intermediate α values (see Figs. 4.21 and 4.22 in Section 4.7).

Figure 4.8: Correlation between the DW2 data for two different annealing times and SAA. Plotted is the normalized Euclidean distance D(p⃗_1, p⃗_2) for p⃗_1 and p⃗_2 being, respectively, the ordered success probabilities for DW2 at t_a = 20 μs and t_a = 40 μs (blue circles), DW2 at t_a = 20 μs and SAA (yellow squares), and DW2 at t_a = 40 μs and SAA (green diamonds). For comparison, the Euclidean distance between two random vectors with elements ∈ [0, 1] is ≈ 0.4. SAA data is for 50,000 sweeps and β_f = 5. The correlation with SAA degrades slightly for t_a = 40 μs. Error bars represent 2σ confidence intervals and were computed using bootstrapping (see Section 4.6.2 for details). In each comparison, to construct p⃗_1 and p⃗_2 we fixed α and used half the instances (for bootstrapping purposes) for L ∈ [2, 8].

Figure 4.9: Scaling coefficients of the number of runs. Plotted here is b(α) [Eq. (4.6)] for the DW2 at all annealing times (overlapping solid lines and large symbols), for the HFS algorithm, for SAS with an optimal number of sweeps, for SAS with noise and an optimal number of sweeps, and for SAA with a large enough number of sweeps that the asymptotic distribution has been reached at β_f = 5. The scaling coefficients of HFS and of optimized SAS each set an upper bound for a DW2 speedup against that particular algorithm. In terms of the scaling coefficient the DW2 result is statistically indistinguishable (except at α = 0.1) from SAA run at S = 50,000 and β_f = 5. The coefficients shown here are extracted from fits with L ≥ 4 (see Fig. 4.25a in Section 4.7). Error bars represent 2σ confidence intervals.

Figure 4.10: Difference between the scaling coefficients. Plotted here is the difference between the scaling coefficients in Fig. 4.9, b_X(α) − b_DW2(α), where X denotes the HFS algorithm, SAS with an optimal number of sweeps, SAS with noise and an optimal number of sweeps, or SAA with S = 50,000 and β_f = 5.
When the difference is non-positive there can be no speedup, since optimizing t_a can only increase b_DW2(α); conversely, when the difference is positive a speedup is still possible, i.e., not accounting for the error bars, for α ≲ 0.05 and α ≳ 0.4 for HFS, and for α ≲ 0.15 and α ≳ 0.35 for SAS without noise. These ranges shrink if the error bars are accounted for, but notably, for most α values SAS with noise does not disallow a limited speedup, suggesting that control noise may play an important role in masking a DW2 speedup. Error bars represent 2σ confidence intervals.

Figure 4.11: Scaling of SAA at different final temperatures. SAA is run at S = 50,000 sweeps and various final inverse temperatures. The peak at α ≈ 0.2 remains a robust feature. The data marked "Noisy" is with 5% random noise added to the couplings J_ij.

4.5 Discussion

In this study we proposed and implemented a method for generating problems with a range of hardness, tuned by the clause density, mimicking the phase structure observed for SAT problems. By comparing the DW2 device to a number of classical algorithms we delineated where there is no DW2 speedup and where it might still be possible, for this problem set. No advantage is observed for the low clause densities corresponding to the hardest optimization problems, but a speedup remains possible for the higher clause densities. In this sense these results are more encouraging than for the random Ising problems studied in Ref. [4], where only the lowest percentiles of the success probability distribution allowed for the possibility of a DW2 speedup. In our case there is in fact a slight improvement at the higher percentiles (see Fig. 4.20 in Section 4.7). In the same sense our findings are also more encouraging than recent theoretical results showing that quantum annealing does not provide a speedup relative to simulated annealing on 2SAT problems with a unique ground state and a highly degenerate first excited state [122].

The close match between the DW2 and SAA scaling coefficients seen in Fig. 4.9 suggests that SAA in the regime of an asymptotic number of sweeps can serve as a model for the expected performance of the D-Wave device as its temperature is lowered. Thus we plot in Fig. 4.11 the scaling coefficient for SAA at various final inverse temperatures, at a fixed (and still asymptotic) value of S = 50,000. Performance improves steadily as β_f increases, suggesting that SAA does not become trapped at 50,000 sweeps for the largest problem sizes we have studied (this may also indicate that even at the hardest clause densities these problems do not exhibit a positive-temperature spin-glass phase). We may infer that similar behavior can be expected of the D-Wave device if its temperature were lowered.

An additional interesting feature of the observation that the asymptotic DW2 ground state probability is in good agreement with that of a thermal annealer is that the ground state is found with a probability similar to that expected from a Gibbs distribution. This result is consistent with the weak-coupling limit that underlies the derivation of the adiabatic Markovian master equation [123], i.e., it is consistent with the notion that the system-bath coupling is weak and decoherence occurs in the energy eigenbasis [124].
This rules out the possibility that decoherence occurs in the computational basis, as this would have led to a singular-coupling-limit master equation with a ground state probability drawn not from the Gibbs distribution but rather from a uniform distribution, if the single-qubit decoherence time is much shorter than the total annealing time [123]. This is important, as the weak-coupling limit is compatible with decoherence between eigenstates of different energies while maintaining ground state coherence, a necessary condition for a speedup via quantum annealing. In contrast, in the singular-coupling limit no quantum effects survive and no quantum speedup of any sort is possible.

We note that an important disadvantage the DW2 device has relative to all the classical algorithms we have compared it with is control errors in the programming of the local fields and couplings [125, 126, 127, 128]. As shown in Section 4.7 (Fig. 4.27) for those interested, many of the rescaled couplings J_ij used in our instances are below the single-coupler control noise specification, meaning that with some probability the DW2 is giving the right solution to the wrong problem. We are unable to directly measure the effect of such errors on the DW2 device, but their effect is demonstrated in Figs. 4.10 and 4.11, where both SAS and SAA, with β_f = 5 and 20 respectively, are seen to have substantially increased scaling coefficients after the addition of noise comparable to the control noise in the DW2 device. In fact, the DW2 scaling coefficient is smaller than the scaling coefficient of optimized SAS with noise over almost the entire range of α, suggesting that a reduction in such errors would extend the upper bound for a DW2 speedup against SAS over a wider range of clause densities. The effect of such control errors can be mitigated by improved engineering, but this also emphasizes the need for the implementation of error correction on putative quantum annealing devices. The beneficial effect of such error correction has already been demonstrated experimentally [94, 129] and theoretically [130], albeit at the cost of a reduction in the effective number of qubits and hence reduced problem sizes.

To summarize, we believe that at least three major improvements will be needed before it becomes possible to demonstrate a conclusive (limited or potential) quantum speedup using putative quantum annealing devices: (1) harder optimization problems must be designed that will allow the annealing time to be optimized, (2) decoherence and control noise must be further reduced, and (3) error correction techniques must be incorporated. Another outstanding challenge is to theoretically design optimization problems that can be unequivocally shown to benefit from quantum annealing dynamics.

Finally, the methods introduced here for creating frustrated problems with tunable hardness should serve as a general tool for the creation of suitable benchmarks for quantum annealers. Our study directly illustrates the important role that frustration plays in the optimization of spin-glass problems, for classical algorithms as well as for putative quantum optimizers. It is plausible that different, perhaps more finely-tuned choices of clauses to create novel types of benchmarks may be used to establish a clearer separation between the performance of quantum and classical devices. These may eventually lead to the demonstration of the coveted experimental annealing-based quantum speedup.

Note added.
Work on problem instances similar to the ones we studied here, but with a range of couplings above the single-coupler control noise specification, appeared shortly after our preprint [131]. The DW2 scaling results were much improved, supporting our conclusion that such errors have a strong detrimental effect on the performance of the DW2.

Finally, I'll point out a few places where the study in this chapter diverges from the discussion of Chapter 3. At the time of this study, I had not yet hit upon the notion of optional stopping, and thus it was not used. Secondly, I did use the Beta distribution for estimation of the probability of success, as detailed below for enterprising readers. Also, since we were not interested in the TTS of individual instances, essentially at all, we neglected to gauge average. In retrospect, I would not do this today, or if I did, I would have taken a more rigorous approach to estimating the posterior density for the various percentiles of problem hardness (here we used the classical bootstrap over the problem ensembles). In essence, this study represents a kind of halfway house between the period of Chapter 2, where we were groping in the dark after a fashion, doing blind classical bootstraps of every single parameter, and the more rigorous approach discussed in Chapter 3. The analysis in the next chapter is more rigorous and inspired by the analysis in the previous chapter, but has a number of different properties due to the different parameter of interest (not TTS, but classifier accuracy).

Acknowledgements for this chapter

Part of the computing resources were provided by the USC Center for High Performance Computing and Communications. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. I.H. and D.A.L. acknowledge support under ARO grant number W911NF-12-1-0523. The work of J.J., T.A. and D.A.L. was supported under ARO MURI Grant No. W911NF-11-1-0268, and by the Lockheed Martin Corporation. M.T. and T.F.R. acknowledge support by the Swiss National Science Foundation through the National Competence Center in Research NCCR QSIT, and by the European Research Council through ERC Advanced Grant SIMCOFE. We thank Mohammad Amin, Andrew King, Catherine McGeoch, and Alex Selby for comments and discussions. I.H. would like to thank Gene Wagenbreth for assistance with the implementation of solution-enumeration algorithms. J.J. would like to thank the developers of the Julia programming language [132], which was used extensively for data gathering and analysis.

4.6 Methods

In this section I describe various technical details regarding our methods of data collection and analysis.

4.6.1 Experimental details

For each clause density α and problem size N (or C_L) we generated 100 instances, for a total of 12,600 instances. We performed approximately 990,000 μs/t_a annealing runs (experiments) for each problem instance and for each annealing time t_a ∈ [20, 40] μs, in steps of 2 μs, for a total of more than 10^10 experiments. No gauge averaging [15] was performed, because we are not concerned with the timing data for a single instance but rather for a collection of instances, and the variation over instances is larger than the variation over gauges.

[Figure 4.12 plots the annealing curves A(t) and B(t) in GHz, together with k_B T, against t/t_f.]

Figure 4.12: Annealing schedule of the DW2.
The annealing curves A(t) and B(t) are calculated using rf-SQUID models with independently calibrated qubit parameters. Units are such that ħ = 1. The operating temperature of 17 mK is also shown.

Solver details

HFS runs in a serial manner on a single core of a CPU. Since we allow the DW2 to use O(N) resources, we should also parallelize the HFS algorithm. In principle, HFS proceeds in a series of steps: at each step, one shears off each of the leaves of the tree (i.e., the outermost vertices) and then proceeds to the next step until the tree has collapsed to a point. Each of the leaves can be eliminated separately; however, each elimination along a particular branch depends on the previous eliminations. As a result, we must perform between L + 2 and (3/2)L + 2 (on average (5/4)L + 2) irreducibly serial operations when reducing the trees on an L × L unit-cell Chimera graph (depending on which of the trees we use and the specifics of how we do the reduction). Since we deal only with square graphs and are concerned with asymptotic scaling, we ignore the constant steps and use (5/4)L basic operations per tree (see Sec. 4.6.2 for additional details). HFS and SA are run 10^4 times each to estimate the probability of success, while SQA is run 1000 times, and DW2 is run with 1000 samples each. We used the DW2 annealing schedule in Fig. 4.12, and the temperature was kept constant at 10.56 mK when performing simulations with SVMC/SQA.

4.6.2 Error estimation

As discussed above, we performed primarily classical bootstraps for error estimates, though our probabilities of success were sampled from the correct posterior distribution for each instance.

Annealers

If we have M instances in our set of interest I, then we resample with replacement a "new" set of instances {I_i}_j, also of length M, from our set I. For each instance I_{i,j} from this new set we sample a value for its probability from its corresponding posterior distribution to get a set of probabilities {p_{i,j}}. We then calculate whatever function F_j = f({p_{i,j}}) we wish on these probabilities, e.g., the median over the set of instances. We repeat this process a large number of times (in our case, 1000), to obtain many values of our function {F_j}. We then take the mean and standard deviation over the set {F_j} to get a value μ_F and a standard deviation σ_F, which form our value and error bar for that size and clause density. This is essentially a classical bootstrap over instances, but with a proper Bayesian account of the probability of success for each instance (as no gauge averaging was performed, we here neglect gauges as a nuisance parameter).

When we take the ratio of two algorithms A and B, for each pair of corresponding data points for the two algorithms (i.e., each size and clause density) we characterize each point with a normal distribution, N(μ_A, σ_A) and N(μ_B, σ_B). We then resample from each distribution a large number of times (1000) to get two sets of values for the function we wish to plot: {F_A} and {F_B}. We take the ratio of the corresponding elements in the two sets, S_i = F_{A,i} / F_{B,i}, and take the mean and standard deviation of the set {S_i} to get our data point and error bar for the ratio.
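The sketch below illustrates one way to carry out this Bayesian bootstrap of the median over instances in Python; the study's own analysis was done in Julia, so this is only an illustration. It assumes a Jeffreys prior Beta(0.5, 0.5) per instance, consistent with the Beta(0.5, N_X + 0.5) posterior quoted for zero successes in Section 4.7, and the counts at the bottom are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_median(successes, trials, n_boot=1000):
    """Classical bootstrap over instances, combined with a Beta posterior draw
    for each instance's success probability.

    `successes[i]` ground-state hits out of `trials[i]` anneals for instance i.
    With a Jeffreys prior, the posterior for instance i is
    Beta(k_i + 0.5, N_i - k_i + 0.5).  Returns (mean, std) of the bootstrapped
    median success probability."""
    successes = np.asarray(successes)
    trials = np.asarray(trials)
    M = len(successes)
    medians = np.empty(n_boot)
    for j in range(n_boot):
        idx = rng.integers(0, M, size=M)  # resample instances with replacement
        p = rng.beta(successes[idx] + 0.5,
                     trials[idx] - successes[idx] + 0.5)  # one posterior draw each
        medians[j] = np.median(p)
    return medians.mean(), medians.std()

# Placeholder data: 100 instances with 1000 anneals each.
k = rng.integers(0, 200, size=100)
mu, sigma = bootstrap_median(k, np.full(100, 1000))
print(mu, sigma)
```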
HFS algorithm

For the HFS algorithm, the time to solution for each problem is computed as the mean number of trees per sample multiplied by 0.625 μs × L for a C_L problem. The number 0.625 is chosen to approximate the timing on a standard laptop, and the scaling with L is due to the fact that, in a parallel setting, the number of steps needed to reduce a tree is linear in L (exactly, it is (5/4)L + 2, but the constant 2 serves only to mask the asymptotic scaling and is thus not included in our TTS or speedup plots or in the scaling analysis).

Euclidean distance

In order to generate the error bars in Fig. 4.8, instead of calculating the Euclidean distance over the total number of instances at a given α (700 total), we calculated the Euclidean distance over half the number of instances (350 total). We were then able to perform 100 bootstraps over the instances, i.e., we picked 350 instances at random for each Euclidean distance calculation. For each bootstrap we calculated the Euclidean distance; the data points in Fig. 4.8 are the mean of the Euclidean distances over the bootstraps, while the error bars are twice the standard deviation of the Euclidean distances.

Difference in slope

In order to generate the error bars in Fig. 4.10, we used the data points and the error bars in Fig. 4.9 as the mean and twice the standard deviation of a Gaussian distribution. We then took 1000 samples from each such distribution and calculated their differences. The means of the differences are the data points in Fig. 4.10, and the error bars are twice the standard deviation of the differences.

Figure 4.13: The DW2 Chimera graph. The qubits or spin variables occupy the vertices (circles) and the couplings J_ij are along the edges. Of the 512 qubits, 503 were operative in our experiments (green circles) and 9 were not (red circles). We utilized subgraphs comprising L × L unit cells, denoted C_L, indicated by the solid black lines. There were 31, 70, 126, 198, 284, 385, and 503 qubits in our C_2, ..., C_8 graphs, respectively.

4.7 Additional results

In this section we collect additional results in support of the main text of the chapter, for those interested.

4.7.1 Degeneracy-hardness correlation

It is known that a non-degenerate ground state along with an exponentially (in problem size) degenerate first excited state leads to very hard SAT-type optimization problems [133]. Here we focus on the ground state degeneracy and ask whether it is correlated with hardness. We show the ground state degeneracy in Fig. 4.14. It decays rapidly as α grows, and for sufficiently large α (which depends on the problem size) the ground state found is unique (up to the global Z_2 symmetry). This suggests that degeneracy is not necessarily correlated with hardness, since in the main text we found that hardness peaks at α ≈ 0.25. To test this directly we restrict ourselves to L = 8 and α = 0.4. We then bin the 100 instances at this α according to their degeneracy and study their median TTS using the HFS algorithm. We show the results in Fig. 4.15, where we see no correlation between degeneracy and hardness at fixed α. We find a similar result when we use the success probability of the DW2 as the metric for hardness.

4.7.2 Additional easy-hard-easy transition plots

The universal nature of the scaling behavior can be seen in Fig. 4.16, complementing Fig. 4.2 with results for the 25th and 75th percentiles, respectively. The peak in the TTS near α ≈ 0.2 is a feature shared by all the solvers we considered.

4.7.3 Optimality plots

The absence of an optimal DW2 annealing time was discussed in detail in the main text, along with the optimality of the number of sweeps of the classical algorithms.
Figure 4.17 illustrates this: a clear lower envelope is formed by the different curves plotted for SQA, SA, and SSSV, from which the optimal number of sweeps at each size can be easily extracted. Unfortunately no such envelope is seen for the DW2 results [Fig. 4.17a], leading to the conclusion that t_a = 20 μs is suboptimal. A complementary perspective is given by Fig. 4.18, where we plot the TTS as a function of the number of sweeps, for a fixed problem size L = 8. It can be seen that the classical algorithms all display a minimum for each clause density. The DW2 curves all slope upward, suggesting that the minimum lies to the left, i.e., is attained at t_a < 20 μs. We note that in an attempt to extract an optimal annealing time we tried to fit the DW2 curves to various functions inspired by the shape of the classical curves, but this proved unsuccessful since the DW2 curves essentially differ merely by a factor of t_a, as discussed in the main text.

Figure 4.14: Ground state degeneracy. Number of unique solutions as a function of clause density for different Chimera subgraph sizes C_L. As the clause density is increased, the number of unique solutions found decreases to one (up to the global bit-flip symmetry). Shown is the median degeneracy, i.e., we sort the degeneracies of the 100 instances for each value of L and α, and find the median. Our procedure counts the degenerate solutions and stops when it reaches 10^5 solutions. If the median has 10^5 solutions then we assume that not all solutions were found, and hence the degeneracy for that value of L and α is not plotted. These are solutions on the used qubits (e.g., there are many instances for each α at L = 8 that use fewer than 503 qubits); to account for the n_uq unused qubits we multiply the degeneracy by 2^{n_uq}.

We can take this a step further and use these optimal-number-of-sweeps results to demonstrate that the problems we are considering here really are hard. To this end we plot in Fig. 4.19 the optimal number of sweeps as a function of problem size for SAA. We observe that, certainly for the smaller clause densities, the optimal number of sweeps s_opt appears to scale exponentially in L, which indicates that the TTS (which is proportional to s_opt) will also grow exponentially in L.

4.7.4 Additional speedup ratio plots

To test whether our speedup ratio results depend strongly on the percentile of the success probability distribution, we recreate Fig. 4.6 for the 25th and 75th percentiles in Fig. 4.20. The results are qualitatively similar, with a small improvement in the speedup ratio relative to HFS at the higher percentile.

Figure 4.15: Scatter plot of the TTS for the HFS algorithm and the degeneracy at L = 8 and α = 0.4 (100 instances total). Even though there is a wide range of degeneracy over several orders of magnitude, we do not observe any trend in the TTS. The degeneracy accounts for the fact that some qubits are not coupled into the problem (e.g., if n qubits are not specified for that particular problem, then the degeneracy is 2^n times the directly counted degeneracy). The Pearson correlation coefficient is 0.046.
Figure 4.16: Comparison of the 25th (left) and 75th (right) percentiles of the TTS (log scale) for all algorithms as a function of clause density. The different colors represent the different Chimera sizes tested. All solvers show a peak at the same clause density value of α ≈ 0.2.

4.7.5 Additional correlation plots

To complement Fig. 4.7, we provide correlation plots for the DW2 against itself at t_a = 20 μs and t_a = 40 μs (Fig. 4.21), the DW2 vs SAA (Fig. 4.22), the DW2 vs SQAA (Fig. 4.23), and SAA vs SQAA (Fig. 4.24). The DW2 against itself displays an excellent correlation at all clause densities, while the DW2 vs SAA and DW2 vs SQAA comparisons continue to be skewed at low and high clause densities. Recall that Fig. 4.8 provides an objective Euclidean distance measure that is computed using all problem sizes and depends only on the clause density.

4.7.6 Additional scaling analysis plots

We provide a few additional plots in support of the scaling analysis presented in the main text. Figure 4.25a shows the number of runs at different problem sizes and clause densities, and the corresponding least-squares fits. It can be seen that the straight-line fits are quite good. The slopes seen in this figure are the b(α) values for t_a = 20 μs plotted in Fig. 4.9; the intercepts are plotted in Fig. 4.25b for all annealing times, and collapse nicely, just like the b(α). Finally, Fig. 4.26a is a check of the convergence of SAA to its asymptotic scaling coefficient as the number of sweeps is increased from 5,000 to 50,000. Convergence is apparent within the 2σ error bars.

4.7.7 SQAS vs SAS

In the main text we only considered SQA as an annealer since that is a more faithful representation of the DW2. Here we present a comparison of SQA as a solver (SQAS), where we keep track of the lowest energy found during the entire anneal, with SAS. We present the scaling coefficient b(α) from Eq. (4.6) of these two solvers in Fig. 4.26b. SAS has a smaller scaling coefficient than SQAS for the large α values, but at small α values we cannot make a conclusive determination because of the substantial overlap of the error bars. We note that Ref. [25] reported that discrete-time SQA (the version used here) can exhibit a scaling advantage over SA, but that this advantage vanishes in the continuous-time limit. We have not explored this possibility here.

4.7.8 Scale factor histograms

We analyze the effect of increasing clause density and problem size on the required precision of the couplings. Fig. 4.27 shows a trend of the scaling factor increasing as α increases for fixed L, and as L increases for fixed α.
An increased scaling factor has the effect of relatively amplifying control error and thermal effects in the DW2, and can therefore contribute to a decline in performance for the larger problems studied. However, recall that the region where a speedup is possible according to our results is in fact that of high clause densities. Thus, whatever the effect of precision errors is, it does not appear to heavily impact the DW2's performance in the context of our problems. The same is true for SAS, since, as can be seen in Fig. 4.10, its scaling coefficient is unaffected by the addition of noise when α ≳ 0.6.

Figure 4.17: Suboptimal annealing time and optimal sweeps for α = 0.2. Plotted is the TTS (log scale) as a function of size L for (a) the DW2, with all available annealing times, (b) SQA, (c) SA, and (d) SSSV, for many different sweep numbers. The lower envelope gives the scaling curves shown in Fig. 4.5 for α = 0.2. The TTS curves flatten at high L for the following reason: each classical annealer was run N_X times (N_SA = N_SSSV = 10^4, N_SQA = 10^3), and our distribution is Beta(0.5, N_X + 0.5) for 0 successes, which has an average value of ≈ 1/(2N_X). This reflects the (Bayesian) information acquired after N_X runs with 0 successes (one would not expect the probability to be 0). The flattening has no impact on the scale of the optimal number of sweeps.

Figure 4.18: TTS (log scale) for L = 8 as a function of the number of sweeps for DW2, SQA, SA, and SSSV, used to identify the optimal number of sweeps.

Figure 4.19: Scaling of the SAA optimal number of sweeps. The optimal number of sweeps is extracted for each L from Fig. 4.18. The scaling is roughly exponential for the smaller α values, and appears to be close to exponential for the larger α values. Lines are guides to the eye.
Figure 4.20: The speedup ratios (log scale) for the 25th and 75th percentiles of the time-to-solution as a function of system size, for various α. The different colors denote a representative sample of clause densities.

Figure 4.21: Success probability correlations. The results for all instances at all α values are shown for the DW2 data at t_a = 20 μs and t_a = 40 μs. This complements Fig. 4.7; the color scheme corresponds to different sizes L as in that figure.

[Figure 4.22 appears here as a grid of correlation panels plotting the SAA success probability against the DW2 success probability at t_a = 20 μs, one panel per clause density from α = 0.05 up to at least α = 0.65.]
▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20μs SAA success probability α = 0.7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20μs SAA success probability α = 0.75 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 
□ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20μs SAA success probability α = 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20μs SAA success probability α = 0.85 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20μs SAA success probability α = 0.9 Figure 4.22: Success probability correlations. The results for all instances at all values are shown for the DW2 data at t a = 20s and SAA with S = 50;000 and f = 5. This complements Fig. 
4.7; the color scheme corresponds to dierent sizesL as in that gure. The correlation gradually improves from poor for the lowest clause densities to strong at = 0:35, then deteriorates again. 127 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20 μs SQA success probability α = 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 DW success probability, ta=20 μs SQA success probability α = 0.1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ 
[Figure 4.23: grid of scatter plots of SQA success probability versus DW success probability (ta = 20 μs), one panel per clause density α.]
Figure 4.23: Success probability correlations. The results for all instances at all α values are shown for the DW2 data at ta = 20 μs and SQA with S = 10,000 and = 5. This complements Fig. 4.7; the color scheme corresponds to different sizes L as in that figure. The correlation gradually improves from poor for the lowest clause densities to strong at α = 0.35, then deteriorates again.
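Each panel in these figures compares, instance by instance, the probability that two solvers return a ground state. As a minimal sketch (not code used in this work), the snippet below illustrates how such per-instance success probabilities could be estimated from raw energy samples and how the correlation between two solvers could be quantified; the inputs dw_energies, saa_energies, and ground_energy are hypothetical placeholders.

import numpy as np
from scipy.stats import pearsonr

def success_probability(samples, e0, tol=1e-9):
    # Fraction of returned samples whose energy reaches the known ground-state energy e0.
    samples = np.asarray(samples)
    return float(np.mean(samples <= e0 + tol))

def correlate_solvers(dw_energies, saa_energies, ground_energy):
    # Per-instance success probabilities for two solvers, and their Pearson correlation
    # across the instance set (one point per instance, as in the scatter plots above).
    ids = sorted(ground_energy)
    p_dw = np.array([success_probability(dw_energies[i], ground_energy[i]) for i in ids])
    p_saa = np.array([success_probability(saa_energies[i], ground_energy[i]) for i in ids])
    r, _ = pearsonr(p_dw, p_saa)
    return p_dw, p_saa, r

A scatter of one solver's success probabilities against the other's, split by clause density, reproduces the layout of the panels above; the Pearson r gives a single-number summary of how strongly the two solvers agree on which instances are hard.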
[Figure: grid of scatter plots of SQA success probability versus SAA success probability, one panel per clause density α (panels for α = 0.05 to 0.8 shown).]
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 SAA success probability SQA success probability α = 0.85 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ □ 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 SAA success probability SQA success probability α = 0.9 Figure 4.24: Success probability correlations. The results for all instances at all values are shown for the SQAA data at S = 10;000 and = 5 and SAA withS = 50;000 and f = 5. This complements Fig. 4.7; the color scheme corresponds to dierent sizes L as in that gure. The correlation gradually improves from poor for the lowest clause densities to strong at = 0:35, then deteriorates again. Note that we did not attempt to optimize the correlations between the two methods. 
[Figure: (a) log10(number of runs) versus L for α = 0.1 to 0.9 at t_a = 20 μs; (b) the fitted scaling constant a(α) versus α for t_a = 20 to 40 μs.]

Figure 4.25: Exponential fits to the DW2 number of runs. In accordance with Eq. (4.6), the least-squares linear fits to log[r(L, α, 0.5)] are plotted in (a) for t_a = 20 μs. To reduce finite-size scaling effects we exclude L = 2, 3 and perform the fit for L ≥ 4. The intercept of the linear fits is the DW2 scaling constant a(α) from Eq. (4.6), and is shown in (b). Error bars represent 2σ confidence intervals.

Figure 4.26: (a) The SAA scaling coefficients b(α) from Eq. (4.6) at various sweep numbers (nSW = 5k to 50k) and β_f = 5, along with the DW2 scaling coefficients for all t_a we tried (t_a = 20 to 40 μs). (b) The SQAS and SAS scaling coefficients b(α) at the optimal number of sweeps for each. SAS had a final temperature of β_f = 5 and SQAS was operated at β = 5.

Figure 4.27: Histograms of the scale factor of the instances as a function of system size for various values of α. All problems passed to DW2 must have all couplers in the range [−1, 1], so all couplers in a problem are rescaled down by a factor equal to the maximum absolute value of the couplers in the problem (and hence this quantity is called the scale factor).
Since internal control error (ICE) is largely instance-independent, the larger the scaling factor of an instance, the worse the relative impact of ICE will be. We see a drift to larger scaling factors with increasing size L and increasing clause density α. Larger values of α will obviously have larger scale factors, as there are on average more loops per qubit (and thus per edge) and therefore a larger maximum potential coupler strength. Since the edges included in a loop are generated randomly, the more edges available at a fixed clause density, the more opportunities there are for a single edge to be included in many loops by chance, causing the average scale factor to drift upward as a function of problem size.

Chapter 5

Solving a Higgs optimization problem with quantum annealing for machine learning

5.1 Introduction

The discovery of the Higgs boson decays in a background of Standard Model processes was assisted by machine learning methods [134, 135]. The classifiers used to separate signal from background are trained using highly accurate but not completely perfect simulations of the physical processes involved, often resulting in label noise and systematic errors. In this chapter we investigate the application of quantum [46, 47, 136, 137] and classical annealing [17, 21] to solving a Higgs-signal-versus-background machine learning optimization problem. We bag a set of weak classifiers, built from the kinematic observables of the Higgs decay photons, into a strong classifier that is highly resilient against overtraining and against errors in the Monte Carlo simulation of the correlations of the physics observables. We show that the resulting quantum- and classical-annealing classifier systems perform comparably to the current state-of-the-art machine learning methods used in particle physics [138, 139], and are simple functions of directly interpretable experimental parameters with a clear physical meaning. The annealer-trained classifiers exploit the excited states in the vicinity of the ground state and demonstrate some advantage for small training sizes. This technique may find application in other areas of experimental particle physics given the algorithm's relative simplicity and robustness to error. This chapter was originally published as Ref. [6], with minor updates and modifications here.

The Higgs boson discovery at the Large Hadron Collider (LHC) [134, 135] marks the beginning of a new era in particle physics. Experimental particle physicists at the LHC are measuring the new boson's properties [140, 141], searching for heavier Higgs bosons [142], and trying to understand whether the Higgs boson interacts with dark matter [143]. Cosmologists are trying to understand the symmetry-breaking Higgs phase transition that took place early in the history of the universe and whether that event explains the excess of matter over antimatter [144]. Curiously, the measured value of the Higgs boson mass [142] tells us that the symmetry-breaking quantum vacuum is metastable [145] unless new physics intervenes. The profound implications of the Higgs boson discovery will keep motivating physics research for years to come.

One of the key requirements for precisely measuring the properties of the Higgs boson is selecting large, high-purity samples containing the production and decay of a Higgs particle. Applying machine learning techniques [146] is both a potentially powerful approach and one that poses challenges.
The challenge is greater when an investigation requires faithful simulation not only of the physics observables themselves, but also of their correlations in the data. In the measurement of the properties of the Higgs boson [140], disagreements between simulations and observations result in label noise and systematic uncertainties in the efficiency of the classifiers that adversely impact the classification performance and translate into uncertainties on the measured properties of the discovered particle.

To address these challenges in the Higgs-signal-versus-background optimization problem, we study a binary classifier trained with classical simulated annealing (SA) [17, 21] and quantum annealing (QA) [46, 47, 136, 137, 36]. To implement QA we use a programmable quantum annealer (DW) built by D-Wave Systems, Inc. and housed at the University of Southern California's Information Sciences Institute, comprising 1098 superconducting flux qubits (note: this is the first study in this dissertation that uses the DW2X model; previous chapters used the DW2). The optimization problem is mapped to one of finding the ground state of a corresponding Ising spin model. We exploit the excited states in the vicinity of the ground state in the training method to improve the accuracy of the classifiers beyond the baseline ground-state-finding model. We refer to this approach as quantum annealing for machine learning (QAML).

5.2 Results

In what follows we discuss our results. The classifier accuracy is our criterion for comparison between the various classifier construction methods; a classifier that is slow to train may still prove practically more useful than a slightly less performant one that is faster to train. Again, I wish to highlight that, unlike in previous chapters, our goal is to estimate the performance of a classifier, not to estimate time to solution. As I will discuss below, this modifies many of our procedures, but the essential ideas remain.

We model the Higgs diphoton decay channel H → γγ. See Fig. 5.1 for the Feynman diagrams of the Higgs production and decay processes. We can represent this system via the momentum of the Higgs particle (H), the momenta of the two photons (γγ), the angle θ with the beam axis, and the azimuthal angle φ. More specifically, we select eight of the kinematical variables describing the generated events as the variables for our classifier. The first five are related to the highest (p_T^1) and second-highest (p_T^2) transverse momentum (momentum perpendicular to the axis defined by the colliding protons) of the photon pair: p_T^1/m_γγ, p_T^2/m_γγ, (p_T^1 + p_T^2)/m_γγ, (p_T^1 − p_T^2)/m_γγ, and p_T^γγ/m_γγ, where m_γγ is the invariant mass of the diphoton pair and p_T^γγ is the transverse momentum of the diphoton system. The last three are properties of the two photons: the separation Δη in the pseudorapidity η = −log(tan(θ/2)) (η is a pseudo-invariant proxy for θ commonly used in high energy physics); ΔR = sqrt(Δη² + Δφ²), the sum in quadrature of the separation in η and φ of the two photons; and |η_γγ|, the value of η of the diphoton system. Figure 5.2 shows the distribution of these variables for the signal and background datasets. The differences between these distributions are exploited by the classifier to distinguish the signal from the background. In addition to these eight variables, we incorporate various products and ratios between said variables [using certain rules explained in Section 5.7], for a total of 36, given in Table 5.2. We construct weak classifiers from our distributions of kinematical variables as shown in Fig. 5.2, and as described in Methods.
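For concreteness, a hypothetical helper that assembles these eight variables from per-photon and diphoton kinematics might look as follows (a sketch only; it assumes the photon and diphoton quantities are already available from the event record, and the field names are mine):

```python
import math

def kinematic_features(pt1, pt2, eta1, eta2, phi1, phi2, pt_gg, eta_gg, m_gg):
    """Return the eight kinematical variables used to build weak classifiers.

    pt1, pt2            : leading and sub-leading photon transverse momenta
    eta1, eta2, phi1, phi2 : photon pseudorapidities and azimuthal angles
    pt_gg, eta_gg, m_gg : diphoton transverse momentum, pseudorapidity, invariant mass
    """
    d_eta = eta1 - eta2
    d_phi = math.remainder(phi1 - phi2, 2 * math.pi)  # wrap azimuthal difference into [-pi, pi]
    return {
        "pt1/m":        pt1 / m_gg,
        "pt2/m":        pt2 / m_gg,
        "(pt1+pt2)/m":  (pt1 + pt2) / m_gg,
        "(pt1-pt2)/m":  (pt1 - pt2) / m_gg,
        "pt_gg/m":      pt_gg / m_gg,
        "d_eta":        d_eta,
        "d_R":          math.hypot(d_eta, d_phi),      # sqrt(d_eta^2 + d_phi^2)
        "abs_eta_gg":   abs(eta_gg),
    }
```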
We build the corresponding Ising problem as follows [137]. Let T = {x_μ, y_μ} denote a given set of training events labeled by the index μ, where x_μ is a vector collecting the values of each of the variables we use, and y_μ = ±1 is a binary label for whether x_μ is signal (+1) or background (−1). If c_i(x_μ) = ±1/N denotes the value of weak classifier i on the event, where N is the number of weak classifiers (equal to the number of spins or qubits), then with C_ij = Σ_μ c_i(x_μ) c_j(x_μ), C_i = Σ_μ c_i(x_μ) y_μ, and a penalty λ > 0 to prevent overtraining, the Ising Hamiltonian is

$H = \sum_{i,j} J_{ij} s_i s_j + \sum_i h_i s_i,$

where s_i = ±1 is the i-th Ising spin variable, J_ij = C_ij/4 is the coupling between spins i and j, and h_i = λ/2 − C_i + (1/2) Σ_j C_ij is the local field on spin i. The problem QA or SA attempts to solve is to minimize H and return the minimizing, ground-state spin configuration {s_i^g}. The strong classifier is then constructed as R(x) = Σ_i s_i^g c_i(x) ∈ [−1, 1] for each new event x that we wish to classify [137]. We introduce an additional layer in our study by also constructing strong classifiers from excited-state spin configurations.
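To make this construction concrete, here is a minimal sketch (with illustrative names, not the original analysis code) that builds the couplings and local fields from a matrix of weak-classifier outputs and evaluates the resulting strong classifier; the convention that λ is measured in units of max_i(C_i) follows the Methods section below.

```python
import numpy as np

def build_ising(c, y, lam):
    """Build Ising couplings and fields from weak-classifier outputs.

    c   : array (n_events, N) of weak-classifier outputs, entries ~ +-1/N
    y   : array (n_events,) of labels, +-1
    lam : overtraining penalty, given in units of max_i(C_i)
    """
    C_ij = c.T @ c                     # C_ij = sum_mu c_i(x_mu) c_j(x_mu)
    C_i = c.T @ y                      # C_i  = sum_mu c_i(x_mu) y_mu
    lam = lam * np.max(C_i)            # convert lambda to absolute units
    J = C_ij / 4.0                     # J_ij = C_ij / 4
    np.fill_diagonal(J, 0.0)           # s_i*s_i = 1 contributes only a constant
    h = lam / 2.0 - C_i + 0.5 * C_ij.sum(axis=1)   # h_i = lam/2 - C_i + (1/2) sum_j C_ij
    return J, h

def strong_classifier(s, c_new):
    """Evaluate R(x) = sum_i s_i c_i(x) for a returned spin configuration s."""
    return c_new @ s

# Purely illustrative stand-in data:
rng = np.random.default_rng(0)
n_events, N = 1000, 36
c = rng.choice([-1.0, 1.0], size=(n_events, N)) / N
y = rng.choice([-1.0, 1.0], size=n_events)
J, h = build_ising(c, y, lam=0.05)
```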
As benchmarks for traditional machine learning methods we train a deep neural network (DNN) using Keras [138] with the Theano backend [147], and an ensemble of boosted decision trees using XGBoost [139], using optimized choices for training hyperparameters (details of which can be found in Section 5.7).

5.2.1 Variable inclusion

We compare the ground-state configurations for λ ∈ {0.01, 0.05, 0.1, 0.2, 0.4, 0.8}; a larger λ implies an increasing penalty against including additional variables, and thus we expect the variables included at λ = 0.8 to be the ones determining the performance of the classifiers. Table 5.3 presents the relative strength of the variables in determining the classifier performance by showing how often variables are included in the ground-state configuration of the full 36-variable problem derived from 20 different training sets with 20k training events each, as a function of the penalty term λ. We find that two of the original kinematical variables, p_T^1 and Δη, are never included. The number of classifiers included in the ground state of all 20 training samples' corresponding Hamiltonians is 16 out of 36 for λ ≤ 0.05, and the following three for λ = 0.8: (i) p_T^2/m_γγ, (ii) (ΔR · p_T^γγ)^{-1}, and (iii) p_T^2/p_T^γγ. Those three dominate the network performance, and they would have been difficult to guess a priori in their composite form.

The physical reason why these variables are important for the classifier can be gleaned from considering the kinematics of the system. The key difference between an event with a Higgs decaying to two photons and another process that produces two photons in its final state is the production of the heavy particle in the event. A heavy particle will require considerably more energy to boost perpendicular to the beamline, and hence we would expect real Higgs events to have a characteristically lower p_T^γγ than background events. Since the system with the Higgs has less transverse boost, we would expect the two photons to have similar p_T spectra, and hence the momentum of the sub-leading photon will typically be higher than in those events without the heavy process. The p_T of the first photon is largely determined by the overall energy available in the collision, which is also set by m_γγ, hence p_T^1/m_γγ is largely stochastic and provides little discrimination.

5.2.2 Classifier performance

We estimate the receiver operating characteristic (ROC) curves on the training set and construct a final output classifier such that, for a given signal efficiency (true positive rate) ε_S, one uses the strong classifier sampled from the annealer which has the maximum background rejection (true negative rate) r_B. We construct such compound classifiers for both SA and DW using excited states within a fraction f of the ground-state energy E_g, i.e., all those {s_i} returned such that H({s_i}) < (1 − f)E_g (note that E_g < 0). SA is used as a natural comparison to DW on these fully connected problems.

In our experiments DW struggles to find the true minimum of the objective function. This is likely a consequence of the fact that the current generation of DW2X quantum annealers suffers from non-negligible noise on the programmed Hamiltonian.(1) Therefore we study and interrogate current-generation quantum annealers and interpret their performance as a lower bound for the performance of future systems with lower noise and denser hardware graphs.

(1) The problem of noise is compounded by the relatively sparse hardware graph, which requires a chain of qubits to embed the fully connected logical Hamiltonian. In our case 12 qubits are ferromagnetically coupled to act as a single logical qubit.

In Fig. 5.3, we plot the ROC curves for each algorithm for f = 0.05 at training sizes 100 and 20,000. We observe a clear separation between the annealing-based classifiers and the XGB and DNN classifiers, with the advantage for the annealers appearing at the small training size, but disappearing at the large training size. In Fig. 5.4, we plot the area under the ROC curve (AUROC) for various training sizes and f = 0.05 (the largest value we used) for each algorithm. An ideal classifier would have an AUROC of 1. We find that DW and SA have comparable performance, implying high robustness to approximate solutions of the training problem. This feature appears to generalize across QAML domain applications (in review, by Li, Felice, Rohs, and D.L. (LFRL)). Here the asymptotic performance of the QAML model is achieved with just 1000 training events, and thereafter the algorithm does not benefit from additional data. This is not true of DNN or XGB. A notable finding of this work is that QAML has an advantage over both DNN and XGB when training sizes are small. This is shown in Fig. 5.5 in terms of the integral of the true-negative differences over signal efficiency for various ROCs. In the same regime of small training sizes, DW develops a small advantage over SA as the fraction of excited states f used increases, saturating at f = 0.05. The uncertainties are too large to draw definitive conclusions in this regard. In the regime of large training sizes, SA has a small advantage over DW, to a significance of approximately 2σ.

5.3 A note on data collection and error analysis (i.e., benchmarking) on this problem

It seems appropriate to pause and give a discussion here in the main text of this chapter on data collection and error analysis, in light of the discussion in Chapter 3. For both SA and DW, to evaluate performance we constructed a histogram of the unique solutions returned by the algorithm, and excluded those states with rates of occurrence low enough that one cannot be certain of their inclusion in further runs. This is done by excluding any solution occurring fewer than three times, as such solutions have a greater than 5% chance of exclusion in subsequent batches of 10^4 solutions.
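A quick back-of-the-envelope check of this threshold, treating the reappearance of a rare state in a fresh batch of 10^4 samples as approximately binomial (a worked example for illustration, not part of the original analysis):

```python
batch = 10_000
for k in (1, 2, 3, 4):
    p = k / batch                  # occurrence probability estimated from k hits in one batch
    p_missing = (1 - p) ** batch   # chance the state is absent from a new batch of the same size
    print(k, round(p_missing, 3))  # ~0.368, 0.135, 0.050, 0.018
```

States seen only once or twice have a 37% or 14% chance, respectively, of not reappearing at all, while a state seen three times already sits at about the 5% level, consistent with the cutoff quoted above.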
In this way, we are giving a robust lower bound on the ensemble classifier's performance. In essence, this represents a compromise between the very high computational cost of doing the full data analysis detailed in Chapter 3 and the desire to get as accurate an estimate of the errors as possible. By excluding elements that have significant weight to not be resampled in a standard bootstrap procedure, we provided a conservative estimate of performance at much lower computational cost. Since the improvements gained by including additional states were relatively small, this is an acceptable compromise. The use of a classical bootstrap was due to the enormous number of samples and our very conservative cutoff. A similar procedure could also have been done with the Bayesian bootstrap, but in this case the classical bootstrap was simpler to code and the differences would be negligible.

5.3.1 Receiver operating characteristic curves and their uncertainty analysis

Any classifier may be characterized by two numbers: the true positive and true negative rates, in our case corresponding to the fraction of events successfully classified as signal or background, respectively. Since our classifiers all return floating-point values in [−1, 1], to construct a binary classifier we introduce a cut in this range, above and below which we classify events as signal and background, respectively. Since this cut is a free parameter, we vary it across the entire range and plot the resulting parametric curve of signal acceptance (true positive rate, ε_S) and background rejection (true negative rate, r_B), producing a receiver operating characteristic (ROC) curve [148].

More explicitly, consider a labeled set of validation events, V = {x_v, y_v}, with y_v = 0 or 1 if x_v is background or signal, respectively, and a strong classifier R_w(x). The latter is constructed from a given set of weak classifiers and a vector of weights w previously obtained by training over a training set T. The strong classifier outputs a real number R_w(x_v) = Σ_i w_i c_i(x_v). To complete the classifier, one introduces a cut, O_c, such that we classify event x_v as signal if R_w(x_v) > O_c and background if R_w(x_v) < O_c. If we evaluate the strong classifier on each of the events in our validation set V, we obtain a binary vector of classifications, C = {C_v}, with entries 0 denoting classification as background and 1 denoting classification as signal. By comparing C_v to y_v for all v we can then evaluate the fraction of events correctly classified as background, called the true negative rate or "background rejection" r_B (equal to the number of times C_v = y_v = 0, divided by the total number of actual background events), and the fraction of events correctly classified as signal, or "signal efficiency" ε_S (equal to the number of times C_v = y_v = 1, divided by the total number of actual signal events). For a given strong classifier, these values are functions of the cutoff O_c. Plotting r_B(O_c) against ε_S(O_c) yields a parametric curve, dubbed the receiver operating characteristic (ROC) curve, as shown in Fig. 5.6. Note that the cutoffs are trivial to adjust while all the computational effort goes into forming the networks, so one can vary O_c essentially for free to tune the performance of the network to suit one's purposes.
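A minimal sketch of this cut scan (illustrative names; it assumes numpy arrays of classifier outputs and 0/1 labels rather than the original analysis code):

```python
import numpy as np

def roc_curve(scores, labels, cuts):
    """Scan the cut O_c and return (signal efficiency, background rejection) pairs.

    scores : strong-classifier outputs R_w(x_v), values in [-1, 1]
    labels : 1 for signal, 0 for background
    cuts   : iterable of cut values to scan
    """
    sig = labels == 1
    bkg = labels == 0
    eff_s, rej_b = [], []
    for oc in cuts:
        pred = scores > oc                   # classify as signal above the cut
        eff_s.append(np.mean(pred[sig]))     # true positive rate
        rej_b.append(np.mean(~pred[bkg]))    # true negative rate
    return np.array(eff_s), np.array(rej_b)

cuts = np.linspace(-1.0, 1.0, 201)
```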
In other words, for a given strong classifier (i.e., solution/state), we can evaluate its output as a floating-point number on each of the events in our data set, and for any value of a cut on [−1, 1] this results in a single classification of the test data, C. One can then evaluate the true positive and true negative rates by computing C · y_k, where k ∈ {S, B} (signal, background) and y_k^i = 1 if datum i is in ensemble k and 0 otherwise. When we take f > 0 and accept excited states with energy E < (1 − f)E_GS as "successes", we have a set of networks (labeled by f) for each training set. We simply take the supremum over the f-labeled set of values of r_B at each value of ε_S to form the ROC curve for the classifier formed by pasting together different classifiers over various ranges of ε_S.

To estimate the error due to limited test-sample statistics, we reweight each element of the test set with weights w drawn from a Poisson distribution with mean 1, effectively computing Σ_i w_i p_i y_k^i. The weights on the elements of the test set are determined for all elements at once, and we evaluate all strong classifiers using the same weights. As we are talking about events that occur at random intervals with some fixed rate, the Poisson distribution is the appropriate correlate in a Bayesian-bootstrap-like scheme for this problem. For a single weight vector, we evaluate many values of the cut, and use linear interpolation to evaluate it in steps of 0.01 in the region [0, 1]. This gives us the true negative rate as a function of the true positive rate for a single weighting, corresponding to a single estimated ROC curve. When constructing a composite classifier from multiple states, we are identifying regions of signal efficiency in which one should use one of the states rather than the others; namely, we take the maximum background rejection rate over the states for each value of signal efficiency. Repeating for many reweightings, we get many ROC curves, all of which are consistent with our data, and thus the standard deviation across weights on a single training set at each value of signal efficiency serves as an estimate of the statistical uncertainty in our ROC curves.
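A sketch of the Poisson-reweighted, composite (maximum over accepted states) ROC estimate described above, with illustrative names and structure rather than the original analysis code:

```python
import numpy as np

def weighted_roc(scores, labels, weights, cuts):
    """ROC for one Poisson weighting of the test set."""
    sig, bkg = labels == 1, labels == 0
    eff_s, rej_b = [], []
    for oc in cuts:
        pred = scores > oc
        eff_s.append(np.sum(weights[sig & pred]) / np.sum(weights[sig]))
        rej_b.append(np.sum(weights[bkg & ~pred]) / np.sum(weights[bkg]))
    return np.array(eff_s), np.array(rej_b)

def composite_roc(score_list, labels, weights, cuts, grid):
    """Maximum background rejection over all accepted states at each signal efficiency."""
    rej_on_grid = []
    for scores in score_list:                    # one entry per accepted (excited) state
        eff, rej = weighted_roc(scores, labels, weights, cuts)
        order = np.argsort(eff)
        rej_on_grid.append(np.interp(grid, eff[order], rej[order]))
    return np.max(rej_on_grid, axis=0)

cuts = np.linspace(-1.0, 1.0, 201)
grid = np.arange(0.0, 1.01, 0.01)                # signal-efficiency grid in steps of 0.01
rng = np.random.default_rng(1)
# For each bootstrap replication: weights = rng.poisson(1.0, size=len(labels)),
# shared across all states, then repeat composite_roc to build the spread of ROC curves.
```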
To estimate the variation due to the choice of training set with respect to reproductions of the procedure and results, for a given training size we generate multiple disjoint training sets and use the standard deviation in mean performance across training sets as our estimate of the error on the model resulting from the particular choice of training set. When we compute the difference between two ROCs or AUROCs, we hold the training set and weight vector fixed, take the difference, and then perform statistics over the weights and training sets in the same manner as above. Errors in the AUROC were estimated similarly, taking the AUROC for each Poisson weight vector and training set (fold) instead of r_B(ε_S). This is the procedure leading to Fig. 5.3 in the main text. An example of the ROC curves is given in Fig. 5.6. At the scale of that plot, it is virtually impossible to tell the detailed differences between SA and DW or the various values of f, so we use plots of differences of AUROCs to extract more detailed information about the ROC curves. This leads to Fig. 5.4. Additional difference plots are given in Sec. 5.7.7 below.

As one can see, this error analysis procedure diverges in its specifics from that of Chapter 3, but is not fundamentally different in spirit. One is still taking into account what information one has about the variables of interest in light of the available data; in this case we primarily use a Poisson reweighting scheme rather than a Dirichlet distribution, as we are simulating a posterior for a physical process where each element occurs with Poisson weights.

5.4 Discussion

We proceed to discuss and summarize our results from this study. With this study we explore QAML, a simple method inspired by the prospect of using quantum annealing as an optimization technique for constructing classifiers, and apply the technique to the detection of Higgs decays. The training data is represented compactly by O(N²) couplers and local biases in the Hamiltonian for N weak classifiers. The resulting strong classifiers perform comparably to the state-of-the-art standard methods used in high energy physics today, and have an advantage when training sizes are small. The role of QA is that of a subroutine for sampling the Ising problem that may in the future have advantages over classical samplers, either when used directly or as a method of seeding classical solvers with high-quality initial states.

From the perspective of benchmarking, QA's role as a subroutine necessitates comparing QA to SA in the same computing pipeline in order to guarantee that whatever results we obtain are not merely an artifact of our transformation to an Ising-type problem. In our case, SA solved the problems quite efficiently, so we did not need to extend our analysis to PIMC/SQA, HFS, or PT. Meanwhile, we still had to compare our QA/SA pipeline (namely QAML) to state-of-the-art methods in the field, namely neural networks and boosted decision trees. Had we not done so, it would have been possible in principle to fool ourselves about the performance of DW or SA. This is a very general principle that applies to any benchmarking task: to demonstrate a real speedup of any kind, you have to test the algorithms for which you are using quantum resources both by swapping the quantum resources directly for classical ones and by comparing against entirely different, best-in-class algorithms for the problem class at hand.

QAML is resistant to overfitting, as the method involves an explicit linearization of correlations. It is also less sensitive to errors in the Monte Carlo correlation estimates than DNNs or binary decision trees, due to the truncation of the tails of the distributions (see Section 5.7). A useful aspect of the model is that it is directly interpretable, as each weak classifier directly corresponds to some physically relevant variable or product or ratio of variables, and the strong classifier is a simple linear combination thereof. This is in contrast to the usual creation of black-box machine learning discriminants, such as when using DNNs and XGBoost, where techniques for interpretability are still an active area of research [149].

This example of using quantum annealing to optimize classifiers in a physics problem was the first to date, to the best of the authors' knowledge, and it opens further opportunities for research. It has since been followed up with a similar study in biology [150]. Quantum machine learning has taken great theoretical strides in the past years [151, 152, 153, 154, 155, 156, 157], and we demonstrate that we can already apply elements of this technique to current and next-generation quantum annealing architectures.
The near future will see applications of these techniques to more complex problems in particle physics and other domain sciences. For example, this work has motivated similar studies in computational biology (in review, LFRL). We envision that QAML could be used further in the context of data certification in high energy physics, where training with small dataset sizes could be particularly useful. One can leverage the robustness of QAML to both significantly reduce the level of human intervention and increase the accuracy in quickly assessing and certifying the collision data for physics analysis. Being impervious to overtraining, QAML is a good candidate for boosting [139], and we foresee studies to evaluate this using future versions of the D-Wave quantum annealers or other architectures, or even classical optimization techniques such as SA. Multi-stage classifiers, where the most influential variables are found via training and pulled out to form a smarter classifier, could be used to mitigate the influence of hardware noise in the quantum annealer. With more available qubits, or more efficient architectures, integer weights could be used in place of the binary weights through a straightforward extension of the encoding scheme used in this work.

In the next and final, mercifully brief chapter, I will summarize some of the principles of benchmarking we have learned and encountered along the way on this journey, in the hopes of providing a quick reference for the community.

5.5 Acknowledgements for this chapter and author contributions statement

This project is supported in part by the United States Department of Energy, Office of High Energy Physics Research Technology Computational HEP and Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359. The project is also supported in part under ARO grant number W911NF-12-1-0523 and NSF grant number INSPIRE-1551064. The work is supported in part by the AT&T Foundry Innovation Centers through INQNET, a program for accelerating quantum technologies. We wish to thank the Advanced Scientific Computing Research program of the DOE for the opportunity to first present and discuss this work at the ASCR workshop on Quantum Computing for Science (2015). We acknowledge the funding agencies and all the scientists and staff at CERN and internationally whose hard work resulted in the momentous H(125) discovery in 2012.

A.M. conducted the first mapping of the problem on the D-Wave software architecture, data analysis and machine learning methodology, and along with M.S. provided the core knowledge of the domain science (Higgs physics). J.J. conducted QA and SA research and data analysis as well as the machine learning work and, along with D.L., provided the core of the QA/SA knowledge. J-R.V. conducted QA research, data analysis, machine learning methods, and error analysis work with J.J. for the submitted version of the work. D.L. and M.S. provided the overall oversight and scrutiny of the research work, data analysis, and results. D.L. conceived the quantum machine learning methodology and M.S. conceived the domain application. We consider the two graduate students (A.M. and J.J.) as operationally the first authors of this work. All authors contributed to writing and reviewing the original manuscript as published in Nature [6]. The authors declare no competing financial interests.

Figure 5.1: Representative Feynman diagrams contributing to the simulated distributions for Higgs signal and Standard Model background.
The signal is Higgs production through gluon fusion, with the Higgs decaying into two photons (top). Representative Leading Order and Next-to-Leading Order background processes are Standard Model two-photon production processes (bottom).

[Figure: distributions of the eight kinematical variables (p_T^1/m_γγ, p_T^2/m_γγ, ΔR, p_T^γγ/m_γγ, (p_T^1+p_T^2)/m_γγ, (p_T^1−p_T^2)/m_γγ, Δη, and η_γγ) for signal and background.]

Figure 5.2: Distributions of the eight kinematical variables we used to construct weak classifiers. The signal distribution is in green and solid, background in blue and dotted. The vertical axis is the raw count of the number of events. The total number of events simulated in each case is 307732.

Figure 5.3: ROC curves for the annealer-trained networks (DW and SA) at f = 0.05, DNN, and XGB. Error bars are defined by the variation over the training sets and statistical error. All panels show all four ROC curves, with solid lines for DW (green) and SA (blue), and dotted lines for DNN (red) and XGB (cyan). Panels (a) and (c) [(b) and (d)] include 1σ error bars only for DW and DNN [SA and XGB], in light blue and pale yellow, respectively. Results shown are for the 36-variable networks at λ = 0.05, trained on 100 events for panels (a) and (b), and on 20,000 events for panels (c) and (d). For 100 events the annealer-trained networks have a larger area under the ROC curve, as shown directly in Fig. 5.4. The situation is reversed for 20,000 training events, where the error bars are too small to be visible. The much smaller error bars are due to the increased number of events.

Figure 5.4: Area under the ROC curves (AUROCs) for the annealer-trained networks (DW (green) and SA (blue), solid lines) at f = 0.05, and the conventional approaches (DNN (red) and XGB (cyan), dotted lines). The vertical lines denote 1σ error bars, defined by the variation over the training sets (grey) plus statistical error (green); see the SI (Sec. 6) for details of the uncertainty analysis. While DNN and XGB have an advantage at large training sizes, we find that the annealer-trained networks perform better for small training sizes. Results shown are for the 36-variable networks at λ = 0.05. The overall QAML performance and its features, including the advantage at small training sizes and saturation at approximately 0.64, are stable across a range of values for λ. An extended version of this plot with various values of λ is available in the SI in Fig. 2.
145 100 1000 5000 10000 15000 20000 Training size −0.02 0.00 0.02 0.04 0.06 Integral of ROC difference: DW-DNN (a) 100 1000 5000 10000 15000 20000 Training size −0.02 0.00 0.02 0.04 0.06 Integral of ROC difference: DW-XGB (b) 100 1000 5000 10000 15000 20000 Training size −0.004 −0.002 0.000 0.002 0.004 0.006 Integral of ROC difference: DW-SA (c) Figure 5.5: Dierence between AUROCs of (a) DW vs DNN, (b) DW vs XGBoost, and (c) DW vs SA, as a function of training size and fraction f above the minimum energy returned (the same values of f are used for DW and SA in (c)). Formally, we plot R 1 0 [r (DW) B ( S ) r (i) B ( S )]d S , where i2fDNN, XGBoost, SAg. The vertical lines denote 1 error bars. The large error bars are due to noise on the programmed Hamiltonian. 146 variable description p 1 T =m transverse momentum of the highest pT photon p 2 T =m transverse momentum of the second-highest pT photon (p 1 T +p 2 T )=m sum of the transverse momentum of the two photons (p 1 T p 2 T )=m dierence of the transverse momentum of the two photons p T =m transverse momentum of the diphoton system dierence in = log tan 2 , is the angle with beam axis R ( p 2 + 2 ) where is the azimuthal angle j j the value of the diphoton system Table 5.1: The kinematical variables used to construct weak classiers. Here, m is the invariant mass of the diphoton pair 147 0.0 0.2 0.4 0.6 0.8 1.0 Signal Efficiency 0.0 0.2 0.4 0.6 0.8 1.0 Background Rejection ROC DW f = 0.05 SA f = 0.05 DNN XGB (a) 0.0 0.2 0.4 0.6 0.8 1.0 Signal Efficiency 0.0 0.2 0.4 0.6 0.8 1.0 Background Rejection ROC DW f = 0.05 SA f = 0.05 DNN XGB (b) Figure 5.6: The ROC curves for the annealer-trained networks (DW and SA) at f = 0:05, DNN, and XGB. Error bars are dened by the variation over the training sets and statistical error. Both panels show all four ROC curves. Panel (a) [(b)] includes 1 error bars only for DW and DNN [SA and XGB], in light blue and pale yellow, respectively. Results shown are for the 36 variable networks at = 0:05 trained on 100 events. The annealer trained networks have a larger area under the ROC curve 148 100 1000 5000 10000 15000 20000 Training size 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 Area under ROC (AUROC) DW, f = 0. 05 SA, f = 0. 05 XGB DNN QAML λ = 0. 0 QAML λ = 0. 1 QAML λ = 0. 2 Figure 5.7: A reproduction of Fig. 5.4 from the main text, now including the optimal strong classier found by SA at f = 0 for various values of the regu- larization parameter = 0:; 0:1; 0:2. We nd that this parameter has negligible impact on the shape of the AUROC curve, and that performance for SA always saturates at 0:64, with an advantage for QAML (DW) and SA over XGB and DNNs for small training sizes. 149 Table 5.2: Map from number to variable/weak-classier name 1 2 3 4 5 6 7 8 9 p 1 T p 2 T R p T p 1 T +p 2 T p 1 T p 2 T (p 1 T +p 2 T ) 10 11 12 13 14 15 16 17 18 p 2 T p 1 T p 2 T p 2 T p 2 T 1 Rp T p 1 T +p 2 T R 1 R(p 1 T p 2 T ) 1 R R 1 (p 1 T p 2 T ) 19 20 21 22 23 24 25 26 27 p 1 T p 2 T p 1 T R p 1 T p T p 1 T (p 1 T +p 2 T ) p 1 T p 1 T p 2 T p 1 T p 1 T p 2 T R p 1 T p 2 T 28 29 30 31 32 33 34 35 36 p 2 T p T p 2 T (p 1 T +p 2 T ) p 1 T +p 2 T p T p T 1 p T 1 p T (p 1 T p 2 T ) p 1 T +p 2 T p 1 T p 2 T p 1 T +p 2 T 150 Table 5.3: Variable inclusion in the ground states of the Ising problem instances. The variables listed are those from which we selected the various variables included in our tests with varying problem size. We list how many out of 20 training sets had the given variable turned on in the ground state conguration. 
Three of the 36 variables were included for all values of the penalty term λ and for all of the training sets [p_T^2, (ΔR · p_T^γγ)^{-1}, and p_T^2/p_T^γγ], the variables p_T^2/(p_T^1 − p_T^2) and (p_T^1 + p_T^2)/Δη were present in almost all, while seven were never included, among which the original kinematical variables p_T^1 and Δη. All momenta (p_T^1, p_T^2, p_T^γγ) are given in units of m_γγ. Variables are given in Table 5.2.

  λ \ var   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  0         0  20  20  20  19  20  20   0   5  20  20  20  20  19  20  17  20  20
  0.01      0  20  20  20  19  20  20   0   4  20  20  20  20  19  20  17  20  20
  0.02      0  20  20  20  19  20  20   0   4  20  20  20  20  19  20  16  20  20
  0.05      0  20  20  20  19  20  20   0   1  20  20  20  20  19  20  10  20  17
  0.1       0  20  20  20  19  20  20   0   0  20  20  20  20  19  20   6  14   2
  0.2       0  20  20  20  19  20  20   0   0  20  14  20  20  12  20   4   1   0
  0.4       0  20   0   2  19  20  20   0   0  20  17  20  20   0  20   1   0   0
  0.8       0  20   0   0   0   0   9   0   0  18   0   0  20   0   2   0   0   0

  λ \ var  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  0        20   0   0  19   0  20   0   3   0  20  19   7   0  15   0  19  20  20
  0.01     20   0   0  19   0  20   0   2   0  20  19   6   0  15   0  19  20  20
  0.02     20   0   0  19   0  20   0   1   0  20  19   4   0  15   0  19  20  20
  0.05     20   0   0  19   0  20   0   0   0  20  16   1   0  11   0  19  20  20
  0.1      20   0   0   1   0  20   0   0   0  20   1   0   0   5   0  16  20  20
  0.2      18   0   0   0   0  20   0   0   0  20   0   0   0   0   0   0  20  20
  0.4       0   0   0   0   0   7   0   0   0  20   0   0   0   0   0   0  20   3
  0.8       0   0   0   0   0   0   0   0   0  20   0   0   0   0   0   0  19   0

The following sections are available for those who want much more detailed information on our problem sets and the parameters of our algorithms, more detailed information on ROC curves, and a large number of plots which are not necessary to understand the thrust of this study.

5.6 Methods

5.6.1 Problem Construction

We simulate 3×10^5 decays of a 125 GeV mass Higgs particle produced by gluon fusion at √ŝ = 8 TeV using PYTHIA 6.4 [158] and 3×10^5 SM-process background events using SHERPA [159], after restricting to those processes with realistic detector acceptance and trigger requirements that lie directly under the Higgs peak (to ensure the classifier cannot select on mass information): |η| < 2.5, with one photon having p_T > 32 GeV and the other having p_T > 25 GeV, and total diphoton invariant mass 122.5 GeV < m_γγ < 127.5 GeV. The dominant Feynman diagrams are given in Figure 5.1. The resulting distributions for the eight kinematical variables in this problem are given in Fig. 5.2. The complete procedure of the weak classifier construction is given in the SI.

There are 666 floating-point parameters in our Ising Hamiltonian on 36 variables. XGBoost with a maximum depth of 10 has up to 1024 decisions (each a free variable/parameter) in each tree. Our DNN has 2000 local biases and approximately 500000 weights on/between the two 1000-node hidden layers.
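The acceptance, trigger, and mass-window requirements above amount to a simple event filter; a hypothetical sketch (field names are mine, not the actual generation/selection code) is:

```python
def passes_selection(photon1, photon2, m_gamgam):
    """Apply the acceptance, trigger, and mass-window cuts that define the dataset.

    photon1, photon2 : dicts with 'pt' (GeV) and 'eta' for the two photons
    m_gamgam         : diphoton invariant mass in GeV
    """
    pts = sorted((photon1["pt"], photon2["pt"]), reverse=True)
    in_acceptance = abs(photon1["eta"]) < 2.5 and abs(photon2["eta"]) < 2.5
    passes_trigger = pts[0] > 32.0 and pts[1] > 25.0       # leading / sub-leading photon
    under_higgs_peak = 122.5 < m_gamgam < 127.5            # mass window under the peak
    return in_acceptance and passes_trigger and under_higgs_peak
```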
5.6.2 Data collection and analysis

For SA, all the Ising Hamiltonians are run 10^4 times at various numbers of sweeps (up to 1000), from initial inverse temperature β = 0.1 to final inverse temperature β = 5, with a linear schedule. Ground-state energies are estimated using SA at 10^4 sweeps with a linear schedule from β = 0.1 to β = 10. For DW we first create for each instance a heuristic embedding using the D-Wave API. DW is run with 50 gauges [36] at the minimum possible annealing time of 5 μs for 200 samples per gauge at a chain strength of 6. This is in line with my proposal from Chapter 3, though here we used a fixed number of gauges because the data analysis component was far too heavy (at least at the time of running) to derive a posterior density efficiently on the fly.

5.6.3 Weak classifier construction

We define S(v) as the distribution for the signal of variable v, and similarly B(v) is the distribution of the background for variable v. For a given variable v, compute the 70th percentile of S(v), call it v_cut, and find the percentile corresponding to that same value in B(v). If the percentile in B(v_cut) < 70 then we center v_cut (v′ = v − v_cut) and reflect across the vertical axis (v″ = −v′) so that S(v″) > B(v″) for v″ > 0, and thus the region v″ > 0 is predominantly signal and v″ < 0 is predominantly background. If the percentile in B(v_cut) > 70 then we compute the 30th percentile of S(v) (a new v_cut), and if B(v_cut) > 30 then we center at the new v_cut and do not reflect across the vertical axis, as it already satisfies the requirement that S(v″) > B(v″) for v″ > 0. If neither of these conditions is satisfied, we reject the variable as unsuitable for weak classifier construction. Now determine the 10th percentile for S(v) and the 90th for B(v), and call them v_{+1} and v_{−1}, respectively. The weak classifier is as follows:

$c(v) = \begin{cases} +1 & v_{+1} < v''(v) \\ v''(v)/v_{+1} & 0 < v''(v) < v_{+1} \\ v''(v)/|v_{-1}| & v_{-1} < v''(v) < 0 \\ -1 & v''(v) < v_{-1} \end{cases}$   (5.1)

By construction, c(v) has all of the properties we seek in a weak classifier. Now, since this procedure removes information about the tails of the distributions and does not take into account correlations between our kinematical variables, we add products and ratios of the kinematical variables to our description. If we had to flip the distribution for variable i, define g_i = 1/v_i; otherwise g_i = v_i. We then add all functions of the form p(g_i, g_j) = g_i · g_j, and perform the weak classifier construction on these combinations.
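A minimal sketch of this construction for a single variable follows; it assumes signal and background samples are available as numpy arrays, and, since the transcription leaves the exact percentile convention for the saturation points v_{+1} and v_{−1} ambiguous, the choice made in the code is flagged as an assumption in the comments:

```python
import numpy as np

def percentile_rank(sample, value):
    """Percentile rank (0-100) of `value` within `sample`."""
    return 100.0 * np.mean(np.asarray(sample) <= value)

def make_weak_classifier(sig, bkg):
    """Piecewise-linear weak classifier of Eq. (5.1) for one variable.

    sig, bkg : 1-D arrays of the variable for signal and background events.
    Returns a callable c(v) with values in [-1, 1], or None if the variable is rejected.
    """
    v_cut = np.percentile(sig, 70)
    if percentile_rank(bkg, v_cut) < 70:
        flip = True                       # center at the 70th signal percentile and reflect
    else:
        v_cut = np.percentile(sig, 30)
        if percentile_rank(bkg, v_cut) > 30:
            flip = False                  # center at the 30th signal percentile, no reflection
        else:
            return None                   # unsuitable for weak-classifier construction

    def transform(v):
        vp = np.asarray(v, dtype=float) - v_cut
        return -vp if flip else vp        # v'' = -(v - v_cut) if flipped, else v - v_cut

    # Saturation points v_{+1} > 0 and v_{-1} < 0: the text quotes the 10th percentile of
    # S(v) and the 90th of B(v); the exact convention is ambiguous in the transcription,
    # so here we simply take points deep in the signal-rich and background-rich regions.
    v_plus = np.percentile(transform(sig), 90)
    v_minus = np.percentile(transform(bkg), 10)

    def c(v):
        vpp = transform(v)
        scaled = np.where(vpp > 0, vpp / v_plus, vpp / abs(v_minus))
        return np.clip(scaled, -1.0, 1.0)  # saturate at +-1 outside [v_{-1}, v_{+1}]

    return c
```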
5.6.4 Mapping weak classifier selection to the Ising problem

In this section we closely follow Ref. [137], with slight changes of notation. Let V be the event space, consisting of vectors {x} that are either signal or background. We define a weak classifier c_i(x): V → R, i = 1, ..., N, as classifying event x as signal (background) if c_i(x) > 0 (c_i(x) < 0). We normalize each weak classifier so that |c_i| ≤ 1/N. We introduce a binary weights vector w ∈ {0, 1}^N and construct a strong classifier R_w(x) = Σ_i w_i c_i(x) ∈ [−‖w‖/N, ‖w‖/N]. The event x is correspondingly classified as signal (background) if R_w(x) > 0 (R_w(x) < 0). The weights w are to be determined; they are the target of the solution of the Ising problem.

Let T = {x_μ, y_μ} denote a given set of training events, where x_μ is an event vector collecting the values of each of the variables we use, and y_μ = ±1 is a binary label for whether x_μ is signal (+1) or background (−1). Let Q_w(x) = sign[R_w(x)], so that Q_w(x) = +1 (−1) denotes signal (background) event classification. Thus y_μ Q_w(x_μ) = +1 if x_μ is correctly classified as signal or background (y_μ and Q_w(x_μ) agree), and y_μ Q_w(x_μ) = −1 if x_μ is incorrectly classified (y_μ and Q_w(x_μ) disagree). The cost function L(w) = Σ_μ [1 − y_μ Q_w(x_μ)]/2 thus counts the number of incorrectly classified training events, and minimizing it over all possible weight vectors returns the optimal set of weights, and hence the optimal strong classifier given the training set T. To avoid overtraining and economize on the number of weak classifiers used, we can introduce a penalty term proportional to the number of weights, i.e., λ‖w‖, where λ > 0 is the penalty strength. Thus the optimal set of weights for given λ is

$\vec{w}_{\rm opt} = {\rm argmin}_{\vec{w}} \left[ L(\vec{w}) + \lambda \|\vec{w}\| \right].$   (5.2)

This optimization problem cannot be directly mapped onto a quantum annealer, due to the appearance of the sign function. Instead we next introduce a relaxation to a quadratic form that is implementable on the current generation of D-Wave devices. Namely, using the training set we form the vector of strong classifier results $\vec{R}_{\vec{w}} = \{R_{\vec{w}}(\vec{x}_\mu)\}_{\mu=1}^{|T|}$, the Euclidean distance measure $\delta(\vec{w}) = \|\vec{y} - \vec{R}_{\vec{w}}\|^2$ between the strong classifier and the set of training labels, and replace Eq. (5.2) by

$\vec{w}_{\min} = {\rm argmin}_{\vec{w}} \, \delta(\vec{w}).$   (5.3)

Finding w_opt in this way is equivalent to solving a quadratic unconstrained binary optimization (QUBO) problem:

$\vec{w}_{\min} = {\rm argmin}_{\vec{w}} \sum_\mu \left[ R_{\vec{w}}(\vec{x}_\mu)^2 - 2 y_\mu R_{\vec{w}}(\vec{x}_\mu) + y_\mu^2 \right]$   (5.4)
$\phantom{\vec{w}_{\min}} = {\rm argmin}_{\vec{w}} \left[ \sum_\mu \sum_{i,j} w_i w_j c_i(\vec{x}_\mu) c_j(\vec{x}_\mu) - 2 \sum_\mu y_\mu \sum_i w_i c_i(\vec{x}_\mu) + |T| \right].$   (5.5)

Regrouping the terms in the sum and dropping the constant we find:

$\vec{w}_{\min} = {\rm argmin}_{\vec{w}} \left[ \sum_{i,j} w_i w_j \left( \sum_\mu c_i(\vec{x}_\mu) c_j(\vec{x}_\mu) \right) - 2 \sum_i w_i \left( \sum_\mu c_i(\vec{x}_\mu) y_\mu \right) \right]$   (5.6)
$\phantom{\vec{w}_{\min}} = {\rm argmin}_{\vec{w}} \left[ \sum_{i,j} C_{ij} w_i w_j - 2 \sum_i C_i w_i \right],$   (5.7)

where C_ij = Σ_μ c_i(x_μ) c_j(x_μ) = C_ji and C_i = Σ_μ c_i(x_μ) y_μ.

This has a tendency to overtrain. The reason is that |R_w(x_μ)| ≤ ‖w‖/N, so that |y_μ − R_w(x_μ)|² ≥ (1 − ‖w‖/N)², and hence δ(w) = Σ_μ |y_μ − R_w(x_μ)|² ≥ |T|(1 − ‖w‖/N)². To minimize δ(w) the solution will be biased toward making ‖w‖ as large as possible, i.e., to include as many weak classifiers as possible. To counteract this overtraining tendency we add a penalty term that makes the distance larger in proportion to ‖w‖, i.e., λ‖w‖ with λ > 0, just as in Eq. (5.2). Thus we replace Eq. (5.7) by

$\vec{w}_{\min} = {\rm argmin}_{\vec{w}} \left[ \sum_{i,j} C_{ij} w_i w_j + \sum_i (\lambda - 2 C_i) w_i \right].$   (5.8)

The last step is to convert this QUBO into an Ising problem by changing the binary w_i into spin variables s_i = ±1, i.e., w_i = (s_i + 1)/2, resulting in:

$\vec{s}_{\min} = {\rm argmin}_{\vec{s}} \left[ \frac{1}{4} \sum_{i,j} C_{ij} s_i s_j + \frac{1}{2} \sum_{i,j} C_{ij} s_i + \frac{1}{2} \sum_i (\lambda - 2 C_i) s_i \right],$   (5.9)

where we use the symmetry of C_ij to write the middle term, and we drop the constant terms $\frac{1}{4} \sum_{i,j} C_{ij}$ and $\frac{1}{2} \sum_i (\lambda - 2 C_i)$. We now define the couplings J_ij = (1/4) C_ij and the local fields h_i = (1/2)(λ − 2C_i + Σ_j C_ij). The optimization problem is then equivalent to finding the ground state s_min = argmin_s H of the Ising Hamiltonian

$H_{\rm Ising} = \sum_{i<j}^{N} J_{ij} s_i s_j + \sum_{i=1}^{N} h_i s_i.$   (5.10)

In the main text and hereafter, when we refer to λ it is measured in units of max_i(C_i) (e.g., λ = 0.05 is shorthand for λ = 0.05 max_i(C_i)).
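As a quick numerical sanity check of the change of variables w_i = (s_i + 1)/2, the following sketch (toy stand-in data, not the analysis code) verifies that the QUBO objective of Eq. (5.8) and the Ising objective of Eq. (5.9) differ only by a constant, so they share the same minimizer:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 6                                            # toy size; the real problem has N = 36
C = rng.normal(size=(N, N)); C = (C + C.T) / 2   # stand-in for the symmetric C_ij
Ci = rng.normal(size=N)                          # stand-in for C_i
lam = 0.05 * np.max(Ci)

def qubo(w):
    """Objective of Eq. (5.8) for binary weights w in {0, 1}^N."""
    return w @ C @ w + (lam - 2 * Ci) @ w

def ising(s):
    """Objective of Eq. (5.9) for spins s in {-1, +1}^N."""
    return 0.25 * s @ C @ s + 0.5 * C.sum(axis=1) @ s + 0.5 * (lam - 2 * Ci) @ s

diffs = []
for bits in itertools.product((0, 1), repeat=N):
    w = np.array(bits, dtype=float)
    s = 2 * w - 1                                # w_i = (s_i + 1)/2
    diffs.append(qubo(w) - ising(s))

print(np.ptp(diffs))                             # ~0: the two objectives differ only by a constant
```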
5.6.5 Instances and variable inclusion

We use 8 kinematical variables, listed in Table 5.1. They involve functions of the individual photon and diphoton momenta and mass, as well as the angles of the photons and the diphoton system. Taking the products between them, we get 36 products, with all of them passing the weak classifier construction procedure for the vast majority of the training sets. These 36 weak classifiers (or a subset thereof) are the set from which we built our strong classifiers. For each size of training set in [100, 1000, 5000, 10000, 15000, 20000], we generated 20 training sets and generated the corresponding Ising problem for λ = 0.05. In order to compare the performance of SA and QA, we estimate the ground-state solution of these Ising problems by running SA for a large number of sweeps (10^4) with a low final temperature (0.1 in normalized energy units).

Data availability: The data that support the findings of this study are available from the corresponding author upon reasonable request. Source data for all figures are provided with the paper.

5.7 Additional background and pedagogy

5.7.1 DNN and XGB optimization procedure

We benchmark the performance of QAML against DNN and XGB. We train a DNN using Keras [138] with the Theano backend [147], a standard tool in deep learning and increasingly popular in high energy physics. Our network has two fully connected hidden layers with 1000 nodes each. The model is optimized using the Adam algorithm [160] with a learning rate of 0.001 and a mini-batch size of 10. We find that network performance is not affected by small changes in the number of nodes or the initial guesses for the weights. The model hyperparameters, regularization terms, and optimization parameters for our deep neural net are selected using the Spearmint Bayesian optimization software [161, 162]. Early stopping is used (with patience parameter 10) to avoid overtraining and ensure sufficient generalization.

We also train an ensemble of boosted decision trees using XGB [139] with a maximum depth of 10, a learning rate of 0.3, and L2-regularization parameter λ = 2000. To train and optimize XGB, we use 100 rounds of training and start with the default choices for the various parameters. We evaluate values of the learning rate η ∈ {1, 2, 3, 5, 8} × 10^{−i} for i ∈ {1, 2, 3} at tree depths of 5, 8, 10, 12, 15, and 20. Some of these parameters give small improvements in AUC over the defaults at a value of the L2-regularization parameter λ = 1. Far larger improvements are found when λ is increased. Hence we hold the other parameters fixed and evaluate λ ∈ {5, 10, 20, 50, 100, 200, 500, 1000, 1500, 1800, 2000, 2200, 2500}, finding the approximate optimum AUC on the test set at λ ≈ 2000. Testing again, the tree depth and η are found to have minimal effect on the AUC (significantly smaller than the error), and η = 0.3 and tree depth 10 are chosen as the approximate optimum. We note that the DNN and XGB settings are selected so as to prevent overtraining.
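For illustration, the baseline configurations described above might be set up as follows with current Keras and XGBoost APIs. This is a sketch, not the original code: the study used the Theano-backend Keras of the time, and the hidden-layer activation and loss function below are my assumptions (only the layer sizes, optimizer, learning rate, batch size, early-stopping patience, and the XGBoost depth, learning rate, regularization, and number of rounds come from the text).

```python
import xgboost as xgb
from tensorflow import keras

def build_dnn(n_inputs):
    """Two fully connected hidden layers of 1000 nodes each, trained with Adam (lr = 0.001)."""
    model = keras.Sequential([
        keras.layers.Dense(1000, activation="relu", input_shape=(n_inputs,)),
        keras.layers.Dense(1000, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy")
    return model

early_stop = keras.callbacks.EarlyStopping(patience=10)   # early stopping, patience 10
# model = build_dnn(n_inputs=36)
# model.fit(X_train, y_train, batch_size=10, epochs=200,
#           validation_split=0.2, callbacks=[early_stop])

# Boosted decision trees: max depth 10, learning rate 0.3, L2 regularization 2000,
# 100 boosting rounds.
bdt = xgb.XGBClassifier(max_depth=10, learning_rate=0.3,
                        reg_lambda=2000, n_estimators=100)
```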
Any MC simulation that is unable to approximate the 10th, 30th, 70th, and 90th percentiles of the marginal distribution for each dimension of the dataset and of the products between them would surely not be considered acceptably similar to the target distribution for use in HEP data analyses, as it is effectively guaranteed to be wrong in the higher-order correlations and thus in its approximation of the true distribution. Meanwhile, typical machine learning approaches for this problem use arbitrary relationships across the entire training dataset, including the tails and high-order correlations, and so are likely to be more sensitive to any mismodelling.

5.7.3 Quantum annealing and D-Wave

Current and near-generation quantum annealers are naturally run in a batch mode in which one draws many samples from a single Hamiltonian. Repeated draws for QA are fast: the DW averages approximately 5000 samples per second under optimal conditions. We take advantage of this by keeping all the trial strong classifiers returned and not restricting to the one with minimum energy. (The energy is effectively a function of the error of the weak classifiers on the training set, and hence is distinct from the measures used to directly judge classifier performance, such as the area under the ROC curve.)

The DW has 1098 superconducting Josephson junction flux qubits arranged into a grid, with couplers between the qubits in the form shown in Fig. 5.8, known as the Chimera graph. The annealing schedule used in the DW processor is given in Fig. 5.9. The Chimera graph is not fully connected, a recognized limitation since the Ising Hamiltonian (5.10) is fully connected, in general. To address this, we perform a minor embedding operation [33, 13]. Minor embedding is the process whereby we map a single logical qubit in H_Ising into a physical ferromagnetic (J_ij = -1) chain of qubits on the DW. For each instance we use a heuristic embedding, found via the D-Wave API, that is as regular and space-efficient as possible for our problem sizes. Given a minor embedding map of logical qubits into chains of physical qubits, we divide the local field h_i equally among all the qubits making up the chain for logical qubit i, and divide J_ij equally among all the physical couplings between the chains making up logical qubits i and j.

After this procedure, there remains a final degree of freedom: the chain strength J_F. If the magnitude of the couplers in the ferromagnetic chains making up logical qubits is defined to be 1, then the maximum magnitude of any other coupler is max(max_i{|h_i|}, max_{i,j}{|J_ij|}) = 1/J_F. There is an optimal value of J_F, generally. This is due to a competition between the chain needing to behave as a single large qubit and the problem Hamiltonian needing to drive the dynamics [83]. If J_F is very large, the chains will "freeze out" long before the logical problem, i.e., the chains will be far stronger than the problem early on, and the transverse field terms will be unable to induce the large, multi-qubit flipping events necessary to explore the logical problem space. Similarly, if J_F is very weak, the chains will be broken (i.e., develop a kink or domain wall) by tension induced by the problem, or by thermal excitations, and so the system will generally not find very good solutions. Ideally, one wants the chains and the logical problem to freeze at the same time, so that at the critical moment in the evolution both constraints act simultaneously to determine the dynamics.
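A minimal sketch of this parameter-division step is given below. The embedding is represented as a map from each logical variable to an ordered list of physical qubits, and the helper name, data layout, and the choice to rescale after dividing are assumptions made for illustration; this is not the D-Wave API.

def embed_parameters(h, J, embedding, J_F):
    # h: {i: h_i}; J: {(i, j): J_ij}; embedding: {i: [physical qubits, in path order]};
    # J_F: chain strength. Chain couplers are fixed at -1 and every other physical
    # term is rescaled so that its largest magnitude is 1 / J_F.
    h_phys, cross = {}, {}
    for i, chain in embedding.items():
        for q in chain:
            h_phys[q] = h[i] / len(chain)                 # split h_i equally along the chain
    for (i, j), Jij in J.items():
        # Only the couplers actually present in the Chimera graph connect two chains;
        # here every cross pair stands in for that available set, purely for illustration.
        links = [(qa, qb) for qa in embedding[i] for qb in embedding[j]]
        for link in links:
            cross[link] = Jij / len(links)                # split J_ij over inter-chain couplers
    scale = J_F * max(max(abs(v) for v in h_phys.values()),
                      max(abs(v) for v in cross.values()))
    J_phys = {link: v / scale for link, v in cross.items()}
    h_phys = {q: v / scale for q, v in h_phys.items()}
    for chain in embedding.values():
        for qa, qb in zip(chain, chain[1:]):
            J_phys[(qa, qb)] = -1.0                       # ferromagnetic chain couplers
    return h_phys, J_phys

Broken chains in the returned samples are then repaired by the majority-vote decoding described next.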
For the results shown here, we used J_F = 6 with an annealing time t_a = 5 us. To deal with broken chains, we use a majority vote on the chain, with a coin-toss tie-breaker for even-length chains. Detailed analysis of the performance of this strategy in the context of error correction can be found in the literature [163, 164].

Figure 5.10 shows the average minimum energy returned by the DW, rescaled by the training size (to remove a linear scaling), as a function of the chain strength and training size. We see that the smallest training size (N = 100) has a smaller average minimum energy than the rest of the training sizes, and that there is only a very slight downward tendency as the chain strength J_F increases for the larger training sizes. Figure 5.11 plots the fractional deviation of the minimum energy returned by the DW relative to the true ground state energy, averaged over the training sets. While the DW's minimum returned energy approaches the true ground state, it seems to converge to about 5% (i.e., f ~ 0.05) above the ground state energy as we increase the chain strength, for all training sizes >= 1000. In this case, we were not able to find the optimal chain strength in a reasonable range of chain strengths, and instead simply took the best we found, J_F = 6. As discussed in Sec. 5.7.5, the DW processor suffers from noise sources on the couplers and thermal fluctuations, and it seems that this poses significant challenges for the performance of the quantum annealer. It is possible that even larger chain strengths may resolve the issue, but given the convergence visible in Fig. 5.11, it seems likely that J_F = 6 is already near the optimum.

5.7.4 Simulated annealing

Our simulations used beta_init = 0.1 and beta_final = 5, and used a linear annealing schedule (i.e., if we perform S sweeps, we increase beta after each sweep by (beta_final - beta_init)/S). These parameters have generally performed well in other studies [5]. All SA data in the main text and presented here are at 1000 sweeps; however, we also tested SA at 100 sweeps and found a negligible difference in overall performance, as seen in Fig. 5.12, where the integrated difference of the ROC curves is found to be statistically indistinguishable from 0.

5.7.5 Effect of noise on the processor

Internal control error (ICE) on the current generation of D-Wave processors is effectively modeled as a Gaussian centered on the problem-specified value of each coupler and local field, with standard deviation 0.025; i.e., a coupler J_ij is realized as a value drawn from the distribution N(J_ij, 0.025) when one programs a Hamiltonian. Figure 5.13 contains a histogram of the ideal values of the embedded couplers corresponding to connections between logical qubits, across all 20 problem instances of 36 variables at 20000 training events. One can see that the ideal distribution has some structure, with two peaks. However, if one resamples values from the Gaussian distribution induced by ICE, one finds that many of the features are washed out completely. This suggests that the explanation for the flattening out of the performance of QA as a function of training size (recall Fig. 5.3 in the main text) is due to this noise issue. Thus we investigate this next.

Figures 5.14-5.16 tell the story of the scaling of the couplers with training size. Figure 5.14 shows linear scaling of the maximum Hamiltonian coefficient with training size. We observe wider variation at the smallest training sizes, but overall the maximum coefficient scales linearly with training size.
This is confirmed in Figure 5.15, which shows the maximum coefficient normalized by the training size. Since this value is constant for sufficiently large training sizes, the maximum value scales linearly with size. At first glance, this indeed suggests an explanation for why the performance of QA using the DW levels off as a function of training size: the coupling values pass 20 (half the scale of the errors, which is 1/0.025 = 40). However, absolute numbers are not necessarily informative, and Fig. 5.16 dispels this explanation.

Figure 5.16 shows the ratio of the median coefficient to the maximum coefficient, thereby showing the scale of typical Hamiltonian coefficients on the DW prior to rescaling for chain strength (which, for most of the data here, would reduce the magnitude by a further factor of 6). Since all the different types of coefficient ratios are constant with system size, there is effectively no scaling with training size of the precision required of the couplers. This means that the scaling of precision with training size cannot explain the saturation of performance with increasing training size. However, the magnitudes here are quite small, and so once one accounts for rescaling the energies, typical couplers are expected to be subject to a significant amount of noise, even causing them to change sign. This effect likely explains, at least in part, the difficulties the DW has in finding the true ground state, as discussed above and seen in Fig. 5.11, where even at the largest chain strength we still find that the DW's typical minimum energy is about 5% above the ground state energy.

5.7.6 Sensitivity to variation of the parameters of weak classifier construction

When constructing the weak classifiers, we choose to define v_cut as the 70th percentile of the signal distribution. This choice is arbitrary. To test the effect of this value on classifier performance, we use identical training sets with values of both 60% and 80% and compare them to our primary choice of 70%. The results for both the minimum energy returned (f = 0) and f = 0.05 for each are shown in Fig. 5.17. Note that every training set has the same ground state configuration at 70% and 80%. The ROCs and AUROC are thus invariant across a wide range of v_cut values.

Figure 5.7 reproduces Fig. 5.4 from the main text, but also shows the AUROC for SA's optimal classifier (by energy) for various values of the regularization parameter \lambda. We find no significant variation, with the major features of SA being stable, namely the advantage at small training size and the saturation at around an AUROC of 0.64.

5.7.7 Differences between ROC curves

We show differences between ROC curves for various algorithms in Figs. 5.18-5.23. These form the basis for Fig. 5.4 in the main text, which gives the integral of the difference over signal efficiency. Figures 5.18 and 5.19 show the difference in background rejection, r_B^DW - r_B^SA, as a function of the signal efficiency for f = 0 and f = 0.05, respectively. For f = 0, DW and SA are indistinguishable to within experimental error. For f = 0.05, SA slightly outperforms DW in the range of low signal efficiencies for training sizes >= 5000. The primary conclusion to draw from these plots is that SA differs from DW by roughly one standard deviation or less across the whole range, even though DW for training sizes larger than 100 struggles to find states within less than 5% of the ground state energy.
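The quantities plotted in these figures can be computed along the following lines; this is a plausible reconstruction rather than the exact analysis code, with background rejection taken as one minus the false-positive rate and the curves interpolated onto a common signal-efficiency grid.

import numpy as np
from sklearn.metrics import roc_curve

def rejection_vs_efficiency(labels, scores, grid):
    # Background rejection (1 - false positive rate) as a function of
    # signal efficiency (true positive rate), evaluated on a common grid.
    fpr, tpr, _ = roc_curve(labels, scores)
    return np.interp(grid, tpr, 1.0 - fpr)

def integrated_roc_difference(labels, scores_a, scores_b, n_points=200):
    # Area between the two ROC curves, integrated over signal efficiency.
    grid = np.linspace(0.0, 1.0, n_points)
    delta = (rejection_vs_efficiency(labels, scores_a, grid)
             - rejection_vs_efficiency(labels, scores_b, grid))
    return np.trapz(delta, grid), grid, delta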
The rough parity between DW and SA suggests a robustness of QAML which, if it generalizes to other problems, significantly improves the potential to exploit physical quantum annealers to solve machine learning problems and achieve close-to-optimal classifier performance, even in the presence of significant processor noise.

Figures 5.20 and 5.21 show the ROC difference between DW and DNN and between DW and XGB at f = 0, respectively. The two cases have broadly similar shapes. One clearly sees that QAML on DW outperforms DNN and XGB at the smallest training size in a statistically significant manner, but that the trend reverses for sizes >= 5000. Note that at the scale of these diagrams, the gap between f = 0 and f = 0.05 is negligible.

Figures 5.22 (SA) and 5.23 (DW) show the difference between f = 0 and f = 0.05. SA and DW exhibit broadly similar behavior, with an improvement from including excited states of approximately 0.4% in background rejection for SA and of approximately 0.2% for DW. The improvement increases with training size and is slightly larger for SA than for DW (though this difference is likely simply noise, as it is less than half the standard deviation of each distribution). It should be noted that QAML's comparative advantage against the other techniques appears to be in the realm of small training sizes; however, this is the same range where including excited states provides no benefit.
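One simple way to build the f > 0 composite classifiers referred to above is to average the strong-classifier output over every returned spin configuration whose energy lies within a fraction f of the minimum returned energy; the averaging choice and the names below are assumptions, sketched for illustration, and not necessarily the exact construction used.

import numpy as np

def composite_scores(c, samples, energies, f=0.05):
    # c: (T, N) weak-classifier outputs on the events to score;
    # samples: (M, N) returned spin configurations (+/-1); energies: (M,) Ising energies.
    e_min = energies.min()
    keep = energies <= e_min + f * abs(e_min)      # states within a fraction f of the minimum
    weights = (samples[keep] + 1) / 2              # spins s_i -> binary weights w_i
    return (c @ weights.T).mean(axis=1)            # averaged strong-classifier score per event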
Figure 5.8: An 1152-qubit Chimera graph, partitioned into a 12 x 12 array of 8-qubit unit cells, each unit cell being a K_{4,4} bipartite graph. Inactive qubits are marked in red, active qubits in green. There are a total of 1098 active qubits in the DW processor used in our experiments. Black lines denote active couplers.

Figure 5.9: Annealing schedule used in our experiments (A(t), B(t), and k_B T in GHz versus t/t_f).

Figure 5.10: A plot of the minimum energy returned by the DW as a function of chain strength, rescaled by the number of training samples. I.e., for training size N, we plot E_m/N for minimum returned energy E_m, where N is given in the legend.

Figure 5.11: Plot of (E_m - E_0)/E_0 for minimum energy returned E_m and true ground state energy E_0, i.e., the minimum fractional residual energy, averaged over the training sets, for each size and chain strength.

Figure 5.12: The integral of the difference of the ROC curves, i.e., the area between the ROC curves, for SA and SA100 for various thresholds of the energy and training size. SA at 100 and 1000 sweeps are effectively identical by this benchmark.

Figure 5.13: Histograms for the true (peaked) distribution of local biases and couplers, and the same distribution subject to point-wise Gaussian noise with zero mean and standard deviation 0.025, which is approximately the magnitude of errors on the DW couplers.

Figure 5.14: The maximum local bias and coupler term in the Hamiltonian across training sizes and training sets.

Figure 5.15: The maximum local bias and coupler term in the Hamiltonian across training sizes and training sets, normalized by the number of events in the training set. This makes it clear that the scaling of the Hamiltonian coefficients is linear in the training size, for training sizes >= 5000.

Figure 5.16: The ratio of the median coefficient to the maximum coefficient for the non-zero local biases, couplers, and both taken together.

Figure 5.17: Difference between the ROC curve for SA with v_cut at the xth percentile during weak classifier construction and the curve using the yth percentile, for the ground state configuration. In order of presentation: (a) x = 70, y = 60, f = 0. (b) x = 70, y = 80, f = 0. (c) x = 70, y = 60, f = 0.05. (d) x = 70, y = 80, f = 0.05.

Figure 5.18: Difference between the ROC curves for SA and DW using the minimum energy returned.

Figure 5.19: Difference between the ROC curves for SA and DW using all states within 5% of the minimum returned energy.

Figure 5.20: Difference between the ROC curves for DW and DNN using the minimum energy configuration from DW.

Figure 5.21: Difference between the ROC curves for DW and XGB using the minimum energy configuration from DW.
Figure 5.22: Difference between the ROC curves for the true ground state configuration and the f = 0.05 composite classifier from SA.

Figure 5.23: Difference between the ROC curves for the minimum energy state returned by DW and the f = 0.05 composite classifier from DW.

Chapter 6

Conclusions: Principles of benchmarking

Rather than simply reiterate each chapter that has come before, in this concluding chapter I will instead try to state a set of basic principles and lessons that have been learned by the community and that we have encountered in the previous chapters covering the benchmarking of quantum annealers in application settings. Before we do that, however, I wish to introduce one last, short review of the progress that has been made in the field of benchmarking quantum annealers by those other than myself.

6.1 Recent progress

An interesting critique of the scaling results presented in chapter 2 was made in Ref. [93], which argued that random Ising instances restricted to the Chimera graph are "too easy", essentially because their phase space exhibits only a zero-temperature transition. This would imply that classical thermal algorithms such as SA see a simple energy landscape with a single global minimum throughout the entire anneal (except perhaps at the very end, as the simulation temperature is lowered to near zero), instead of the usual glassy landscape with many local traps associated with hard optimization problems. This work highlighted the importance of a careful design of benchmark problems, to ensure that classical solvers would not find them trivial.
Of course, it should be stressed that quantum speedup is always relative, and it can be observed even when efficient (polynomial-time) classical algorithms exist, as in, e.g., the solution of linear systems of equations [165]. In light of this, one may interpret the message of Ref. [93] to mean that a quantum speedup might not be detectable over a finite range of problem sizes if the problem is classically easy, since the difference between the quantum and classical scaling is too subtle to be statistically significant.

Before we turn to a discussion of the evidence for a limited quantum speedup, we first briefly discuss alternatives to the TTS as a performance measure. One such alternative is the time-to-target (TTT), i.e., the total time required by a solver to reach the target energy at least once with a desired probability, assuming each run takes a fixed time [81]. This reduces to the TTS if the target is the ground state. A unified approach that includes a variety of other measures was presented in Ref. [82], drawing upon optimal stopping theory, specifically the so-called "house-selling" problem [166]. Within this framework one answers the question of how long, given a particular cost for each sample drawn from a solver, one should sample in order to maximize one's reward, analogously to the decision problem of when to sell one's house given that bids accrue over time but waiting longer carries a higher monetary cost. This allows the TTS and TTT, among other measures, to be shown to be specific choices of the cost and reward functions. The optimal stopping framework also paves the way for a more detailed comparison between quantum and classical approaches and the tradeoffs of each, since by altering the cost per sample one can see the impact of the distribution over states (rather than just the ground state) for the various solvers. Optimal stopping is appropriate for applications where finding the minimum energy is not strictly the most important consideration, such as many machine learning contexts and even various business-origin optimization problems. In those cases, there is a tradeoff between the cost to perform the computation and the benefit from receiving a result. Tests demonstrating the optimal stopping approach were performed with a DW2X device (with 1098 qubits) on frustrated loop problems much like those in Ref. [5], demonstrating identical scaling (modulo concerns about the lack of an optimal annealing time) to the HFS algorithm at multiple values of the cost to draw a sample, an improvement over the DW2. However, these results could still not qualify as a limited quantum speedup due to the problem of suboptimal annealing times.

This problem was finally overcome in Ref. [167], which for the first time demonstrated an optimal annealing time, and could thus make positive claims about a limited quantum speedup. Previous studies could not find an optimal annealing time because a class of problem instances had not been identified for which the shortest available annealing time (20 us in the DW2, 5 us in all other D-Wave devices) was sufficiently short to observe an optimum given the largest problem size that could be tested. Using the D-Wave 2000Q (DW2KQ) device (with 2027 qubits), Ref. [167] demonstrated a simple one-unit-cell gadget Hamiltonian which, when added randomly to a constant fraction of the unit cells on top of similar frustrated loop problems as in Ref.
[168], resulted in the observation of an optimal annealing time for frustrated loops defined on the hardware graph (also when using the DW2X device), as well as for frustrated loops defined on the logical graph of unit cells (each unit cell then being bound together tightly as a pseudo-spin in the physical problem, modulo the gadget Hamiltonian). For the latter, logical-planted instances, the DW2KQ exhibited a statistically significant scaling advantage over both single-spin-flip SA and SVMC. These results amount to the first observation of a limited quantum speedup, since the existence of an optimal annealing time was certified. However, this did not amount to an unqualified quantum speedup, since the DW2KQ's scaling was worse than that of the HFS algorithm, unit-cell cluster-flip SA, and SQA, the last of which was found to have the best scaling. Nevertheless, this result paves the way towards future demonstrations of problems with optimal annealing times and hence certifiable scaling, a necessary requirement for any type of scaling speedup claim. However, even this may not be sufficient, since other quantities remain that must eventually be optimized, such as the annealing schedule, which is known to play a crucial role in provable quantum speedups (specifically the Grover search problem [169, 170]) and can conversely be used to potentially overturn (limited) quantum speedup claims.

This provides a reasonable update of the status of benchmarking these systems, and is included here for completeness.

6.2 Guidelines for benchmarking quantum annealing and related noisy quantum computational devices

Now, finally, let us turn to a statement of principles, of sorts, for benchmarking quantum annealers.

Resource use comparisons

It is vitally important to carefully account for resource use, lest one be led astray by a fake speedup. As was discussed in chapters 2 and 4, classical resources should, in general, scale at least linearly with the system size of one's quantum computer. This is especially true for annealing-type devices, as we have seen that annealing-type algorithms are often their biggest competitors, and those algorithms are typically very parallelizable. While this can be expected to ward off only linear or near-linear polynomial advantages for annealers, as was seen in chapter 4, it can be decisive, particularly in the case of non-asymptotic problem sizes.

Parameter optimization

It is also necessary to optimize the parameters of all solvers as best one can in order to make any potential claims about speedup or the advantage of one solver over another. In particular, quantum annealing requires a demonstration of an optimal annealing time for a fixed schedule before any definitive conclusion can be drawn about a quantum speedup. We saw this in chapter 2 empirically, and it is treated further (with a proof) in chapter 4. More generally, optimizing all known free parameters is almost certainly necessary to demonstrate a quantum speedup that will hold up to scrutiny. Ultimately, if there are free parameters of one's various algorithms that have not been optimized, one is not able to make any hard claims about performance. Simulated annealing with an arbitrarily chosen number of sweeps is obviously going to fail, in general. Parallel tempering with a poorly chosen temperature spacing will also fail. In classification contexts, using suboptimal hyperparameters of the learning algorithm will yield poor results.
While it is true that for many algorithms one is almost always simply unable to properly optimize all the free parameters due to their enormous number (parallel tempering is an example), one should still choose best-in-class and/or standard methods for selecting said parameters. If a practitioner is unfamiliar with such methods, as was the case for my work on the Higgs problem in chapter 5 for XGBoost and DNNs, one should both familiarize oneself and consult subject matter/application area experts (as was done in that case). In general, if one leaves free parameters unoptimized or only partially optimized, one can only make lower-bound claims on whatever performance metrics are of interest. This leads to...

Distinguish between types of quantum speedup and take care in algorithm choice

One must distinguish between different types of quantum speedup. Comparisons between a quantum computational device and a single other solver are inherently limited to a demonstration of a "potential quantum speedup". In general, studies of this kind (comparing against only single solvers) have little value to the broader community, as it is simply far too easy to select a "broken" algorithm that is clearly suboptimal for the task at hand. An example is using single-spin-flip SA on problems with clusters, as was done in [76], though there have been any number of similar such experiments. The only exception to this is if one is comparing against an algorithm already considered to be best-in-class (for Chimera-structured problems, that would likely be the HFS algorithm, or perhaps PT with isoenergetic cluster moves), and even then one should still, for the community's sake, study other potentially competing algorithms. To go further, one must be sure to compare performance against a suite of algorithms, in particular those that mimic the device to some degree (such as SA or SQA). A speedup against such solvers would be considered a "limited quantum speedup". (Note: this was also done in [76], as they did compare against PIMC, and did not find any evidence of speedup.) PIMC or some other variant of quantum Monte Carlo is vital in cases of potential quantum speedup, as it has been found to often correlate well with quantum annealers (see references [15, 4], for instance). Finally, if there is a consensus about the solvers that are the best at the original algorithmic task, then a speedup against such solvers would be considered an unqualified "quantum speedup". This would be a game-changing result, but as yet it has never been demonstrated in this space.

Analyze full pipelines

As was done in chapter 5, one should compare one's quantum algorithm not only with classical alternatives to one's calls to the quantum device itself, but
This is a general problem { merely mapping a problem into an Ising model and solving it is insucient, unless one both tests replacements for the quantum device and algorithms for the original problem class (whether the class be solving Ising models [4], solving SAT problems[16], graph coloring[171], job shop scheduling[171], etc.). In general, this is another case where consultation between our community and subject matter/application area experts is key for rigorous and useful benchmarking studies. Gauge-averaging Users of such quantum computational devices should perform something akin to gauge averaging in order to eectively estimate performance, by averaging over many dierent mappings from the logical problem to physical states, at least so long as the devices are not fully error corrected. Nuisance parameters, local biases, Hamiltonian dependent interactions, etc. are all expected to continue to be an issue long into the future, in the absence of error correction, and as long as that is true, the use of gauges (ie sampling over eciently applicable maps from the logical problem to physical states) is key. Given that there is typically no good distribution for problem hardness as a function of this ensemble of mappings, nonparameteric techniques are appropriate, as was described in detail in 3. This leads to... Take your state of knowledge seriously As was discussed in 3, and again to a degree in 5, practioners should think care- fully about what they want to learn from their experiment (time to solution, probability of success, order parameters of TTS over an instance set, classier performance, etc.) and how the results of the experiment eect their knowl- edge. As stated above, for things like estimating TTS, the focus of chapter 3, a theoretically well-founded and simple nonparametric approach is the Bayesian bootstrap over gauges, which may be readily extended to estimates of order pa- rameters of TTS over instances. In other cases, something like the reweighting of our very large event test set with Poisson weights is more appropriate as it simulates the physical process which generates the statistical error. Moreover, as stated in 3, by taking one's state of knowledge seriously, one can then read- ily employ optional stopping with a terminating condition on one's posterior density coecient of variation to, at times, save enormous amounts of resources 177 (as demonstrated in the aforementioned chapters simulation studies, an order of magnitude or more). I'll also refer the reader to the work by Vinci & Lidar [82] on optimal stopping for cases where one can dene meaningful cost and reward functions (typically, I expect this will be limited to real-world business use cases). Choice of benchmark problem and the meaning of \success" Finally, choice of benchmark problem is key, and should be made with an eye toward the day when classical machines are vastly outpaced by quantum de- vices. For example, the transition from random Ising problems to frustrated loop/planted solutions problems seen between studies presented in chapters 2 and 4 was forced by the need to have reasonable benchmarks for devices so large that classical systems cannot solve them in a human lifetime. 
If no analogue to planted solutions is feasible in the area of interest, the previous sections' exhortations about testing a wide array of solvers, optimizing parameters, and consulting with application area experts become all the more important, as one will be forced to resort to using "the best solution found" as one's definition of "success", and one does not want to have missed an obvious or readily available algorithm that could find better solutions than any of the solvers one tests.

If the reader adheres to these principles and applies them well in their use cases, and these become standard/common knowledge within the community, I believe it will significantly improve the quality, reliability, and usefulness of future benchmarking studies and, in doing so, accelerate progress in the field.

Appendix A

Gauge selection as a multi-armed bandit problem

A.1 Introduction

In this appendix we'll recast the gauge selection problem in quantum annealing as a multi-armed bandit (MAB) problem, discuss some of the possible methods of applying techniques from the field of bandit problems to gauge selection, and provide the results of applying these techniques to an existing ensemble of problems with a wide variety of difficulty for existing quantum annealers. Further, we discuss some of the intricacies of comparing the performance of MAB techniques with standard benchmarking methods for heuristic solvers in this space.

Quantum annealing has a wide variety of potential applications, including circuit fault diagnosis [172], machine learning [173, 174, 175, 157, 6], and optimization [176, 15, 4, 177]. However, many applications require some form of minor embedding procedure, as in [175, 6, 178], whether for basic utility or for error suppression, and all quantum annealers have a Z_2, or spin-reversal, symmetry governing which qubit spin state (such as the current direction in a flux qubit) is read as a logical +/-1. These symmetries in the map from the logical to the physical Hamiltonian are dubbed "gauges", and maps between them are dubbed "gauge transformations" [15], as discussed in chapter 3.

The problem of gauge selection has rarely been explored to date. After the idea of gauge-averaging was introduced in order to get systematic and reliable estimates of performance in Ref. [15], the overwhelmingly common strategy has been to simply select gauges at random and perform an average; indeed, this has been used in essentially every benchmarking paper with quantum annealers (see Refs. [15, 6, 177, 131, 168] and others), with a detailed discussion making up most of chapter 3. Work on attempting to learn higher quality gauges is largely confined to the performance estimator of Ref. [126], which unfortunately has proven in further (unpublished) work to be not very robust.

In this work, we seek to select gauges more intelligently by recasting the problem in the language of multi-armed bandit (MAB) problems. The terminology comes from the colloquial saying referring to slot machines as "one-armed bandits" [179]. A MAB problem is essentially that of a slot machine with many arms, each arm providing some a priori unknown reward distribution (though the user may know some properties of the distribution(s), such as their support, the density of high-reward arms, etc.). One's goal in solving a MAB problem is to obtain as much (possibly discounted) reward as possible over some time horizon (or at any given time).
Equivalently, one can minimize the expected regret, namely the difference between the reward one would have gotten had one always selected the arm with the highest expected reward and the rewards one actually observes. MAB problems are extremely difficult to analyze, and generally one requires many simplifying assumptions, such as looking at Robbins' [180] original 2-armed Bernoulli bandit or making assumptions about the distribution of expected rewards over the arms, as in Ref. [181]. However, numerous heuristic algorithms with theoretical guarantees on expected loss are available, including epsilon-greedy, Boltzmann exploration, Thompson sampling, upper-confidence bound (UCB)-family algorithms, and more [181, 182, 179].

Since gauges each have different output probability densities over states, and one will generally need to draw many samples from one's quantum annealer in order to get reasonable numbers of states of interest (as has occurred thus far), one can clearly view each gauge as an arm and the gauge selection task as the task of minimizing regret. By recasting the problem of gauge selection in this way, we are able to apply the full history and machinery of MAB problems to the problem of gauge selection. In the following, we will briefly give a more detailed description of gauges and gauge-averaging, discuss in more detail the previous work on the gauge selection problem, and follow with a description of some of the common MAB algorithms and how the special properties of gauge selection affect our choice of algorithms. Finally, we will apply some of these algorithms to the problem of gauge selection in simulation and discuss in what contexts the MAB gauge selection method may yield significant benefits.

A.2 Gauges, gauge-averaging, and gauge selection

The notion of gauge symmetries was introduced in Ref. [15]. The physical Hamiltonian of the system corresponding to a given logical problem is generally non-unique, with many transformations (able to be applied efficiently) corresponding to an identical logical problem.

This gauge freedom, under ideal circumstances, doesn't matter. If the quantum system is noiseless with infinite-precision couplers (without control errors), then performing a spin-reversal transformation, for instance, by replacing every Z operator in the Hamiltonian with its spin-reversed form for some ensemble of qubits, namely \sigma^z_i -> \sigma^x_i \sigma^z_i \sigma^x_i = -\sigma^z_i for qubit i, will leave the output distribution of the quantum annealer invariant (provided one applies the inverse transformation to the output bitstrings). The same can be said for permuting the variables in a fully connected problem if one has a uniform minor embedding [29] (without uniform minor embeddings, one will of course anticipate some differences between permutation gauges). However, as shown in numerous studies, starting with [15], gauges can have a significant variation in quality, as measured by the probability of success of the optimization problem (that is, the ground state probability of the annealer's output distribution). These variations in quality are a result of the systematic local biases, stray fields, and internal control errors (ICE) endemic to any analog quantum computing platform such as quantum annealing, or really any NISQ device lacking significant error correction.
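In code, a spin-reversal transformation and its inverse are only a few lines; the array-based conventions below are assumptions made for illustration.

import numpy as np

def random_srt(h, J, rng):
    # Gauge-transform (h, J) with a random sign g_i = +/-1 per qubit:
    # h_i -> g_i h_i and J_ij -> g_i g_j J_ij leave the logical problem unchanged.
    g = rng.choice([-1, 1], size=h.shape[0])
    return g * h, np.outer(g, g) * J, g

def undo_srt(samples, g):
    # Map samples (rows of +/-1 spins) of the gauged problem back to the original frame.
    return samples * g

# Typical usage when gauge-averaging:
# rng = np.random.default_rng(0)
# hg, Jg, g = random_srt(h, J, rng)
# ...program (hg, Jg), draw samples, then apply undo_srt(samples, g)...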
While one may, in principle, be able to train a model using reinforcement learning to guess high quality gauges directly from the Hamiltonian, much of the gauge variation results from crosstalk and other properties that are problem-Hamiltonian dependent, so this may well prove impossible in practice, as one would have to have a very large number of problems with extremely large amounts of data for each for the training process to work. It may yet prove possible, but testing its feasibility will be left to future work.

The response of the quantum annealing community has been, from the beginning, to dismiss the variation over gauges as unwanted noise, and to seek to generate more robust estimates of performance by simply averaging over many gauges. This of course makes sense if one seeks to perform scaling analyses that hold up from processor generation to processor generation. One wouldn't want to declare a speedup that is driven by noise arising from poorly understood processes that will almost certainly be engineered away in future chips. However, there is evidence from running the same problem classes on successive generations of annealers that this simply isn't true: even gauge-averaged results are affected significantly by changes in the underlying hardware architectures [183]. One can compare the performance shown in Ref. [15] on random binary Ising problems with Ref. [4] on the same problems on a later generation of D-Wave quantum annealer, where the scaling improves even for the gauge-averaged data.

Gauge-averaging does improve the robustness of performance estimates on a per-problem basis, but it may have less effect on central percentiles of the distribution over instances, so long as one's instance class is sufficiently unstructured (for instance, random Ising problems, as in [4]). Moreover, if one seeks to use existing annealers to solve interesting problems, then if one can improve performance at low cost, one would be foolish not to do so.

A.2.1 Prior work on gauge selection: the performance estimator

To date, the only known work on the intelligent selection of gauges, what we call "the gauge selection problem", is found in the work of Perdomo-Ortiz et al. on their "performance estimator" [126]. The basic notion of this work was to sample only a relative handful of times from each of an ensemble of gauges, and then pick the best one, the meaning of "best" being whichever had the largest value of their performance estimator. Their performance estimator was defined as the negative of the mean energy of the states with the lowest \alpha percent of energies returned in their trial runs, with \alpha a free parameter to be tuned. The negative of the energy is taken so that the performance estimator acts like a score function to be maximized. Ref. [126] suggested that values of \alpha of 1-2% were optimal.

Perdomo-Ortiz et al. restricted their tests to only two instances, preselected for hardness. They demonstrated that for these problems there was reasonably good correlation between the best set of five gauges selected via this method and the highest quality gauges in their tests. One could then expect significant improvements in average success probability for the two instances, with performance improvements of up to an order of magnitude possible for the hardest problem they considered.
Nevertheless, restricting themselves to only two instances left their claims of improvement on relatively shaky ground, and it has turned out that their performance estimator was not robust in other problem domains or on other problems. Nevertheless, their basic idea was likely a good one: estimating the average energy observed among the highest quality solutions and greedily choosing to run only a small set of gauges that performed best on that metric is broadly comparable to epsilon-greedy approaches for MABs, which we will discuss in our review of MAB algorithms in the next section. And their insight that we should seek to exploit the idiosyncrasies of our processors as best we can is an important one, and it served as part of the inspiration for the current chapter.

A.3 Multi-Armed Bandits, their algorithms, and the MAB properties of the gauge selection problem

As discussed in the introduction, MABs typically involve some finite number of arms with unknown reward distributions, and the task of the learner is to maximize reward or minimize regret over some time horizon. Typically, researchers take the reward distributions to have some known range of support, with the most common choice being Bernoulli distributed rewards in {0, 1}, which is amenable to analysis and proofs concerning the expected regret [182, 179].

A.3.1 Gauge selection as a bandit problem

The gauge selection problem is somewhat unusual in the field of bandits. First and foremost, the number of arms is extremely large: spin-reversal transformations (SRTs) alone comprise 2^N transformations for N qubits. Moreover, if one is embedding a fully connected problem on N_L qubits, it would involve N_L! permutation transformations as well, which combine multiplicatively, and that is for only a single minor embedding, of which one can readily construct many. In typical MAB contexts, the number of arms is taken to be finite and relatively small by comparison to the interesting time horizon, whereas in our context the number of gauges/arms is effectively infinite.

In addition, the gauges (arms) do not directly provide us with a reward value. We are given states with associated energies, but in real life we generally do not know the ground state energy E_0 of the problem and, moreover, we generally do not take the raw energy as our reward: being within 10% of the ground state energy isn't 90% as good as finding the minimizing configuration(s). Much as in the case of applying optimal stopping to these sorts of optimization problems by Vinci & Lidar in Ref. [184], one has to first determine an interesting reward function R(E) to translate the energies E of returned states into a number that accurately represents the value of the state for one's purpose.

The most studied MAB reward distribution is the Bernoulli bandit, with rewards only being 0 and 1. If one's reward function is R(E) = 1 only for E being the ground state energy, one recovers the Bernoulli bandit. We'll dub this the Bernoulli reward function. It is, however, largely inappropriate in the "training" of a MAB algorithm for our context: quite generally we do not, in fact, know what the solution of our problem is in quantum annealing. If we knew, we wouldn't need the annealer. Another suggestive reward function, which we'll call the Gibbs-weighted (GW) reward function, is simply the negative of the energy weighted by its Gibbs factor at a particular inverse temperature \beta (a free parameter to be chosen).
Given a list of the energies of every state observed, \vec{E}, the GW reward would be R_GW(E) = -E e^{-\beta E}. The GW reward function at \beta = 0 is simply -E, but as \beta increases it places more and more weight on lower energies.^1 For our study below, we will be using the simplest reward function in this family, namely the \beta = 0 case, R(E) = -E (or, more accurately, |E|, since typically E < 0 and we are solving a minimization problem). One can think of any number of reward functions to use, but these two broadly exhibit the features one would expect of an interesting reward function. In real-world applications, one would need to take care to define a truly meaningful reward function for the task at hand (in sampling applications, for instance, one may reward variety of solutions). This is very similar to the problem of defining costs and rewards in the optimal stopping context, discussed in Ref. [184].

^1 Some readers may be concerned by the unbounded nature of the weight factor here: for even modest values of \beta|E| the reward becomes far too large to represent directly in floating point. One can simply apply a global rescaling factor to all rewards, such as dividing by exp(-\beta E_min) for E_min the smallest observed energy from any arm. Since it is a global factor across all arms, it doesn't affect anything about their relative rewards, and it pushes any numerical errors into states with high energies and correspondingly very small weights.

Finally, one other significant difference is that in gauge selection one may not have a finite, known time horizon over which one seeks to maximize reward. The horizon may discount future rewards, have a resource budget [185, 186], or involve some profitability criterion as in optimal stopping contexts. For our purposes, we will primarily focus on the anytime case. This is largely because, while optimal stopping profitability criteria or budgeted bandits are likely to be the most useful, the former are fairly unstudied in the bandit literature and both would require extremely careful reward and cost models that are surely problem/field dependent.

In summary, the gauge selection problem is an infinitely-many-armed bandit with bounded but application-specific reward functions/distributions and application-specific time horizons. A final wrinkle in this description is that in reality we have a restless bandit [187], where the reward distributions change over time (due to things like 1/f noise). However, we'll ignore this for our purposes here, since we have limited throughput in our annealers and the variations are relatively small over the short timescales that are realistic in the cases considered here (less than an hour).

A.3.2 Algorithms for many-armed bandits

In this section, we will review some of the chief algorithms used in solving bandit problems in general, and particularly those that apply to our case of many-armed bandits. We will review terminology and define some of our symbols, borrowing most of our notation from Kuleshov & Precup [182], which gives an extensive overview of bandit algorithms and their performance. We will merely summarize the interesting results applicable to gauge selection here. Assume we have K arms, represented by distributions {D_i} with means {\mu_i} and variances {\sigma_i^2}. At each timestep t the player selects arm j(t) and receives a reward r(t) ~ D_{j(t)}. The task of a MAB algorithm is to select j(t) based on the observed rewards {r(t)}.
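Before turning to the algorithms themselves, the two reward functions of Sec. A.3.1 can be written out explicitly; the sign conventions and the rescaling by the largest observed weight follow the footnote above, and are one reasonable choice rather than a prescription.

import numpy as np

def bernoulli_reward(E, E0):
    # Reward 1 only when the returned energy matches the ground state energy E0
    # (usable only in benchmarking settings where E0 is actually known).
    return float(np.isclose(E, E0))

def gibbs_weighted_reward(E, beta, E_min_observed):
    # Gibbs-weighted reward -E * exp(-beta * E), divided by the largest weight seen
    # so far, exp(-beta * E_min_observed), to keep the numbers representable.
    return -E * np.exp(-beta * (E - E_min_observed))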
A proof from Ref [188] demonstrated that the regret of bandit algorithms has to scale at least logarithmically with the time horizon, i.e., regret is at least Ω(log T). We will discuss four basic types of algorithms: ε-greedy, Boltzmann exploration, upper-confidence bound (UCB), and Thompson sampling, and what they imply for gauge selection. Only one, UCB, has a fully detailed mathematical extension to the many/infinitely-armed bandit case with regret bounds; however, that same extension can readily be adapted to the other algorithms.

ε-greedy

The ε-greedy strategy is dead simple: choose the arm with the highest empirical mean reward, but with probability 0 < ε < 1 select another arm at random. This algorithm has a linear regret bound with constant ε (of course, since ε of the time the algorithm ignores all accumulated information), and can be thought of as the simplest extension of gauge-averaging. Indeed, ε of the time gauges are selected exactly as in gauge-averaging, while 1 − ε of the time the observed best gauge is played. Reducing ε with time yields a poly-logarithmic bound on regret, though this was not found to be practically useful in Ref [189].

Boltzmann exploration

Boltzmann exploration keeps track of the empirical mean rewards s_i and then samples arms according to their softmax distribution, namely p_i(t) ∝ exp(s_i/τ) with a temperature parameter τ. Much like the inverse temperature β of the Gibbs-weighted reward function, modifying τ can greatly alter the behavior of the algorithm. In this case, τ → ∞ takes the algorithm to standard gauge-averaging while τ → 0 shifts it to a purely greedy strategy. One should take into account the magnitude of the expected rewards s_i when choosing τ, much as one does when determining a schedule of temperatures in parallel tempering or simulated annealing. If rewards are bounded between 0 and 1, as with the Bernoulli reward function, smaller values of τ should be used than with the generally very large weights from the Gibbs-weighted reward function. A natural method would be to normalize the reward functions so their maximum is 1 and scale τ accordingly. It was found in Ref [189] that scaling the temperature down with time yields no benefit in practice, even though it allows the derivation of a poly-logarithmic bound on regret.

Upper-confidence bound (UCB)

UCB algorithms are not a single algorithm but a broad family. They enable some of the tightest and most rigorous bounds on regret available for bandit problems [181, 182]. The essence of UCB algorithms, as the name suggests, is to take account of both the empirical rewards s_i and the empirical variances v_i = σ_i² (σ_i here representing the standard deviation, not to be confused with one of the Pauli matrices) of each arm. One then effectively estimates an upper bound on the expected reward of each arm, mixed with an exploration function, and optimistically selects the arm with the greatest upper bound. In the case of arms with similar empirical means but different variances, this algorithm in essence assumes that the arm with the biggest variance is the one with the best real performance. More concretely, we will here use the UCB-V algorithm, following the notation of Ref [181] adjusted to match our previous definitions. Define a non-decreasing sequence of numbers E_t, indexed by the round of arm selection, which we may call the exploration sequence. Let k_i be the number of times arm i has been pulled up to the current round.
Then at each round of arm selection, choose whichever arm maximizes the value of B_i(t), where

B_i(t) = s_i + √(2 v_i E_t / k_i) + 3 E_t / k_i.

As one can see, arms that have rarely been pulled get a bonus from the exploration sequence and from a large variance, but as they are pulled more often B_i(t) converges toward the empirical mean reward s_i. The UCB-V algorithm was proven to have logarithmic regret.

An extension to the case of infinitely many-armed bandits, as in the real gauge-selection problem, was introduced in Ref [181]. One variant, UCB-V(∞), is appropriate for the finite-horizon case, wherein one essentially selects an ensemble of K arms at the start and then runs UCB-V on them. This algorithm achieves a regret bound linear in the number of rounds of play and proportional to K^(−1/β), where β arises from an assumption that the fraction of all arms with expected reward within ε of optimal scales like ε^β for small ε, that is, Pr(s* − s_i < ε) = O(ε^β), where s* denotes the maximum mean reward in the space of all arms. One is advised to pick a number of arms K = n^(β/max(β+1,2)) for time horizon n, yielding an algorithm the authors of Ref [181] dub UCB-F, for fixed horizon. The second variant is an anytime algorithm called UCB-AIR, in which, for an assumed/known β, arms are selected according to the UCB-V algorithm while the pool of arms is increased so that it is always as large as or larger than t^(β/max(β+1,2)), a strategy dubbed the arm-increasing rule or AIR (hence UCB-AIR). The authors of Ref [181] show that the anytime regret of this algorithm is bounded and scales as C n^(β/max(β+1,2)) log²(n) for a constant C dependent on β.

The proofs of the regret bounds based on this assumption are difficult to extend to other algorithms, but we can take the advice of UCB-F and UCB-AIR as a rough guideline for selecting the number of arms in all our algorithms. In essence, one wishes to have at least one near-optimal arm in one's ensemble, and by selecting "enough" arms (i.e., gauges) one will be able to assure this with reasonable probability. Thus, we will be able to compare our algorithms both for a fixed number of gauges selected in advance and for a dynamically increasing number of gauges to sample from with time. The exponent β of Ref [181] (the arm-increasing exponent, not to be confused with the inverse temperature used above) is unknown; for convenience only, and to ease analysis, we will set it to 2. E_t will be taken at the more conservative growth rate suggested in [181], namely E_t = log t.

Thompson sampling

The final category of algorithm we will consider is also arguably the oldest: Thompson sampling, named after William R. Thompson, who introduced the basic idea in a paper in 1933 [190]. This algorithm also goes by the name of the "Bayesian control rule", and, after being largely ignored since its introduction, was revived by a 2010 paper extending it to arbitrary contexts, Ref [191], which gave it this alternative name. Following an empirical investigation of its performance presented at NIPS in 2011, Ref [192], it has since been shown to be asymptotically optimal in general environments, Ref [193], and has been used in reinforcement learning many times, as in Ref [194], where a bootstrap-based variant was used in deep Q-learning.

From a Bayesian perspective, all of the aforementioned algorithms are unfounded: they use simple frequentist observed expectations, essentially arbitrary exploration sequences, and/or softmax weighting in order to determine the arm to play next.
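Before turning to Thompson sampling, the following minimal Python sketch collects the three frequentist selection rules just described (ε-greedy, Boltzmann exploration, and UCB-V with E_t = log t). The per-arm statistics passed in and the handling of never-played arms are illustrative assumptions made for the sketch, not the code used for the experiments below.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(means, epsilon):
    """With probability epsilon play a uniformly random arm, otherwise the best empirical mean."""
    if rng.random() < epsilon:
        return int(rng.integers(len(means)))
    return int(np.argmax(means))

def boltzmann(means, tau):
    """Sample an arm from the softmax of the empirical mean rewards at temperature tau."""
    logits = np.asarray(means, dtype=float) / tau
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(means), p=p))

def ucb_v(means, variances, counts, t):
    """UCB-V rule with exploration sequence E_t = log t: play the arm with the largest B_i(t)."""
    e_t = np.log(max(t, 2))
    counts = np.maximum(np.asarray(counts, dtype=float), 1.0)  # in practice, play each arm once first
    bonus = np.sqrt(2.0 * np.asarray(variances) * e_t / counts) + 3.0 * e_t / counts
    return int(np.argmax(np.asarray(means) + bonus))
```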
Thompson sampling in the bandit setting, by contrast, directly answers the question "Which arm should I play to get maximum expected return?" with the answer "Whichever arm you think gives the maximum expected return." If one keeps a posterior density for the expected return of each arm, then by drawing a single sample from each posterior one can directly answer which one has the maximum expected return and play that. The resulting plays occur with exactly the frequencies corresponding to the subjective beliefs of the player.

More systematically, assume that each arm i begins with a prior density p_i(r_i), which is updated via Bayes' rule to p_i(r_i(t) | {r_i}), where {r_i} are the observed returns from arm i. Then Thompson sampling suggests that the next arm to play is found by sampling a value from each arm's posterior, r_i(t) ~ p_i(r_i(t) | {r_i}), and playing j(t) = argmax_i r_i(t). This can be viewed as taking a sample from the posterior distribution for the identity of the arm with the greatest average return. One immediately sees that under initially diffuse priors Thompson sampling will explore and, as the number of observations grows, slowly converge onto the real expected rewards of the arms, thereby moving toward exploitation of the knowledge gained by experience.

In the Bernoulli reward setting, the expected return is simply the probability of observing a reward from an arm. The natural prior is the conjugate prior of the Bernoulli distribution, Beta(α, β) for positive real parameters α and β. Common choices are the Jeffreys prior Beta(1/2, 1/2) and the uniform prior Beta(1, 1) (so named because it assigns equal probability density to all values between 0 and 1). The posterior given k_i pulls of arm i and r_i observations of the reward is, in general, Beta(α + r_i, β + k_i − r_i), as was seen in Chapter 3. This closed form of the posterior is extremely convenient and motivates the choice of conjugate priors, as no computationally expensive MCMC algorithm is required to estimate the posterior densities.

This highlights a key issue with the Bernoulli reward function: if one does not have the ground state energy of the problem Hamiltonian on hand, one does not actually know whether to grade the observed states from one's quantum annealer as 0 or 1. One could simply assume the minimum energy E_min observed thus far is the ground state energy E_0, updating on the fly if one finds a lower energy. This poses little issue for the ε-greedy and Boltzmann exploration algorithms, merely slowing down exploration of the gauge landscape and the finding of the true ground state, and only a modest problem for UCB algorithms, but it undermines the conceptual foundation of Thompson sampling. If one is to keep a posterior density, one properly needs to take into account uncertainty in the observed rewards, as they may be revised in the future.

Luckily, this problem is not so complex: in the case of Bernoulli rewards, one effectively has two models, one in which E_min = E_0 and one in which E_min ≠ E_0. We can define for each gauge a prior Dir(α, β, γ), with the components respectively representing the probability of finding a state lower than the lowest observed state, finding the lowest observed state, and finding a state higher than the lowest observed state.
Taking these as values indexed by gauge, we can represent these components as a grand Dirichlet distribution Dir(α⃗, β⃗, γ⃗), with the marginal distribution Dir(Σ_i α_i, Σ_i β_i, Σ_i γ_i) representing our state of belief about the likelihood, across all gauges, of sampling a state below, equal to, or above the current minimum observed state. Defined as a sampling procedure: we first sample a value from Dir(Σ_i α_i, Σ_i β_i) (the marginal distribution over the belief that our current minimum state is the true minimum), and with that probability then sample values for each gauge from Dir(α_i, β_i + γ_i) (if we got 0, i.e., we sampled that our current minimum is not the true minimum) or Dir(β_i, γ_i) (if we got 1, i.e., we sampled that the current minimum is indeed the true minimum). We then merely define a prior over all these possibilities, as in the case where we knew the ground state.

Due to the aforementioned discussions of biases, and of the difficulty of learning small success probabilities even in the ideal case under priors other than the Haldane prior, we may choose something like Dir(1, 0, 0) as a prior density. This leads to a belief about our finding the minimum state of Dir(G, Σ_i x_i), for x_i the observations of the minimum state per gauge and G the number of gauges. This distribution clearly induces significant exploration across gauges until we observe a number of putative successes much larger than the number of gauges. Of course, other choices could be made, and it is an open question which tends to give the best outcomes in practice.

A similar, though less severe, problem arises in the case of the Gibbs-weighted reward function. In that case, it is simply the problem of choosing an appropriate prior. If the prior places too much weight on energies smaller than E_min, one will effectively wash away any differences between the weights, and the resulting sampling will be approximately uniform. One possibility is to assume that E_min is the true ground state, though this may lead to overfitting and rapid convergence to a highly degenerate or unusually likely excited state. We do not investigate these issues in detail here.

A.4 Methods

To test the impact of applying MAB algorithms to gauge selection, we employ a dataset constructed by Tameem Albash based on his logical planted instances with gadgets from Ref [195]. These instances are notable in that they were the first to exhibit an optimal annealing time on D-Wave devices, and they span a wide range of empirical difficulty: for the instances discussed here, the average success probability per anneal ranges from roughly 8 × 10⁻⁶ to approximately 3 × 10⁻³, a span of more than two orders of magnitude. For our purposes here we are not concerned with optimal annealing times, but employ these instances for their wide range of difficulty. We extracted 13 problems from the 100 in the original dataset, corresponding to the problems with index 1, 2, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, and 90 when ranked from lowest probability of success to highest. This gives a good range of problems with varying hardness on which to run our MAB gauge selection techniques. We then ran each of 100 gauges for 100 programming cycles, each programming cycle consisting of 1000 anneals, yielding a total of 10⁷ samples per problem and 10⁵ per gauge.

To simulate the behavior of our algorithms, we take a bootstrap-like approach. When an algorithm chooses a gauge g, one of the 100 recorded programming cycles for gauge g from the device is used as the return distribution on which we calculate our rewards.
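A minimal sketch of how such a replayed gauge-selection game might be organized is given below. The data layout (an array of stored per-gauge programming cycles), the success criterion, and the toy greedy policy are illustrative assumptions about the bookkeeping, not a description of the analysis code actually used.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_game(cycle_energies, e_ground, select_gauge, n_cycles=10_000):
    """Replay a gauge-selection game against stored annealer data.

    cycle_energies has shape (n_gauges, n_stored_cycles, n_anneals): the energies returned
    in each recorded programming cycle of each gauge. At each simulated cycle, the chosen
    gauge's reward is computed from one of its stored cycles, drawn at random.
    """
    n_gauges = cycle_energies.shape[0]
    reward_sums = np.zeros(n_gauges)   # running sum of rewards per gauge
    counts = np.zeros(n_gauges)        # number of times each gauge has been played
    successes = 0
    for t in range(1, n_cycles + 1):
        g = select_gauge(reward_sums, counts, t)
        cycle = cycle_energies[g, rng.integers(cycle_energies.shape[1])]
        reward_sums[g] += -cycle.mean()                      # beta = 0 GW reward: minus the energy
        counts[g] += 1
        successes += int(np.isclose(cycle, e_ground).sum())  # anneals that hit the ground state
    return successes

def greedy_example(reward_sums, counts, t):
    """Toy policy: play every gauge once, then always play the best empirical mean."""
    if (counts == 0).any():
        return int(np.argmin(counts))
    return int(np.argmax(reward_sums / np.maximum(counts, 1)))
```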
This resampling scheme can be thought of as a kind of block bootstrap, as it maintains any correlations between the samples within a single programming cycle (should there be any). For each algorithm, we simulate 100 plays of the many-armed-bandit gauge selection game for each instance, each game containing 10⁴ simulated programming cycles (each being a lever pull/gauge selection event). This way we obtain detailed statistics on the behavior of each algorithm as a function of time. We also test two variants of each algorithm: a fixed-arm version, in which we pin the number of arms to 10, 20, 50, or 100 (sampled with replacement from the 100 gauges available), and the arm-increasing rule (AIR) variant, in which the number of gauges is increased with time, with the AIR exponent (the β of Ref [181]) set to 2 for convenience.

As for algorithm-specific parameters:
- ε-greedy: we use varying values of ε: 0.01, 0.02, 0.05, 0.1, and 0.2.
- Boltzmann exploration: we test a range of temperatures, τ ∈ {0.1, 0.2, 0.5, 1, 2, 5, 10}.
- UCB: we employ the standard schedule E_t = log(t).
- Thompson sampling: we test our proposed prior of Dir(1, 0, 0), as discussed above. Note that, uniquely among the algorithms here, this is arguably the only one being run with a Bernoulli reward function, since the MAB algorithm is directly trying to maximize the likelihood of finding the ground state, while the other algorithms run without any such goal, instead working to maximize the average (negative) energy returned. The primary reason is that, while we typically do not know the minimum, Thompson sampling explicitly samples the possibility that the ground state is lower than the lowest energy state observed.

A quick note regarding error estimation in this context: due to computational constraints, the 25th and 75th percentiles of the various performance metrics are typically plotted along with the mean and median, in order to give a rough picture of the variation from run to run.

A.5 Results

Here we present a small subset of our full dataset, chosen to highlight some of its key features. First, to give a general understanding of how one may plot the results, consider Figure A.1. It shows the cumulative number of successes against the number of simulated gauges, the ratio of the cumulative successes of the MAB algorithm to those of the blind algorithm, and that same ratio plotted against the cumulative number of successes. This last typically approaches an asymptotic value that shows the total benefit (in terms of sampling the success probability) of using the MAB algorithm versus standard blind averaging. Here we see that we can achieve very rapid rises to twice the rate of successes for most instances. We'll go algorithm by algorithm to give a flavor of their behavior as a function of their parameters and of the problem hardness.

A.5.1 ε-greedy

ε-greedy is a dead-simple algorithm, with only one free parameter in the anytime case, namely ε. I present Figures A.2 and A.3, which correspond to the 5th and 90th percentiles of instances and are fairly representative. As one can see, while the looser ε = 0.1 tends to help the 25th percentile of trials for the harder problem, there is very little difference for fairly easy problems. However, since ε = 0.1 can reduce the number of successes per gauge, on average, by at most 10% from the maximum theoretical value, it is likely preferable in practice to use the looser value.
Finally, we provide the MAB-to-blind ratio for the average trial at ε = 0.1 (given the discussion above), to give a rough sense of how performance varies with the hardness percentile of the underlying problem; see Figure A.4.

A.5.2 Boltzmann exploration

In our study, we find that Boltzmann exploration generally performs reasonably well except on the very hardest problems, which seem to suffer from over-specialization to suboptimal gauges that nevertheless have at least some probability of yielding a success. For those hardest problems, however, increasing the temperature improves performance; in particular, it largely prevents a large fraction (> 25%) of trials from performing worse than blind guessing, as they do at lower temperatures, though it may cause worse performance on easier problems. For a representative example of the behavior at different temperatures, see Figures A.5 and A.6.

Figure A.1: The basic types of fundamental plots. The first panel shows the cumulative number of true successes in the simulation as a function of the number of gauges; the second shows the ratio of said cumulative successes for the MAB algorithm (here Boltzmann exploration at τ = 1) to those one would expect from randomly selecting gauges. The third panel shows the MAB-to-blind cumulative success ratio against the number of successes observed by that point. This last is quite useful, in that the value at any given point is an estimate of the magnitude of the benefit (the improvement in the rate of successes) of running the MAB. This particular plot is for instance 10, the 60th percentile, using Boltzmann exploration at τ = 1 with an adjustable number of arms.

Figure A.2: Plot of the ratio of MAB to blind cumulative success as a function of the number of programming cycles for the instance representing the 5th percentile of the dataset, for ε ∈ {0.01, 0.05, 0.1}. We see a significant improvement for the 25th percentile of trials, though not a very large difference in the scaling of each algorithm individually.

Figure A.3: Plot of the ratio of MAB to blind cumulative success as a function of the number of programming cycles for the instance representing the 90th percentile of the dataset, for ε ∈ {0.01, 0.05, 0.1}. We see fairly little variation: easy instances with such high success probabilities tend to have somewhat less variation in relative probability magnitudes than harder instances.

Figure A.4: Plot of the mean ratio, over repeated trials, of MAB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here, for ε-greedy.

There is little additional to say about Boltzmann exploration, but we will offer some further comments in the cross-algorithm comparisons below. We also reproduce an equivalent of Figure A.4 in Figure A.7.

Figure A.5: Plot of the mean ratio, over repeated trials, of MAB to blind cumulative success as a function of the number of programming cycles for the instance at the 5th percentile in probability of success of the instance class. We see similar performance across temperatures, with some small degradation as we go to higher temperatures.

Figure A.6: Plot of the mean ratio, over repeated trials, of MAB to blind cumulative success as a function of the number of programming cycles for the hardest instance of the instance class; we see an improvement in performance as temperature increases.
By increasing the temperature, performance broadly improves; in particular, the 25th percentile more rapidly moves above breakeven (i.e., a ratio of 1).

Figure A.7: Plot of the mean ratio, over repeated trials, of MAB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here, for Boltzmann exploration at temperature τ = 1.

A.6 UCB

Interestingly, of all the algorithms, UCB performed the worst. This may be a result of a suboptimal schedule for E_t (here log(t)), though this appears to be a very common choice in the literature. UCB occasionally reached high rates of return, but also had a high probability of performing very poorly. This may be due to properties of the underlying distribution of energies, which in our experience tend to be overdispersed (see, for instance, some of the supplemental information in Ref [6]). Here we simply reproduce a few figures similar to those of previous sections; at least on this provisional evidence, UCB may not be the best algorithm for the task of gauge selection. More work is needed to discern exactly why this is the case.

A.6.1 Thompson sampling

Thompson sampling performs broadly similarly to Boltzmann exploration and ε-greedy, but its cumulative return curves are almost universally much smoother and more predictable, and it reaches higher maximum returns than any other algorithm. Here, we reproduce the figures for the returns for the 5th percentile of instances (Figure A.11), the 20th percentile (Figure A.12), and the 90th (Figure A.13). The plot for the 5th percentile is especially interesting: here, rather than the semi-greedy behavior of ε-greedy and Boltzmann exploration, we see that as the number of trials (and thus gauges being tested) increases, the algorithm essentially converges to close to random guessing. This is largely because the number of times we ever observe the ground state for that problem is very low, and thus our Beta(G, x) belief (for G the number of gauges and x the total observations of the lowest energy state thus far) is strongly biased toward there being lower energy states.

Figure A.8: Plot of the mean ratio, over repeated trials, of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the hardest instance of the instance class. Performance is widely overdispersed: there is a significant chance of very high returns, but also a very high probability of very poor returns, with the average case being unimpressive by comparison to the other algorithms here.

Figure A.9: Plot of the mean ratio, over repeated trials, of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 10th easiest problem in the class. Compared to the same plot in Figure A.3, for instance, performance is poor.

Figure A.10: Plot of the mean ratio, over repeated trials, of MAB using UCB to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here. It is fairly consistent, but much worse on the whole than the other algorithms we test here.

However, by the time we look at the 20th percentile (or the 90th), the algorithm performs quite well and all curves (mean, median, 25th and 75th percentiles) rise smoothly. We plot the mean return for each instance, as before, in Figure A.14, and see that it is competitive with the best algorithms tested above, and achieves the highest returns of any algorithm on a small set of instances.

A.6.2 Comparing different algorithms

Here, we briefly address the question of "which algorithm is 'best'?".
We address it only briefly, both for brevity's sake and because this study simply is not equipped to answer it. As was seen in Chapter 3, we need to think carefully about what we have actually learned here, and due to the small set of instances, the lack of complete optimization of the algorithms' parameters, and the limited (even though large) dataset, we simply cannot make very strong claims. Provisionally, we can clearly eliminate UCB in this case. To compare the remaining three algorithms, we simply plot a bar graph of the maximum ratio of return from the MAB algorithm to that of blind guessing, to give some indication of their relative long-run performance; see Figure A.15. Looking at the graph, we see that Thompson sampling generally performs worse than either ε-greedy or Boltzmann exploration, with Boltzmann exploration narrowly outperforming ε-greedy. Thus, we can very provisionally recommend Boltzmann exploration. However, much further study would be required before any dependable statements can be made.

Figure A.11: Plot of the mean ratio, over repeated trials, of MAB using Thompson sampling to blind cumulative success as a function of the number of programming cycles for the 5th hardest problem in the class. Performance collapses to blind guessing, as our prior density places significant probability on there being lower energy states so long as the number of "arms" in our bandit is larger than the number of successes.

Figure A.12: Plot of the mean ratio, over repeated trials, of MAB using Thompson sampling to blind cumulative success as a function of the number of programming cycles for the 20th hardest problem in the class. Performance now increases smoothly, in contrast to Figure A.11.

Figure A.13: Plot of the mean ratio, over repeated trials, of MAB using Thompson sampling to blind cumulative success as a function of the number of programming cycles for the 10th easiest problem in the class. Performance is still reasonable, and again rises smoothly across the whole space.

Figure A.14: Plot of the mean ratio, over repeated trials, of MAB using Thompson sampling to blind cumulative success as a function of the number of programming cycles for the 13 problems discussed here. It is fairly consistent, and actually reaches by far the highest rates of return for certain problems as compared with UCB, Boltzmann exploration, and ε-greedy.

Figure A.15: The maximum (long-run) ratio of the total return under the MAB algorithm compared to blind guessing for each of our three competitive algorithms. While Thompson sampling ended up achieving the highest maximum returns for a few instances, it generally fell behind the greedier algorithms, ε-greedy and Boltzmann exploration. Looking carefully, Boltzmann exploration almost universally slightly outperforms ε-greedy in this case.

A.7 Conclusion

Here, I have presented an introduction to the notion of many-armed bandits, which have a rich literature in the mathematics community and have come to be used in machine learning as well. One can tell, from the often wide gap between the 25th and 75th percentiles of the trials in most of the figures here, that applying MAB algorithms to the gauge selection problem carries the risk of a downside and gives widely variable results depending on algorithmic choice and problem difficulty. However, it also provides the opportunity to sample the ground state a given number of times much more quickly than is possible with blind sampling of gauges.
This may prove useful in certain practical applications, though determining exactly how useful will require more extensive tests with far larger amounts of data, data that can truly capture the entire distribution over gauges and precisely pin down the probability density of results per gauge, so that one could check the most important remaining concern: bias in the returned solutions. The primary reasons one may wish to sample purported ground states many times are (a) to gain confidence that one has truly found the ground state and (b) to sample a variety of ground states. Without a complete enumeration of ground states and a complete statistical characterization of each gauge, answering (b) may well be impossible. Given the extremely conservative nature of our Thompson sampling prior densities, if one wants to achieve (a), then perhaps, even despite its lower rate of sampling the ground state, Thompson sampling may well prove preferable. Refining the answers to (a) and (b) is a potentially interesting area for future research effort. Ultimately, applying MAB algorithms to gauge selection is merely a first step on the long road toward learning the map from gauges to probability of success over the space of all instances, which would allow practitioners of quantum annealing to maximally exploit their chosen systems.

Bibliography

[1] S. Boixo, T. Albash, F. M. Spedalieri, N. Chancellor, D. A. Lidar, Nat. Commun. 4, 2067 (2013).
[2] S. Boixo, et al., Nat. Commun. 7 (2016).
[3] J. Job, D. Lidar, Quantum Science and Technology 3, 030501 (2018).
[4] T. F. Rønnow, et al., Science 345, 420 (2014).
[5] I. Hen, et al., Phys. Rev. A 92, 042325 (2015).
[6] A. Mott, J. Job, J.-R. Vlimant, D. Lidar, M. Spiropulu, Nature 550, 375 (2017).
[7] F. Barahona, J. Phys. A: Math. Gen. 15, 3241 (1982).
[8] E. Farhi, et al., Science 292, 472 (2001).
[9] R. Harris, et al., Phys. Rev. B 82, 024511 (2010).
[10] M. W. Johnson, et al., Superconductor Science and Technology 23, 065004 (2010).
[11] A. J. Berkley, et al., Superconductor Science and Technology 23, 105014 (2010).
[12] M. W. Johnson, et al., Nature 473, 194 (2011).
[13] V. Choi, Quant. Inf. Proc. 10, 343 (2011).
[14] C. C. McGeoch, C. Wang, Proceedings of the 2013 ACM Conference on Computing Frontiers (2013).
[15] S. Boixo, et al., Nat. Phys. 10, 218 (2014).
[16] S. Santra, G. Quiroz, G. V. Steeg, D. A. Lidar, New J. of Phys. 16, 045006 (2014).
[17] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Science 220, 671 (1983).
[18] S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, M. Troyer, Computer Physics Communications 192, 265 (2015).
[19] C. J. Geyer, Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, E. M. Keramidas, ed. (American Statistical Association, New York, 1991), p. 156.
[20] D. J. Earl, M. W. Deem, Physical Chemistry Chemical Physics 7, 3910 (2005).
[21] H. G. Katzgraber, S. Trebst, D. A. Huse, M. Troyer, J. Stat. Mech. 2006, P03018 (2006).
[22] F. Hamze, N. de Freitas, UAI, D. M. Chickering, J. Y. Halpern, eds. (AUAI Press, Arlington, Virginia, 2004), pp. 243–250.
[23] A. Selby, arXiv:1409.3934 (2014).
[24] R. Martoňák, G. E. Santoro, E. Tosatti, Phys. Rev. B 66, 094203 (2002).
[25] B. Heim, T. F. Rønnow, S. V. Isakov, M. Troyer, Science 348, 215 (2015).
[26] E. Crosson, A. W. Harrow, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 714–723 (2016).
[27] S. W. Shin, G. Smith, J. A. Smolin, U. Vazirani, arXiv:1401.7087 (2014).
[28] K. M. Zick, O. Shehab, M.
French, arXiv:1503.06453 (2015). [29] D. Venturelli, D. J. J. Marchand, G. Rojo, arXiv:1506.08479 (2015). [30] E. G. Rieel, et al., Quantum Information Processing 14, 1 (2015). [31] G. Rosenberg, et al., arXiv:1508.06182 (2015). [32] A. Lucas, Front. Phys. 2, 5 (2014). [33] V. Choi, Quant. Inf. Proc. 7, 193 (2008). [34] C. Klymko, B. D. Sullivan, T. S. Humble, Quant. Inf. Proc. 13, 709 (2014). [35] Z. Zhu, A. J. Ochoa, H. G. Katzgraber, Phys. Rev. Lett. 115, 077201 (2015). [36] T. Albash, W. Vinci, A. Mishra, P. A. Warburton, D. A. Lidar, Phys. Rev. A 91, 042314 (2015). [37] T. Albash, T. F. Rnnow, M. Troyer, D. A. Lidar, Eur. Phys. J. Spec. Top. 224, 111 (2015). [38] U. Wol, Phys. Rev. Lett. 62, 361 (1989). [39] B. W. Reichardt, F. Unger, U. Vazirani, Nature 496, 456 (2013). 205 [40] I. L. Chuang, M. A. Nielsen, Journal of Modern Optics 44, 2455 (1997). [41] M. Mohseni, A. T. Rezakhani, D. A. Lidar, Phys. Rev. A 77, 032322 (2008). [42] R. Blume-Kohout, et al., arXiv:1310.4492 (2013). [43] D. Greenbaum, arXiv:1509.02921 (2015). [44] A. M. Childs, I. L. Chuang, D. W. Leung, Physical Review A 64, 012314 (2001). [45] R. Blume-Kohout, et al., Nature Communications 8, 14485 (2017). [46] T. Kadowaki, H. Nishimori, Phys. Rev. E 58, 5355 (1998). [47] A. Das, B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061 (2008). [48] E. Farhi, et al., Science 292, 472 (2001). [49] W. M. Kaminsky, S. Lloyd, Quantum Computing and Quantum Bits in Mesoscopic Systems, A. Leggett, B. Ruggiero, P. Silvestrini, eds. (Kluwer Academic/Plenum Publ., 2004). [50] W. M. Kaminsky, S. Lloyd, T. P. Orlando, arXiv:quant-ph/0403090 (2004). [51] T. Albash, D. A. Lidar, arXiv:1611.04471 (2016). [52] A. J. Berkley, et al., Phys. Rev. B 87, 020502 (2013). [53] J. Preskill, arXiv:1203.5813 (2012). [54] S. Flammia, The Quantum Ponti blog: \Quantum Advantage" (2017). [55] S. Aaronson, L. Chen, arXiv:1612.05903 (2016). [56] M. J. Bremner, A. Montanaro, D. J. Shepherd, Physical Review Letters 117, 080501 (2016). [57] E. Farhi, A. W. Harrow, arXiv:1602.07674 (2016). [58] X. Gao, S.-T. Wang, L. M. Duan, Physical Review Letters 118, 040502 (2017). [59] S. Boixo, et al., arXiv:1608.00263 (2016). [60] B. Feerman, M. Foss-Feig, A. V. Gorshkov, arXiv:1701.03167 (2017). [61] P. W. Shor, SIAM J. Comput. 26, 1484 (1997). [62] E. Farhi, J. Goldstone, S. Gutmann, arXiv:1412.6062 (2014). 206 [63] K.-W. Yip, T. Albash, D. Lidar, Quantum trajectories for time-dependent adiabatic master equations, in preparation (2017). [64] T. Lanting, et al., Phys. Rev. X 4, 021041 (2014). [65] G. Vidal, R. F. Werner, Phys. Rev. A 65, 032314 (2002). [66] F. M. Spedalieri, Phys. Rev. A 86, 062311 (2012). [67] T. Albash, I. Hen, F. M. Spedalieri, D. A. Lidar, Physical Review A 92, 062328 (2015). [68] T. Albash, S. Boixo, D. A. Lidar, P. Zanardi, New Journal of Physics 14, 123016 (2012). [69] J. A. Smolin, G. Smith, Frontiers in Physics 2, 52 (2014). [70] L. Wang, et al., arXiv:1305.5837 (2013). [71] S. W. Shin, G. Smith, J. A. Smolin, U. Vazirani, arXiv:1404.6499 (2014). [72] P. J. D. Crowley, A. G. Green, Physical Review A 94, 062106 (2016). [73] J. Brooke, D. Bitko, T. F., Rosenbaum, G. Aeppli, Science 284, 779 (1999). [74] J. Brooke, T. F. Rosenbaum, G. Aeppli, Nature 413, 610 (2001). [75] M. W. Johnson, et al., Nature 473, 194 (2011). [76] V. S. Denchev, et al., Phys. Rev. X 6, 031015 (2016). [77] S. Mandr a, Z. Zhu, W. Wang, A. Perdomo-Ortiz, H. G. Katzgraber, Phys- ical Review A 94, 022337 (2016). [78] T. Monz, et al., Phys. Rev. Lett. 106, 130506 (2011). [79] J. G. 
Bohnet, et al., Science 352, 1297 (2016). [80] Catherine C. McGeoch, A Guide to Experimental Algorithmics (Cam- bridge University Press, Cambride, UK, 2012). [81] J. King, S. Yarkoni, M. M. Nevisi, J. P. Hilton, C. C. McGeoch, arXiv:1508.05087 (2015). [82] W. Vinci, D. A. Lidar, Physical Review Applied 6, 054016 (2016). [83] D. Venturelli, et al., Phys. Rev. X 5, 031040 (2015). [84] D. B. Rubin, et al., The annals of statistics 9, 130 (1981). [85] L. K. Grover, Phys. Rev. Lett. 79, 325 (1997). 207 [86] C. Bennett, E. Bernstein, G. Brassard, U. Vazirani, SIAM Journal on Computing 26, 1510 (1997). [87] A. Papageorgiou, J. F. Traub, Phys. Rev. A 88, 022316 (2013). [88] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Science 220, 671 (1983). [89] R. Marto n ak, G. E. Santoro, E. Tosatti, Phys. Rev. B 66, 094203 (2002). [90] G. E. Santoro, R. Marto n ak, E. Tosatti, R. Car, Science 295, 2427 (2002). [91] T. F. Rnnow, et al., Supplementary material for \Dening and detecting quantum speedup". [92] R. D. Somma, D. Nagaj, M. Kieferov a, Phys. Rev. Lett. 109, 050501 (2012). [93] H. G. Katzgraber, F. Hamze, R. S. Andrist, Phys. Rev. X 4, 021008 (2014). [94] K. L. Pudenz, T. Albash, D. A. Lidar, Nat. Commun. 5, 3243 (2014). [95] S. Isakov, I. Zintchenko, T. Rnnow, M. Troyer, arXiv:1401.1084 (2014). [96] G. Bhanot, D. Duke, R. Salvador, J. Stat. Phys. 44, 985 (1986). [97] H.-O. Heuer, Comput. Phys. Commun. 59, 387 (1990). [98] R. Dechter, Articial Intelligence 113, 41 (1999). [99] A. P. Young, H. G. Katzgraber, Phys. Rev. Lett. 93, 207203 (2004). [100] M. S. Sarandy, D. A. Lidar, Phys. Rev. Lett. 95, 250503 (2005). [101] E. T. Jaynes, Probability theory: The logic of science (Cambridge univer- sity press, 2003). [102] B. Efron, Breakthroughs in statistics (Springer, 1992), pp. 569{593. [103] T. S. Ferguson, The annals of statistics pp. 209{230 (1973). [104] D. A. Berry, R. Christensen, The Annals of Statistics pp. 558{568 (1979). [105] L. Kuo, SIAM Journal on Scientic and Statistical Computing 7, 60 (1986). [106] A. E. Gelfand, A. Kottas, Journal of Computational and Graphical Statis- tics 11, 289 (2002). [107] L. Wang, D. B. Dunson, Journal of Computational and Graphical Statis- tics 20, 196 (2011). [108] D. G. Mayo, M. Kruse, Foundations of bayesianism (Springer, 2001), pp. 381{403. 208 [109] L. Mendo, J. M. Hernando, Communications, IEEE Transactions on 54, 231 (2006). [110] K. Binder, A. P. Young, Reviews of Modern Physics 58, 801 (1986). [111] H. Nishimori, Statistical Physics of Spin Glasses and Information Pro- cessing: An Introduction (Oxford University Press, Oxford, UK, 2001). [112] P. I. Bunyk, et al., IEEE Transactions on Applied Superconductivity 24, 1 (Aug. 2014). [113] M. Mezard, G. Parisi and M.A. Virasoro, Spin Glass Theory and Beyond, World Scientic Lecture Notes in Physics (World Scientic, Singapore, 1987). [114] W. Barthel, et al., Phys. Rev. Lett. 88, 188701 (2002). [115] F. Krzakala, L. Zdeborov a, Phys. Rev. Lett. 102, 238701 (2009). [116] S. Bravyi, B. Terhal, SIAM Journal on Computing 39, 1462 (2009). [117] B. Bollob as, C. Borgs, J. T. Chayes, J. H. Kim, D. B. Wilson, Random Struct. Algorithms 18, 201 (2001). [118] A. Selby, D-wave: comment on comparison with classical computers, http://tinyurl.com/dwave-vs-classical (2013). [119] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, L. Troyansky, Na- ture 400, 133 (1999). [120] M. Amin, arXiv:1503.04216 (2015). [121] T. Albash, D. A. Lidar, Phys. Rev. A 91, 062320 (2015). [122] T. Neuhaus, arXiv:1412.5460 (2014). [123] T. 
Albash, S. Boixo, D. A. Lidar, P. Zanardi, New J. of Phys. 14, 123016 (2012). [124] A. M. Childs, E. Farhi, J. Preskill, Phys. Rev. A 65, 012322 (2001). [125] A. D. King, C. C. McGeoch, arXiv:1410.2628 (2014). [126] A. Perdomo-Ortiz, J. Fluegemann, R. Biswas, V. N. Smelyanskiy, arXiv:1503.01083 (2015). [127] A. Perdomo-Ortiz, B. O'Gorman, J. Fluegemann, R. Biswas, V. N. Smelyanskiy, arXiv:1503.05679 (2015). [128] V. Martin-Mayor, I. Hen, Scientic Reports 5, 15324 EP (2015). [129] K. L. Pudenz, T. Albash, D. A. Lidar, Phys. Rev. A 91, 042302 (2015). 209 [130] K. C. Young, R. Blume-Kohout, D. A. Lidar, Phys. Rev. A 88, 062314 (2013). [131] A. D. King, T. Lanting, R. Harris, arXiv:1502.02098 (2015). [132] J. Bezanson, A. Edelman, S. Karpinski, V. B. Shah, arXiv:1411.1607 (2014). [133] T. Neuhaus, arXiv:1412.5361 (2014). [134] S. Chatrchyan, et al., Phys. Lett. B716, 30 (2012). [135] G. Aad, et al., Phys. Lett. B716, 1 (2012). [136] H. Neven, V. S. Denchev, G. Rose, W. G. Macready, arXiv:0811.0416 (2008). [137] K. L. Pudenz, D. A. Lidar, Quantum Information Processing 12, 2027 (2013). [138] F. Chollet, Keras, https://github.com/fchollet/keras (2015). [139] T. Chen, C. Guestrin, arXiv:1603.02754 (2016). [140] V. Khachatryan et al. (CMS collaboration), Eur. Phys. J. C74, 3076 (2014). [141] G. Aad et al. (ATLAS collaboration), Phys. Rev. D90, 112015 (2014). [142] C. Patrignani, et al., Chin. Phys. C40, 100001 (2016). [143] C. Englert, T. Plehn, D. Zerwas, P. M. Zerwas, Phys. Lett. B703, 298 (2011). [144] D. E. Morrissey, M. J. Ramsey-Musolf, New J. Phys. 14, 125003 (2012). [145] D. Buttazzo, et al., JHEP 12, 089 (2013). [146] C. Adam-Bourdarios, et al., Journal of Physics: Conference Series 664, 072015 (2015). [147] R. Al-Rfou, et al., arXiv e-prints abs/1605.02688 (2016). [148] J. A. Hanley, B. J. McNeil, Radiology 143, 29 (1982). [149] J. Chen, et al., 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 2429{2433 (2016). [150] R. Y. Li, R. Di Felice, R. Rohs, D. A. Lidar, NPJ quantum information 4, 14 (2018). [151] S. Lloyd, M. Mohseni, P. Rebentrost, Nat. Phys. 10, 631 (2014). 210 [152] N. Wiebe, A. Kapoor, K. Svore, Quantum Information & Computation 15, 0318 (2015). [153] G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin-Delgado, H. J. Briegel, Phys. Rev. X 4, 031002 (2014). [154] P. Rebentrost, M. Mohseni, S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014). [155] M. Schuld, I. Sinayskiy, F. Petruccione, Contemporary Physics 56, 172 (2015). [156] I. Cong, L. Duan, arXiv:1510.00113 (2015). [157] J. Biamonte, et al., arXiv:1611.09347 (2016). [158] T. Sj ostrand, S. Mrenna, P. Skands, Journal of High Energy Physics 2006, 026 (2006). [159] T. Gleisberg, et al., Journal of High Energy Physics 2009, 007 (2009). [160] D. P. Kingma, J. Ba, CoRR abs/1412.6980 (2014). [161] J. Snoek, H. Larochelle, R. P. Adams, ArXiv e-prints (2012). [162] J. Snoek, Spearmint, https://github.com/HIPS/Spearmint (2012). [163] W. Vinci, T. Albash, G. Paz-Silva, I. Hen, D. A. Lidar, Phys. Rev. A 92, 042310 (2015). [164] A. Mishra, T. Albash, D. A. Lidar, Quant. Inf. Proc. 15, 609 (2015). [165] A. W. Harrow, A. Hassidim, S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009). [166] T. S. Ferguson, Optimal Stopping and Applications (2008). [167] T. Albash, D. A. Lidar, arXiv:1705.07452 (2017). [168] J. King, et al., arXiv:1701.04579 (2017). [169] J. Roland, N. J. Cerf, Phys. Rev. A 65, 042308 (2002). [170] A. T. Rezakhani, A. K. Pimachev, D. A. Lidar, Phys. Rev. A 82, 052305 (2010). [171] A. 
Perdomo-Ortiz, et al., arXiv preprint arXiv:1708.09780 (2017). [172] A. Perdomo-Ortiz, et al., arXiv preprint arXiv:1708.09780 (2017). [173] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, R. Melko, arXiv:1601.02036 (2016). [174] M. Benedetti, J. Realpe-G omez, R. Biswas, A. Perdomo-Ortiz, arXiv:1609.02542 (2016). 211 [175] S. H. Adachi, M. P. Henderson, arXiv:1510.06356 (2015). [176] E. Farhi, et al., Science 292, 472 (2001). [177] I. Hen, et al., Performance of d-wave two on problems with planted solu- tions (2014). http://www.isi.edu/sites/default/files/top_level/ events/aqc2014/AQC_14_ItayHen.pdf. [178] W. Vinci, T. Albash, D. A. Lidar, Nature Quantum Information 2, 16017 (2016). [179] S. L. Scott, Applied Stochastic Models in Business and Industry 26, 639 (2010). [180] H. Robbins, Herbert Robbins Selected Papers (Springer, 1985), pp. 169{ 177. [181] Y. Wang, J.-Y. Audibert, R. Munos, Advances in Neural Information Processing Systems (2009), pp. 1729{1736. [182] V. Kuleshov, D. Precup, arXiv preprint arXiv:1402.6028 (2014). [183] B. Pokharel, D. Venturelli, E. Rieel, APS Meeting Abstracts (2017). [184] W. Vinci, D. A. Lidar, Physical Review Applied 6, 054016 (2016). [185] Y. Xia, H. Li, T. Qin, N. Yu, T.-Y. Liu. [186] H. Li, Y. Xia (2017). [187] P. Whittle, Journal of applied probability 25, 287 (1988). [188] T. L. Lai, H. Robbins, Advances in applied mathematics 6, 4 (1985). [189] J. Vermorel, M. Mohri, European conference on machine learning (Springer, 2005), pp. 437{448. [190] W. R. Thompson, Biometrika 25, 285 (1933). [191] P. A. Ortega, D. A. Braun, Journal of Articial Intelligence Research 38, 475 (2010). [192] O. Chapelle, L. Li, Advances in neural information processing systems (2011), pp. 2249{2257. [193] J. Leike, T. Lattimore, L. Orseau, M. Hutter, arXiv preprint arXiv:1602.07905 (2016). [194] I. Osband, C. Blundell, A. Pritzel, B. Van Roy, Advances in neural infor- mation processing systems (2016), pp. 4026{4034. [195] T. Albash, D. A. Lidar, Physical Review X 8, 031016 (2018). 212