Applications and error correction for adiabatic quantum optimization
APPLICATIONS AND ERROR CORRECTION FOR ADIABATIC QUANTUM OPTIMIZATION
by
Kristen Pudenz

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2014
Copyright 2014 Kristen Pudenz

Table of Contents

Dedication
Acknowledgements
Abstract
1: Introduction and Background
  1.1 Motivation
  1.2 Adiabatic Quantum Optimization
  1.3 Machine Learning in the Adiabatic Context
  1.4 Error Correction in the Adiabatic Context
  1.5 Equipment
2: Anomaly Detection for Software Verification and Validation
  2.1 Overview
  2.2 Formalization
    2.2.1 Input and output spaces
    2.2.2 Recognizing software errors
    2.2.3 Validity domain and range
    2.2.4 Specification and implementation sets
    2.2.5 Generalizations
  2.3 Training a quantum software error classifier
    2.3.1 Weak classifiers
    2.3.2 Strong classifier
    2.3.3 The formal weight optimization problem
    2.3.4 Relaxed weight optimization problem
    2.3.5 From QUBO to the Ising Hamiltonian
  2.4 Achievable strong classifier accuracy
    2.4.1 Conditions for complete classification accuracy
    2.4.2 Perfect strong classifier theorem
    2.4.3 Imperfect strong classifier theorem
    2.4.4 An alternate weight optimization problem
  2.5 Using strong classifiers in quantum-parallel
    2.5.1 Using two strong binary classifiers to detect errors
    2.5.2 Formal criterion
    2.5.3 Relaxed criterion
    2.5.4 Adiabatic implementation of the relaxed criterion
    2.5.5 Choosing the weak classifiers
    2.5.6 QUBO-AQC quantum parallel testing
  2.6 Sample problem implementation
    2.6.1 The Triplex Monitor Miscompare problem
    2.6.2 Implemented algorithm
    2.6.3 Simulation results
    2.6.4 Comparison of results with theory
  2.7 Conclusions
3: Quantum Annealing Correction
  3.1 Quantum annealing and computational errors
  3.2 Quantum annealing correction
  3.3 Benchmarking using antiferromagnetic chains
  3.4 Key experimental results – success probabilities
  3.5 Optimizing the penalty scale $\beta$
  3.6 Error mechanisms
  3.7 Conclusion and outlook
  3.8 Hardware parameters of the DW1 and DW2
  3.9 Proof that the encoded graph is non-planar
  3.10 A classical independent errors model
  3.11 Comparison between the DW1 and DW2
  3.12 Adiabatic master equation
  3.13 The role of the penalty qubit
  3.14 The role of the problem scale $\alpha$ and the penalty scale $\beta$
    3.14.1 The role of $\alpha$
    3.14.2 The role of $\beta$
4: Randomized Benchmarking with QAC
  4.1 Context
  4.2 Quantum annealing correction
  4.3 Problem set
  4.4 Decoding Strategies
    4.4.1 Two complementary decoding methods
    4.4.2 Behavior of a problem instance
  4.5 Results on the benchmark set
  4.6 Robustness to qubit loss
  4.7 Discussion
5: Conclusion
Bibliography
A: The D-Wave Device: Lessons from Experience
  A.1 Motivation
  A.2 Physics of the implementation
  A.3 Hardware layout and embedding
    A.3.1 The Ising model
    A.3.2 The Chimera graph
  A.4 Practical parameter setting
    A.4.1 Thermalization time
    A.4.2 Optimal annealing time
    A.4.3 Problem specification issues and gauge averaging
    A.4.4 Sampling
    A.4.5 Problem scaling
  A.5 Conclusion

List of Figures

1.1 The D-Wave unit cell. The long rectangular loops represent qubits, and blue dots where they cross show the locations of couplers. Couplers also exist at the ends of each loop, connecting the qubits to their neighbors in the same plane.

2.1 Schematic vector space representation showing regions of vectors satisfying the four definitions. Region 1, of erroneous but implemented vectors, is the location of errors. Regions 2, 3, and 4 represent vectors which are correct and implemented, correct and unimplemented, and erroneous and unimplemented, respectively.

2.2 Illustration of Condition 2. Two pairs of classifiers showing regions of correct (green) and incorrect (red) classification along a line representing a lexicographical ordering of all vectors within $V$. The top pair, compliant with Condition 2, provides two correct classifications for the minimum possible number of vectors, voting once correctly and once incorrectly on all other vectors. The bottom pair, violating Condition 2, provides two correct votes for more vectors than does the top pair, but also undesirably provides two incorrect votes for some vectors; this is why paired weak classifiers must coincide in their classifications on as few vectors as possible.

2.3 Illustration of Condition 3 without the subtracted term. Five pairs of 60% accurate weak classifiers combine to form a completely accurate majority-vote strong classifier. Moving from top to bottom through the pairs and from left to right along the vectors in the classification space, each pair of weak classifiers provides two correct votes for 20% of the vector space and neutral votes otherwise.
This means that the majority vote is correct for the entire space because no two pairs vote correctly at once.

2.4 Illustration of Condition 3 with the subtracted term. Three pairs of 70% accurate weak classifiers combined to form a completely accurate majority-vote strong classifier. In this case, each pair votes twice correctly on 40% of the vector space, which makes it necessary for the correct portions of the second and third pairs from the top to overlap. Because they only overlap by the minimum amount necessary, $V$ as a whole is still covered by a correct majority vote.

2.5 Illustration of Condition 3a. Two pairs and one single weak classifier form a completely accurate majority-vote strong classifier. The two pairs cover 40% of the vector space with correct votes, and the single weak classifier (the first element of the fourth pair in Fig. 2.3; the faded-out classifiers in the third, fourth, and fifth pairs are omitted from this strong classifier) provides an extra correct vote to tip the balance in the remaining 60% to a correct overall classification.

2.6 Error fractions in 16-member specification classifier calculations; Left: average over 50. Right: best of 50.

2.7 Error fractions in 16-member implementation classifier calculations; Left: average over 50. Right: best of 50.

2.8 Error fractions of specification (left) and implementation (right) classifiers, for an increasing number of qubits.

2.9 Accuracy of weak classifier dictionary on input-output vector space. White/black pixels represent a weak classifier $h_i(\vec{x})$ (all weak classifiers meeting Condition 1, indexed in order of increasing error as in Eq. (2.87), on vertical axis) categorizing an input-output vector (indexed in lexicographical order on horizontal axis; there are $2^9$ vectors arising from the 9 Boolean variables in the sample problem) correctly/incorrectly, respectively. These classifications were to determine whether an input-output pair was correct or erroneous, i.e., we are analyzing the performance of the specification classifier.

3.1 Unit cell and encoded graph. (a) Schematic of one of the 64 unit cells of the DW2 processor (see Section 3.8). Unit cells are arranged in an $8 \times 8$ array forming a “Chimera” graph between qubits. Each circle represents a physical qubit, and each line a programmable Ising coupling $\sigma^z_i \sigma^z_j$. Lines on the right (left) couple to the corresponding qubits in the neighboring unit cell to the right (above). (b) Two “logical qubits” ($i$, red and $j$, blue) embedded within a single unit cell. Qubits labeled 1-3 are the “problem qubits”, the opposing qubit of the same color labeled P is the “penalty qubit”. Problem qubits couple via the black lines with tunable strength $\alpha$ both inter- and intra-unit cell. Light blue lines of magnitude $\beta$ are ferromagnetic couplings between the problem qubits and their penalty qubit. (c) Encoded processor graph obtained from the Chimera graph by replacing each logical qubit by a circle. This is a non-planar graph (see Section 3.9 for a proof) with couplings of strength $3\alpha$. Green circles represent complete logical qubits. Orange circles represent logical qubits lacking their penalty qubit (see Fig. 3.7). Red lines are groups of couplers that cannot all be simultaneously activated.

3.2 Success probabilities of the different strategies. Panels (a)-(c) show the results for antiferromagnetic chains as a function of chain length.
The solid blue lines in the U case are best fits to $1/(1 + pN^2)$ (Lorentzian), yielding $p = 1.94 \times 10^{-4}$, $5.31 \times 10^{-4}$, $3.41 \times 10^{-3}$ for $\alpha = 1, 0.6, 0.3$ respectively. Panel (d) compares the U and QAC strategies at $N = \bar{N} = 86$ and $\alpha \in \{0.1, 0.2, \ldots, 1.0\}$. Chains shown in Panel (a) respectively depict the U (i), C (ii), and EP and QAC (iii) cases. In (ii) and (iii), respectively, vertically aligned and coupled physical qubits of the same color form a logical qubit. Error bars in all our plots were calculated over the set of embeddings and express the standard error of the mean $\sigma/\sqrt{S}$, where $\sigma^2 = \frac{1}{S} \sum_{i=1}^{S} (x_i - \bar{x})^2$ is the sample variance and $S$ is the number of samples. Additional details, including from experiments on a previous generation of the processor (the D-Wave One (DW1) “Rainier”) are given in Section 3.11.

3.3 Effect of varying the penalty strength. Panel (a) shows the numerically calculated gap to the lowest relevant excited state for two antiferromagnetically coupled logical qubits for $\alpha = 0.3$ and different values of $\beta$. Inset: undecoded (decoded) ground state probability $P_{GS}$ ($P_S$). Panel (b) (top) shows three configurations of two antiferromagnetically coupled logical qubits. Physical qubits denoted by heavy arrows point in the wrong direction. In the left configuration both logical qubits have a bit-flip error, in the middle configuration only one logical qubit has a single bit-flip error, and in the right configuration one logical qubit is completely flipped. The corresponding degeneracies and gaps ($\Delta_I$) from the final ground state are indicated, and the gaps plotted (bottom).

3.4 Experimental optimization of the penalty strength. The top (bottom) row shows color density plots of the experimental success probability of the EP (QAC) strategy as a function of $\beta \in \{0.1, 0.2, \ldots, 1.0\}$ and $\bar{N} \in \{2, 3, \ldots, 86\}$, at $\alpha = 0.3$ (left), $0.6$ (middle) and $1$ (right). The optimal $\beta$ values are indicated by the white dots.

3.5 Hamming distance histograms. Observed errors in encoded 86 qubit antiferromagnetic chains, at $\alpha = 1$ and the near-optimal $\beta = 0.2$. Panel (a) is a histogram of Hamming distances from the nearest of the two degenerate ground states, measured in terms of physical qubits. Inset: in terms of encoded qubits. The peaks at Hamming distance zero are cut off and extend to 63.6% (88.3%) for the physical (encoded) case. Panel (b) is a histogram of the errors as a function of logical qubit position (color scale) within the chain. Errors on encoded problem qubits are at Hamming distance 1, 2 or 3. Flipped penalty qubits are shown in the inset. The mirror symmetry is due to averaging over the two equivalent chain directions.

3.6 Decodability analysis. The fraction of decodable states out of all states (color scale) observed at a given Hamming distance from the nearest degenerate ground state (measured in physical qubits), and given energy above the ground state (in units of $J_{ij} = 1$), for $\bar{N} = 86$ and $\alpha = 1$.

3.7 The connectivity graph of the D-Wave One (DW1) “Rainier” processor shown on the left consists of $4 \times 4$ unit cells of eight qubits (denoted by circles), connected by programmable inductive couplers (lines). The 108 green (red) circles denote functional (inactive) qubits. Most qubits connect to six other qubits. The D-Wave Two (DW2) Vesuvius processor shown on the right consists of $8 \times 8$ unit cells. The 503 green (red) circles denote functional (inactive) qubits. In the ideal case, where all qubits are functional and all couplers are present, one obtains the non-planar “Chimera” connectivity graph.

3.8 The encoded DW1 graph. Symbols are the same as Fig. 3.1(c) in the main text for the encoded DW2 graph.
In addition, grey circles represent logical qubits with defective data qubits that were not used in our experiments; the corresponding grey lines represent couplings that were not used.

3.9 The DW1 (a) and DW2 (b) annealing schedules. The functions $A$ and $B$ are the ones appearing in Eqs. (1.4) and (4.5). The solid horizontal black line is the operating temperature energy.

3.10 (a) A portion of the encoded graph over logical qubits. (b)-(d) Contraction of paths in the original graph into edges. Paths consisting of two edges and a vertex are selected (represented in the figure as dotted lines), then contracted into a single edge connecting the ends of the chosen path (shown as a new solid line).

3.11 The condensed graph (a) is isomorphic to the standard representation of the $K_{3,3}$ bipartite graph (b).

3.12 Recap of the results shown in Fig. 3.2 in the main text, for the U and QAC cases, for $\alpha = 1$ (top), $\alpha = 0.6$ (middle), and $\alpha = 0.3$ (bottom).

3.13 Panels (a)-(c) are DW1 results, panel (d) is DW2 results. Shown are success probabilities for the various cases discussed in the main text as a function of chain length $N, \bar{N} \in \{2, \ldots, 16\}$, complementing the results shown in Fig. 3.2. Blue and purple lines are best fits to an independent errors model as described in Sec. 3.10. Panel (d) shows an additional DW2 data set, at $\alpha = 0.9$. The inset shows the same data on the same scale as in panel (c), to emphasize the improved performance of the DW2 over the DW1. The penalty scale $\beta$ was chosen as the experimental optimum for each $\alpha$ and $\bar{N}$ in all cases.

3.14 The fraction of decodable states (color scale), for $\bar{N} = 16$, $\alpha = 0.3$ and $\beta = 0.4$ (optimal for DW1), vs the energy (in units of $J_{ij} = 1$) of each observed state relative to the ground state and the Hamming distance from the nearest degenerate ground state, measured in physical qubits. Here we compare the DW1 (a) and the DW2 (b); $\beta_{\mathrm{opt}} = 0.2$ for DW2.

3.15 Experimental (circles) and simulation (crosses) results for an unencoded AF chain with $\alpha = 0.3$ and annealing time $t_f = 20\,\mu\mathrm{s}$. The effective system-bath coupling is $\kappa = 6.37 \times 10^{-6}$.

3.16 Effect of the penalty qubits. (a) Shown are adiabatic master equation simulation results (with parameters $\kappa = 3.18 \times 10^{-4}$ and $t_f = 20\,\mu\mathrm{s}$) as a function of $\beta$, with $\alpha = 0.3$, for two antiferromagnetically coupled encoded qubits, one with a penalty qubit and one without (see diagram in upper left corner). Blue diamonds are the probability of a single flip in the problem qubits coupled to the penalty qubit (penalty side – PS), red circles for the uncoupled case (no-penalty side – NPS). Inset: Simulated and experimental probability of Hamming distance $d = 1, 2$ for the encoded qubit with (PQ) or without (NPQ) the penalty qubit; “Pen” denotes the penalty qubit (the probability at $d = 3$ is negligible and is not shown). Good agreement is seen between the master equation and the experimental results. The NPQ case has substantially higher error probability, demonstrating the positive effect of the penalty qubit. (b) Similar to Fig. 3.5a, for an $\bar{N} = 16$ qubit chain (20,000 samples collected on the DW1). The logical qubits numbered 1, 7, and 16 did not have a penalty qubit, which is manifested by large peaks at these positions, at both $d = 1$ and $d = 2$. The inset shows the percentage of errors on the penalty qubits (with numbers 1, 7, 16 missing). The $d = 3$ data illustrates that a flipped penalty qubit increases the probability of all problem qubits flipping even more than an absent penalty qubit.

3.17 Adiabatic master equation simulation results for the probability of finding the undecoded ground state $P_{GS}$ and the decoded ground state $P_S$ for two logical qubits. We set $\alpha = 0.3$, $\kappa = 3.18 \times 10^{-4}$, and $t_f = 20\,\mu\mathrm{s}$.

3.18 The gap to the lowest relevant (2nd and higher) excited state for an antiferromagnetically coupled 8-qubit chain for $\beta = 0$ and different values of $\alpha$. The gap increases and moves to the left as $\alpha$ is increased (see Fig. 3.3a in the main text for a similar plot with fixed $\alpha$ and varying $\beta$).

3.19 The gap to the first relevant excited state for the unperturbed (blue curve) Hamiltonian and the perturbed (red curve) Hamiltonian with $\omega = 1$ and $\ldots/\omega_0 = 0.1$.

3.20 The gap to the first relevant excited state for the unperturbed (blue curve) Hamiltonian and the perturbed (red curve) Hamiltonian with $\ldots = 0.1$ and $\omega_0 = 0.001$.

4.1 Problem sizes studied. We generated random problems over each of the regions shown in the boxes of increasing size above. The smallest rectangle contains 46 logical qubits, the next largest 66, then 86 and 112. These problem sizes were chosen because they consist of square blocks of unit cells, and the treewidth of a planar square lattice (which is the arrangement of the unit cells here) grows as a function of the smallest dimension of any rectangular region chosen [RS84].

4.2 Decoding methods. In logical group decoding, portrayed in panel (a), a majority vote takes place within each yellow or orange grouping. This determines the value of each logical qubit in the decoded answer.
For problem group decoding, shown in panel (b), the values of the physical qubits comprising each copy of the problem (boxed in blue or green) are examined as a group to see if any problem copy has the correct solution.

4.3 Decodability of observed states. The plot shows the energy and Hamming distance relative to the ground state of all observed states in 1000 annealing cycles of one problem instance (one far outlier state has been cropped out). States colored red are decodable by both methods, and tend to be low in Hamming distance. States colored light blue (yellow) are decodable by logical (problem) group only, and occupy higher Hamming distances, with logical group decodable states generally higher in energy than problem group decodable states. Dark blue states are undecodable, and cluster in groups which represent sets of logical qubits flipping together (see Figure 4.7).

4.4 A state decodable by both methods. In this typical example, we see two single bit flips (blue dots) isolated among many logical groups with all their physical qubits aligned with the ground state (green dots). The state is logical group decodable because single bit flips are majority-vote correctable. It is also problem group decodable because there are only two bit flips, which is not enough to ruin all three copies of the problem embedded within the QAC scheme. The magnitude (though not the sign) of the problem couplings is also shown on the diagram: pink lines indicate $|J_{ij}| = 1/6$, and the shade of the line darkens through gradations to indigo for $|J_{ij}| = 1$.

4.5 A logical group decodable state. Note the many single bit flips that were read out. These are all decodable via majority vote, but the state is not problem group decodable because each problem group is corrupted by at least one bit flip.

4.6 A problem group decodable state. This state has two logical qubits in the upper right hand corner that are loosely coupled to the rest of the problem (pink line, $|J_{ij}| = 1/6$) and which each have three physical qubits flipped from the ground state values (orange dots). In this case, two of the three problem qubits have flipped for those two groups, taking the penalty qubits with them. The problem qubits that flip are correlated between the two groups; they belong to the same copy of the problem because the problem coupling is strong between counterpart problem qubits (purple line). This leaves one copy of the problem fully intact, and the state can be decoded using problem group decoding.

4.7 An undecodable state. Examine the cluster of logical qubit flips (red dots) in the lower right hand corner. Here we see that region is loosely coupled to the rest of the problem; all links going out of it are weak pink couplings. This means that the state with these logical qubits flipped together is a low-lying final excited state of the logical Ising problem, which has been suppressed via the repetition energy scale enhancement portion of QAC but is still observable in the problem instance’s output statistics. This state belongs to the cluster we see near Hamming weight 20 in Figure 4.3.

4.8 Scaling of time to solution. We plot time to solution (defined as the expected number of annealing cycles necessary to observe a ground state once with 99% probability) as a function of the square root of problem size for both the C (solid lines) and QAC (dotted lines) strategies. Various percentiles of difficulty are included, as shown in the figure key. We observe better performance for the QAC strategy at all percentiles for the largest problem size.

4.9 Results for $N = 112$. Each point represents one of 1000 random problem instances at this size. QAC outperforms the C strategy for approximately 99% of instances studied, in some cases by a very wide margin.

4.10 Effect of missing penalty qubits. While the percentiles are broken out into several panels here, the blue (C strategy) and green (QAC strategy) lines in each are the same data that was displayed in Figure 4.8. The new red, teal, and purple series show the effects of randomly removing 30%, 60%, and 90% of the penalty qubits from the QAC Hamiltonians, respectively. We see that the 30% and 60% lines track the original QAC lines closely, suggesting the code is highly resilient to qubit loss, if the inoperative physical qubits can be embedded as penalty qubits in the encoding.

4.11 Optimal beta with qubit loss. In this histogram, we display the experimentally observed value of the penalty scaling parameter $\beta$ from the set of values we tried: $\{0.1, 0.2, 0.3, 0.4, 0.5\}$. The true optimal $\beta$ for the perfect QAC scheme lies between $0.1$ and $0.2$, but for 30% of penalty qubits missing the optimum shifts to $0.2$, for 60% missing between $0.2$ and $0.3$, and for 90% missing to the maximum allowed value. We conclude that the stronger couplings in the remaining intact logical groups resulting from these higher optimal penalty scaling values make up for the missing penalty qubits in other logical groups.

A.1 Graph of the qubits (green/gray circles represent operational/inaccessible qubits) and couplers (black lines) available on the USC/ISI D-Wave Two Vesuvius chip. This layout, composed of eight qubit bipartite unit cells connected in a grid, is known as the Chimera graph.

A.2 Coupling parameter as a function of control variable.
The curve is much steeper in the ferromagnetic regime (negative coupling values), and the digital to analog converter takes values evenly spaced along the horizontal axis to set the coupling. This enables more precision in the choice of an antiferromagnetic coupling than a ferromagnetic coupling.

List of Tables

2.1 All 16 Boolean functions $f_i$ of two binary variables, and their implementation form in terms of the Pauli matrices $Z_{i_j}$ acting on single qubits or pairs of qubits $j \in \{1, 2, 3\}$. The subscript $a$ in the Implementation Form column denotes an ancilla qubit, tied to qubits $i_1$ and $i_2$ via $x_a = x_{i_1} x_{i_2}$, used to reduce all qubit interactions to at most two-body.

2.2 Logical bits and their significance in terms of variable comparison in the Triplex Miscompare problem.

“You have to get your education, because nobody can ever take that away from you.”
Jean Pudenz

“As I stood in the farmhouse yard, raking leaves, I was trying to figure out how we could put all of our children through college.”
Myrna Snyder

To my family, for their unwavering support.

Acknowledgements

I am thankful for the many people who have helped me complete this work. I certainly could not have done it alone.

My first gratitude is to my parents, Dan and Cathy Pudenz, who encouraged me every step of the way. I didn’t appreciate until I was older that not everyone grows up surrounded by an endless supply of library books, and I’m quite sure I was the only child in the neighborhood with an engineer-themed coloring book. I thank my siblings, Lauren and Alex, for teaching me humility when I was young, tolerating my demands on the family’s time as we grew, and being the most understanding friends anyone could ask for as adults. My grandparents, Clair and Myrna Snyder and LeRoy and Jean Pudenz, also pushed me to develop my mind.
My family’s greatest gift was to teach me that if I worked hard and learned well, I could be the equal of anyone.

I cannot imagine a better doctoral advisor than Daniel Lidar. He truly listens to his students in discussions, and cares about their professional development. He let me take a chance on an unconventional project, pushed me to keep my work rigorous, then helped me to promote it in the scientific community. My studies would not have been the same with anyone else.

I’m grateful to my fellow graduate students and postdocs Soraya Taghavi, Wan-Jung Kuo, Laura Edwards, Diana Warren, Zhihui Wang, Greg Quiroz, Milad Marvian, Josh Job, Anurag Mishra, Siddharth Muthu Krishnan, and Jason Dominy for their knowledge and their friendship. Tameem Albash, my coauthor for the error correction papers, deserves special acknowledgement for his patience and excellent ideas. I also appreciate numerous constructive discussions with my colleagues Federico Spedalieri, Greg Ver Steeg, Itay Hen, and Bob Lucas at the Information Sciences Institute.

Finally, I must thank my partner, Pablo Maurin, for his constant support and dedication. He has shared in and moderated all my graduate school stresses without complaint. He strengthens me when I feel inadequate, helps me when I am overwhelmed, and always encourages me to pursue my dreams at every opportunity. I am incredibly grateful.

Abstract

Adiabatic quantum optimization (AQO) is a fast-developing subfield of quantum information processing which holds great promise in the relatively near future. Here we develop an application, quantum anomaly detection, and an error correction code, Quantum Annealing Correction (QAC), for use with AQO. The motivation for the anomaly detection algorithm is the problematic nature of classical software verification and validation (V&V).
The number of lines of code written for safety-critical applications such as cars and aircraft increases each year, and with it the cost of finding errors grows exponentially (the cost of overlooking errors, which can be measured in human safety, is arguably even higher). We approach the V&V problem by using a quantum machine learning algorithm to identify characteristics of software operations that are implemented outside of specifications, then define an AQO to return these anomalous operations as its result. Our error correction work is the first large-scale experimental demonstration of quantum error correcting codes. We develop QAC and apply it to USC’s equipment, the first and second generation of commercially available D-Wave AQO processors. We first show comprehensive experimental results for the code’s performance on antiferromagnetic chains, scaling the problem size up to 86 logical qubits (344 physical qubits) and recovering significant encoded success rates even when the unencoded success rates drop to almost nothing. A broader set of randomized benchmarking problems is then introduced, for which we observe similar behavior to the antiferromagnetic chain, specifically that the use of QAC is almost always advantageous for problems of sufficient size and difficulty. Along the way, we develop problem-specific optimizations for the code and gain insight into the various on-chip error mechanisms (most prominently thermal noise, since the hardware operates at finite temperature) and the ways QAC counteracts them. We finish by showing that the scheme is robust to qubit loss on-chip, a significant benefit when considering an implemented system.

Chapter 1
Introduction and Background

1.1 Motivation

It has long been the hope of researchers that quantum computing could provide a means for the expedited solution of certain (classically) hard optimization problems.
Over the past 30 years, the advancement of the field has included the development of quantum computation beyond the original circuit model [R.P82, Ben80, FGGS00, M.H03, KLM01, R. 01, ZR99], error analysis and correction techniques for the various models, and many potential physical implementations of these systems. Now is a particularly fascinating time for engineers interested in quantum computing because the hardware is beginning to scale beyond a few isolated qubits to tens or even hundreds of interacting devices [JAG+11]. With this sort of equipment, real-world problems and applications can begin to be addressed and the proposed solutions tested in a practical setting. In this work, we present publications exploring both an application (anomaly detection) and the solution to a real-world problem (error correcting codes to address environmental and device noise), both within the context of a particularly well-developed superconducting processor designed to implement a limited quantum optimization device.

1.2 Adiabatic Quantum Optimization

Adiabatic quantum computation (AQC) is well-known as a viable model for universal quantum computation [AvDK+07, MLM07] that exhibits a certain degree of natural robustness to environmental noise [CFP01, AAN09]. In general, an AQC consists of initialization of the system into the known ground state of an initial Hamiltonian $H_I$, which then undergoes a slow transition using available control parameters into a final Hamiltonian $H_F$ whose ground state embodies the solution to a computational problem,

$$H_{AQC}(t) = A(t) H_I + B(t) H_F \tag{1.1}$$

where $A(0) \gg B(0)$ and $A(t_f) \ll B(t_f)$, $t = t_f$ being the end of the computation.
The fidelity of the final state of the system to the ground state of $H_F$ depends on the minimum gap between the ground state and the first excited state over the course of the adiabatic evolution, the rate at which the Hamiltonian changes, and environmental noise allowing for jumps between states in the instantaneous eigenbasis [FGGS00, DJA+13, ALT08, CFP01, SL05, ABLZ12, LRH09]. The closed-system adiabatic condition is often stated as

$$\max_{t \in [0, t_f]} \left| \langle \psi_1(t) | \partial_t H_{AQC}(t) | \psi_0(t) \rangle \right| \le \epsilon \min_{t \in [0, t_f]} \Delta^2(t) \tag{1.2}$$

where $|\psi_0(t)\rangle$ and $|\psi_1(t)\rangle$ represent the time-dependent ground and first excited states of $H_{AQC}$, respectively, $\Delta(t)$ represents the energy gap between the two states, and the satisfied condition guarantees that the fidelity $F$ between the actual final state and the ground state of $H_F$ is greater than or equal to $\sqrt{1 - \epsilon^2}$ [A. 99].

Adiabatic quantum optimization (AQO) is a form of AQC which seeks to solve a more limited class of problems. This strategy is also sometimes called quantum annealing (QA) in the literature. In this work, we will use the terms AQO and QA interchangeably to describe the same type of problem solving. Specifically, the targeted problem class is the ground state of the Ising model over a 2-local Hamiltonian with programmable individual biases and couplings, or, equivalently, quadratic binary optimization problems (QUBOs). A QUBO is typically posed as

$$\vec{x}^{opt} = \arg\min_{\vec{x}} \left( \vec{x}' Q \vec{x} \right) \tag{1.3}$$

where $\vec{x}$ is a vector of binary variables and the matrix $Q$ represents quadratic relationships between them.
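To make Eq. (1.3) concrete, the following is a minimal sketch that evaluates $\vec{x}' Q \vec{x}$ and minimizes it by exhaustive enumeration. The matrix `Q_EXAMPLE` and the function names are hypothetical and chosen only for illustration; exhaustive search is a classical stand-in that is feasible only for a handful of variables, and is not the annealing procedure discussed in this chapter.

```python
from itertools import product


def qubo_energy(x, Q):
    """Return x' Q x for a binary tuple x and a square matrix Q (list of lists)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Minimize x' Q x over all x in {0,1}^n by enumeration (small n only)."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))


# Hypothetical 3-variable instance: each variable is rewarded for being 1,
# while neighbouring variables are penalized for being 1 together.
Q_EXAMPLE = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]

x_opt = brute_force_qubo(Q_EXAMPLE)
print(x_opt, qubo_energy(x_opt, Q_EXAMPLE))
```

The number of candidate vectors doubles with every added variable, which is why larger instances call for heuristic or quantum approaches.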
The solution of this problem is accomplished by the implementation of an AQO which has the related Ising spin glass as its final Hamiltonian:

$$H_{AQO}(t) = A(t) H_X + B(t) H_{Ising}. \tag{1.4}$$

Here, the initial Hamiltonian is specified to have as its ground state the uniform superposition in the computational basis,

$$H_X = -\sum_i \sigma_i^x. \tag{1.5}$$

The final, Ising Hamiltonian allows for tunable individual biases on each qubit and computational basis couplings where physically feasible,

$$H_{Ising} = \sum_i h_i \sigma_i^z + \sum_{i,j} J_{ij} \sigma_i^z \sigma_j^z. \tag{1.6}$$

In accordance with standard conventions in the field, $\sigma_i^z$ and $\sigma_i^x$ denote the spin-1/2 Pauli operators whose eigenstates are, respectively, $|0\rangle, |1\rangle$ and $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$, with eigenvalues $\pm 1$. If the local fields $h_i$ are nonzero or the set of allowed couplings, $J_{ij} \neq 0$, defines a non-planar graph over the qubits consisting of at least two planar square lattices connected in the third dimension by edges between corresponding vertices, this Ising problem is known to be NP-hard, and therefore computationally interesting [Bar82].

A physical AQO always operates in the presence of a thermal environment at temperature $T$. Provided $A(0) \gg k_B T$, the AQO is initialized in the ground state of $H_X$, namely the uniform superposition state $(|0 \cdots 0\rangle + \cdots + |1 \cdots 1\rangle)/\sqrt{2^N}$. Provided $B(t_f) \gg k_B T$, the final state at the end of the annealing process is stable against thermal excitations when it is measured. If the evolution is adiabatic, i.e., if $H(t)$ is a smooth function of time and if the gap $\Delta := \min_{t \in [0, t_f]} \varepsilon_1(t) - \varepsilon_0(t)$ between the first excited state energy $\varepsilon_1(t)$ and the ground state energy $\varepsilon_0(t)$ is sufficiently large compared to both $1/t_f$ and $T$, then the adiabatic approximation for open systems [CFP01, SL05, ABLZ12, DAAS13] guarantees that the desired ground state of $H_{Ising}$ will be reached with high fidelity at $t_f$.
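Because $H_{Ising}$ in Eq. (1.6) is diagonal in the computational basis, its ground state can be found classically for very small instances by evaluating the energy of every spin configuration. The sketch below does this for a hypothetical four-spin antiferromagnetic chain; the names `h_chain` and `J_chain`, the chain length, and the unit coupling strength are all illustrative assumptions, and the enumeration says nothing about how a physical annealer behaves.

```python
from itertools import product


def ising_energy(spins, h, J):
    """Classical energy of Eq. (1.6) for spins s_i = +1 or -1 (sigma^z eigenvalues)."""
    field = sum(h[i] * s for i, s in enumerate(spins))
    coupling = sum(j_ij * spins[i] * spins[j] for (i, j), j_ij in J.items())
    return field + coupling


def ground_states(h, J):
    """Enumerate all 2^N configurations; return (minimum energy, list of minimizers)."""
    best, minimizers = None, []
    for spins in product((-1, 1), repeat=len(h)):
        energy = ising_energy(spins, h, J)
        if best is None or energy < best:
            best, minimizers = energy, [spins]
        elif energy == best:
            minimizers.append(spins)
    return best, minimizers


# Hypothetical antiferromagnetic chain: no local fields, positive nearest-neighbour
# couplings, so neighbouring spins prefer to anti-align.
N_SPINS = 4
h_chain = [0.0] * N_SPINS
J_chain = {(i, i + 1): 1.0 for i in range(N_SPINS - 1)}

energy, states = ground_states(h_chain, J_chain)
print(energy, states)
```

With zero local fields the two alternating configurations are degenerate, which reflects the global spin-flip symmetry of the couplings.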
One reason for the current level of interest in AQO/QA is the existence of commercially available quantum annealing processors manufactured by D-Wave Systems [JAG+11, DJA+13]. It is still unknown whether these devices exhibit any quantum speedup over known classical optimization algorithms, but they are the first of their kind and provide a useful platform for the investigation of the physics, architectural concerns, and applications of quantum annealers [LPS+14, SQSL13]. Benchmarking the devices on random Ising models has provided useful data to highlight the contrasts between the data from the chips and classical alternatives, but has failed to produce conclusive results regarding the existence or impossibility of a quantum speedup [RWJ+14, BRI+14]. In fact, there have been several proposals of classical models for the behavior of the device, most notably those involving a classical magnetic system as an analog computer to solve the Ising model [SS13, SSSV14], but none has satisfactorily replicated all the data collected from the chip [WRB+13, VAM+14]. Additional work to experimentally bound the state of an eight-qubit unit cell of the system in mid-computation produced an entanglement witness [LPS+14], but whether the intended quantum operation of the chip as a whole has been achieved is still a matter of debate within the quantum computing community.

1.3 Machine Learning in the Adiabatic Context

The first published work on machine learning as applied to AQO was the result of a collaboration between D-Wave and Google, the goal of which was to create a better image classifier. A D-Wave One (Rainier) device was used to optimize a combination of many less-powerful image classification criteria from an extensive library developed by Google. This sort of optimization is referred to as “boosting” by the machine learning community.
The result of the collaboration was a classifier to recognize the presence or absence of cars in images which was by certain measures more lightweight than (but just as accurate as) similar classifiers generated by classical boosting methods. The authors subsequently refined their methodology, and this sort of machine learning is currently an active line of research at Google.

Machine learning has also been studied in a more general quantum computing context. Recently developed algorithms for the supervised and unsupervised clustering problems in machine learning claim an exponential speedup in the context of circuit model quantum computation and provide a second formulation for adiabatic quantum computing without rigorous scaling claims [LMR13].

In this dissertation, Chapter 2 contains the material from a published paper which extends the boosting framework developed by the D-Wave/Google collaboration. A quantum anomaly detection step which can be used with the results of any properly posed boosting step is developed and applied to the problem of software verification and validation. Additionally, a second formulation of the boosting problem suitable for AQO is explored, based on provable optimality criteria.

1.4 Error Correction in the Adiabatic Context

Quantum information processing offers dramatic speedups, yet is famously susceptible to decoherence, the process whereby quantum superpositions decay into mutually exclusive classical alternatives, thus robbing quantum computers of their power. This has made the development of quantum error correction an essential and inescapable aspect of both theoretical and experimental quantum computing. The science of error correction for open systems implementing adiabatic processes was sparse when the work presented in this dissertation was begun, but has attracted some interest in recent years.

The primary work on applying quantum error correcting codes to AQC is due to Jordan, Farhi, and Shor [JFS06].
The authors apply several error detecting codes to the entire Hamiltonian of an AQC, producing a gap enhancement to protect against 1- and 2-local errors, depending on the code. The difficulty in physically realizing this important theoretical result lies in the fact that the stabilizer generators of the codes, which must be implemented as part of the encoded Hamiltonian to produce the gap enhancement, are 4-local, something that is far from trivial to implement.

Dynamical decoupling (DD) has also been explored as a strategy to protect AQC against decoherence, the idea being to rapidly apply decoupling pulses which commute with the adiabatic portion of the Hamiltonian, so as not to disturb the smoothness of the AQC. To achieve this, the AQC is again encoded in an error detecting quantum stabilizer code which allows the application of DD pulses from the stabilizer group to combat errors which take the AQC out of the codespace. Theory and simulated applications of this strategy have been realized with 2-local Hamiltonians, a positive development for implementability [Lid08, QL12].

Not all the news regarding error correction schemes for AQC has been good. A series of papers by Young and collaborators has shown that the error suppression effects of gap enhancement due to stabilizer terms, dynamical decoupling, and Zeno effect stabilizer measurements are equivalent in power, but argues that such suppression efforts are insufficient for true fault tolerance, which would require some form of active error correction and greater than 2-local interactions [YSBK13, SY13]. Finding such an active scheme is difficult to reconcile with the smooth, continuous controls required to satisfy the adiabatic criterion, and remains an open question.

This does not mean that error suppression is without its benefits, however. Gap enhancement techniques using repetition codes have recently been shown to be effective against noise in the specification of the final Hamiltonian [YBKL13].
A new set of implementation-friendly two-local codes has been developed for use in AQC [GOY13]. A scheme with no gap enhancement which embeds a fault-tolerant circuit model calculation in an AQC by using extra qubits (in addition to those needed for the fault tolerance portion of the circuit) to compute a history state for the calculation is also argued to make the resultant AQC fault tolerant due to an ability to recover the solution from theoretically accessible excited states [Miz14].

In Chapters 3 (representing a published paper) and 4 of this dissertation, we develop error correction for quantum annealing and provide an experimental demonstration using up to 344 superconducting flux qubits in processors which have recently been shown to physically implement programmable quantum annealing. We demonstrate a substantial improvement over the performance of the processors in the absence of error correction. These results pave a path toward large scale noise-protected adiabatic quantum optimization devices.

1.5 Equipment

The work presented here was designed for, and implemented on, the quantum annealing hardware commercially available from D-Wave Systems and installed at the Information Sciences Institute (ISI) of the University of Southern California (USC). The technology has been described extensively in the literature [JAG+11, HJB+10, HJL+10], but we will give a brief overview here.

D-Wave’s hardware implements the sort of AQO described in the previous section using superconducting hardware. Each qubit consists of a loop of niobium interrupted by a tunable Josephson junction. The state of the qubit is determined by the direction of current flow around the loop. The qubit can be biased to prefer one direction over the other by applying magnetic flux through the body of the superconducting loop, thereby inducing an enhancement in one current direction and suppressing the other.
Couplers are physically similar to qubits, also consisting of superconducting niobium loops. Placing a coupler loop adjacent to two qubit loops introduces a mutual inductance between the two qubits, which is tunable in magnitude and sign through the application of magnetic flux. This architecture clearly favors coupling of physically adjacent qubits, and indeed this is how the chip layout is designed. Qubits are manufactured on two different layers of the semiconductor fabrication process; one layer holds horizontal loops and the other vertical loops. Four horizontal and four vertical loops evenly spaced and aligned so their layers are stacked comprise a hardware unit cell, with couplers placed in between the “horizontal” and “vertical” layers where the qubit loops cross each other. The processor is a set of unit cells arranged on a plane, which connect with each other via couplers placed between neighboring horizontal (or vertical) qubits within the same layer. This is illustrated in Figure 1.1. For a more in-depth treatment of the experimental setup and the knowledge gained from its use, please refer to the Appendix.

Figure 1.1: The D-Wave unit cell. The long rectangular loops represent qubits, and blue dots where they cross show the locations of couplers. Couplers also exist at the ends of each loop, connecting the qubits to their neighbors in the same plane.

Chapter 2 Anomaly Detection for Software Verification and Validation

2.1 Overview

We propose a new approach to verification and validation of software which makes use of quantum information processing. The approach consists of a quantum learning step and a quantum testing step. In the learning step, our strategy uses quantum optimization to learn the characteristics of the program being tested and the specification it is being tested to. This learning technique is known as quantum boosting and has been previously applied to other problems, in particular image recognition [NRM, NDRMa, NDRMb, BCMR].
Boosting consists of building up a formula to accurately sort inputs into one of two groups by combining simple rules that sort less accurately, and in its classical forms has been frequently addressed in the machine learning literature [Mei03, Sch90, FSA99]. The testing step is novel, and involves turning the classifying formulas generated by the learning step into a function that generates a lower energy the more likely its input is to represent a software error. This function is translated into the problem Hamiltonian of an adiabatic quantum computation (AQC). The AQC allows all potential software errors (indeed, as we will see, all possible operations of the software) to be examined in quantum-parallel, returning only the best candidates for errors which correspond to the lowest values of the classification function.

In this chapter, Section 2.2 will begin by establishing the framework through which the quantum V&V problem is attacked, and by defining the programming errors we seek to eliminate. As we proceed with the development of a method for V&V using quantum resources, Section 2.3 will establish an implementation of the learning step as an adiabatic quantum algorithm. We develop conditions for ideal boosting and an alternate quantum learning algorithm in Section 2.4. The testing step will be detailed in Section 2.5. We present simulated results of the learning step on a sample problem in Section 2.6, and finish with our conclusions and suggestions for future work in Section 2.7.

2.2 Formalization

In this section we formalize the problem of software error detection by first introducing the relevant vector spaces and then giving a criterion for the occurrence of an error.

2.2.1 Input and output spaces

Consider an “ideal” software program $\hat{P}$, where by ideal we mean the correct program which a perfect programmer would have written.
Instead we are faced with the real life implementation of $\hat{P}$, which we denote by $P$ and refer to as the “implemented program.” Suppose we wish to verify the operation of $P$ relative to $\hat{P}$. All programs have input and output spaces $V_{in}$ and $V_{out}$, such that

$$P: V_{in} \mapsto V_{out}. \tag{2.1}$$

Without loss of generality we can think of these spaces as being spaces of binary strings. This is so because the input to any program is always specified within some finite machine precision, and the output is again given within finite machine precision (not necessarily the same as the input precision). Further, since we are only interested in inputs and outputs which take a finite time to generate (or “write down”), without loss of generality we can set upper limits on the lengths of allowed input and output strings. Within these constraints we can move to a binary representation for both input and output spaces, and take $N_{in}$ as the maximum number of bits required to specify any input, and $N_{out}$ as the maximum number of bits required to specify any output. Thus we can identify the input and output spaces as binary string spaces

$$V_{in} = \{0, 1\}^{N_{in}}, \qquad V_{out} = \{0, 1\}^{N_{out}}. \tag{2.2}$$

It will be convenient to concatenate elements of the input and output spaces into single objects. Thus, consider binary vectors $\vec{x} = (\vec{x}_{in}, \vec{x}_{out})$, where $\vec{x}_{out} = P(\vec{x}_{in})$, consisting of program input-output pairs:

$$\vec{x} \in \{0, 1\}^{N_{in}} \times \{0, 1\}^{N_{out}} = \{0, 1\}^{N_{in} + N_{out}} \equiv V. \tag{2.3}$$

2.2.2 Recognizing software errors

2.2.3 Validity domain and range

We shall assume without loss of generality that the input spaces of the ideal and implemented programs are identical. This can always be ensured by extending the ideal program so that it is well defined for all elements of $V_{in}$.
Thus, while in general not all elements of $V_{in}$ have to be allowed inputs into $\hat{P}$ (for example, an input vector that is out of range for the ideal program), one can always reserve some fixed value for such inputs (e.g., the largest vector in $V_{out}$) and trivially mark them as errors. The ideal program $\hat{P}$ is thus a map from the input space to the space $R_{out}$ of correct outputs:

$$\hat{P}: V_{in} \mapsto R_{out} \subseteq V_{out}. \tag{2.4}$$

More specifically, $\hat{P}$ computes an output string $\hat{x}_{out}$ for every input string $\vec{x}_{in}$, i.e., we can write $\hat{x}_{out} = \hat{P}(\vec{x}_{in})$. Of course this map can be many-to-one (non-injective and surjective), but not one-to-many (multi-valued).¹ The implemented program $P$ should ideally compute the exact same function. In reality it may not. With this in mind, the simplest way to identify a software error is to find an input vector $\vec{x}_{in}$ such that

$$\| \hat{P}(\vec{x}_{in}) - P(\vec{x}_{in}) \| \neq 0 \tag{2.5}$$

in some appropriate norm. This is clearly a sufficient condition for an error, since the implemented program must agree with the ideal program on all inputs. However, for our purposes a more general approach will prove to be more suitable.

¹ Random number generation may appear to be a counterexample, as it is multi-valued, but only over different calls to the random-number generator.

2.2.4 Specification and implementation sets

A direct way to think about the existence of errors in a software program is to consider two ordered sets within the space of input-output pairs, $V$. These are the set of ordered, correct input-output pairs $\hat{S}$ according to the program specification $\hat{P}$, and the set of input-output pairs $S$ implemented by the real program $P$. We call $\hat{S}$ the “specification set” and $S$ the “implementation set”. The program under test is correct when

$$\hat{S} = S. \tag{2.6}$$

That is, in a correct program, the specification set of correct input-output pairs is exactly the set that is implemented in code.
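As a toy illustration of the specification and implementation sets, the sketch below builds both sets for a pair of two-bit programs and compares them. Everything in it is hypothetical: the "increment modulo 4" specification, the deliberately buggy implementation, and all function names are assumptions chosen only to make Eqs. (2.3), (2.5), and (2.6) tangible. It also relies on exhaustively enumerating the input space, which is exactly what becomes impractical for real programs.

```python
from itertools import product

N_IN, N_OUT = 2, 2  # hypothetical numbers of input and output bits


def to_bits(value, width):
    """Big-endian tuple of bits representing a non-negative integer."""
    return tuple((value >> k) & 1 for k in reversed(range(width)))


def from_bits(bits):
    """Inverse of to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def ideal_program(x_in):
    """Hypothetical specification: increment modulo 4."""
    return to_bits((from_bits(x_in) + 1) % 4, N_OUT)


def implemented_program(x_in):
    """Hypothetical implementation with a bug: it does not wrap around at 3."""
    value = from_bits(x_in)
    return to_bits(value if value == 3 else value + 1, N_OUT)


def pair_set(program):
    """Set of concatenated vectors (x_in, x_out), as in Eq. (2.3)."""
    return {x_in + program(x_in) for x_in in product((0, 1), repeat=N_IN)}


spec_set = pair_set(ideal_program)        # plays the role of the specification set
impl_set = pair_set(implemented_program)  # plays the role of the implementation set

is_correct = spec_set == impl_set         # the criterion of Eq. (2.6)
implemented_errors = impl_set - spec_set  # pairs that are implemented but not specified
error_inputs = [x for x in product((0, 1), repeat=N_IN)
                if ideal_program(x) != implemented_program(x)]  # Eq. (2.5)
print(is_correct, implemented_errors, error_inputs)
```

In this toy case the set difference and the input-by-input comparison single out the same faulty behaviour, the input `(1, 1)` paired with the wrong output.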
As stated, (2.6) is impractical since it requires knowledge of the complete structure of the intended input and output spaces. Instead, we can also use the specification and implementation sets to give a correctness criterion for a given input-output pair:

Definition 1 A vector $\vec{x} \in V$ is erroneous and implemented if

$$\vec{x} \notin \hat{S} \ \&\ \vec{x} \in S. \tag{2.7}$$

Input-output vectors satisfying (2.7) are the manifestation of software errors (“bugs”) and their identification is the main problem we are concerned with here. Conversely, we have

Definition 2 A vector $\vec{x} \in V$ is correct and implemented if

$$\vec{x} \in \hat{S} \ \&\ \vec{x} \in S. \tag{2.8}$$

Input-output vectors satisfying (2.8) belong to the “don’t-worry” class. The two other possibilities belong to the “don’t-care” class:

Definition 3 A vector $\vec{x} \in V$ is correct and unimplemented if

$$\vec{x} \in \hat{S} \ \&\ \vec{x} \notin S. \tag{2.9}$$

Definition 4 A vector $\vec{x} \in V$ is erroneous and unimplemented if

$$\vec{x} \notin \hat{S} \ \&\ \vec{x} \notin S. \tag{2.10}$$

A representation of the locations of vectors satisfying the four definitions for a sample vector space can be found in Fig. 2.1. Our focus will be on the erroneous vectors of Definition 1.

Figure 2.1: Schematic vector space representation showing regions of vectors satisfying the four definitions. Region 1, of erroneous but implemented vectors, is the location of errors. Regions 2, 3, and 4 represent vectors which are correct and implemented, correct and unimplemented, and erroneous and unimplemented, respectively.

Note that Eq. (2.5) implies that the vector is erroneous and implemented, i.e., Definition 1. Indeed, let $\vec{x}_{out} = P(\vec{x}_{in})$, i.e., $\vec{x} = (\vec{x}_{in}, \vec{x}_{out}) \in S$, but assume that $\vec{x}_{out} \neq \hat{x}_{out}$ where $\hat{x}_{out} = \hat{P}(\vec{x}_{in})$. Then $\vec{x} \notin \hat{S}$, since $\vec{x}_{in}$ pairs up with $\hat{x}_{out}$ in $\hat{S}$. Conversely, Definition 1 implies Eq. (2.5). To see this, assume that $\vec{x} = (\vec{x}_{in}, \vec{x}_{out}) \in S$ but $\vec{x} = (\vec{x}_{in}, \vec{x}_{out}) \notin \hat{S}$. This must mean that $\vec{x}_{out} \neq \hat{x}_{out}$, again because $\vec{x}_{in}$ pairs up with $\hat{x}_{out}$ in $\hat{S}$. Thus Eq.
(2.5) is in fact equivalent to Definition 1, but does not capture the other three possibilities captured by Definitions 2-4. Definitions 1-4 will play a central role in our approach to quantum V&V.

2.2.5 Generalizations

Note that it may well be advantageous in practice to consider a more general setup, where instead of studying only the map from the input to the output space, we introduce intermediate maps which track intermediate program states. This can significantly improve our error classification accuracy.² Formally, this would mean that Eq. (2.4) is replaced by

$$\hat{P}: V_{in} \mapsto I_1 \mapsto \cdots \mapsto I_J \mapsto R_{out}, \tag{2.11}$$

where $\{I_j\}_{j=1}^{J}$ are intermediate spaces. However, we shall not consider this more refined approach in this work.

As a final general comment, we reiterate that a solution of the problem we have defined has implications beyond V&V. Namely, Definitions 1-4 capture a broad class of anomaly (or outlier) detection problems [CBK09]. From this perspective the approach we detail in what follows can be described as “quantum anomaly detection,” and could be pursued in any application which requires the batch processing of a large data space to find a few anomalous elements.

² One important consideration is that, as we shall see below, for practical reasons we may only be able to track errors at the level of one-bit errors and correlations between bit-pairs. Such limited tracking can be alleviated to some extent by using intermediate spaces, where higher order correlations between bits appearing at the level of the output space may not yet have had time to develop.

2.3 Training a quantum software error classifier

In this section we discuss how to identify whether a given set of input-output pairs is erroneous or correct, and implemented or unimplemented, as per Definitions 1-4. To this end we shall require so-called weak classifiers, a strong classifier, a methodology to efficiently train the strong classifier, and a way to efficiently apply the trained strong
classifier on all possible input-output pairs. Both the training step and the application step will potentially benefit from a quantum speedup.

2.3.1 Weak classifiers

Consider a class of functions which map from the input-output space to the reals:

$$h_i: V \mapsto \mathbb{R}. \tag{2.12}$$

We call these functions “weak classifiers” or “feature detectors,” where $i \in \{1, \ldots, N\}$ enumerates the features. These are some predetermined useful aggregate characteristics of the program $P$ which we can measure, such as total memory, or CPU time average [SLS+]. Note that $N$ will turn out to be the number of qubits we shall require in our quantum approach.

We can now formally associate a weak classification with each vector in the input-output space.

Definition 5 Weak classification of $\vec{x} \in V$.
Weakly classified correct (WCC): a vector $\vec{x}$ is WCC if $h_i(\vec{x}) > 0$.
Weakly classified erroneous (WCE): a vector $\vec{x}$ is WCE if $h_i(\vec{x}) < 0$.

Clearly, there is an advantage to finding “smart” weak classifiers, so as to minimize $N$. This can be done by invoking heuristics, or via a systematic approach such as one we present below.

For each input-output pair $\vec{x}$ we have a vector $\vec{h}(\vec{x}) = (h_1(\vec{x}), \ldots, h_N(\vec{x})) \in \mathbb{R}^N$. Such vectors can be used to construct geometric representations of the learning problem, e.g., a convex hull encompassing the weak classifier vectors of clustered correct input-output pairs. Such a computational geometry approach was pursued in [SLS+].

We assume that we can construct a “training set”

$$\mathcal{T} \equiv \{\vec{x}_s, y_s\}_{s=1}^{S}, \tag{2.13}$$

where each $\vec{x}_s \in V$ is an input-output pair and $y_s = y(\vec{x}_s) = +1$ iff $\vec{x}_s$ is correct (whether implemented or not, i.e., $\vec{x}_s \in \hat{S}$) while $y_s = -1$ iff $\vec{x}_s$ is erroneous (again, implemented or not, i.e., $\vec{x}_s \notin \hat{S}$). Thus, the training set represents the ideal program $\hat{P}$, i.e., we assume that the training set can be completely trusted. Note that Eq.
(2.4) presents us with an easy method for including erroneous input pairs, by deliberately misrepresenting the action of $\hat{P}$ on some given input, e.g., by setting $\vec{x}_{out} \notin R_{out}(\hat{P})$. This is similar to the idea of performing V&V by building invariants into a program [TBJ06].

We are free to normalize each weak classifier so that $h_i \in [-1/N, 1/N]$ (the reason for this will become clear below). Given Definition 5 we choose the sign of each weak classifier so that $h_i(\vec{x}_s) < 0$ for all erroneous training data, while $h_i(\vec{x}_s) > 0$ for all correct training data. Each point $\vec{h}(\vec{x}_s) \in [-1/N, 1/N]^N$ (a hypercube) has associated with it a label $y_s$ which indicates whether the point is correct or erroneous. The convex hull approach to V&V [SLS+] assumes that correct training points $\vec{h}(\vec{x}_s)$ cluster. Such an assumption is not required in our approach.

2.3.2 Strong classifier

We would like to combine all the weak classifiers into a single “strong classifier” which, given an input-output pair, will determine that pair’s correctness or erroneousness. The problem is that we do not know in advance how to rank the weak classifiers by relative importance. We can formally solve this problem by associating a weight $w_i \in \mathbb{R}$ with each weak classifier $h_i$. The problem then becomes how to find the optimal set of weights, given the training set.

The process of creating a high-performance strong classifier from many less accurate weak classifiers is known as boosting in the machine learning literature. Boosting is a known method for enhancing to arbitrary levels the performance of known sets of classifiers that exhibit weak learnability for a problem, i.e., they are accurate on more than half of the training set [Sch90, MM01]. The most efficient method to combine weak classifiers into a strong classifier of a given accuracy is an open question, and there are many competing algorithms available for this purpose [Kot07, YL04].
Issues commonly considered in the development of such algorithms include identification of the data features that are relevant to the classification problem at hand [ZZY03, CYHH07] and whether or not provisions need to be taken to avoid overfitting to the training set (causing poor performance on the general problem space) [Bre98, BEHW87].

We use an approach inspired by recent quantum boosting results on image recognition [NRM, NDRMa, NDRMb, BCMR]. This approach has been shown to outperform classical boosting algorithms in terms of accuracy (but not speed) on selected problems, and has the advantage of being implementable on existing quantum optimization hardware [BL08, Cho08, KDH+11, HJL+10].

Since we shall map the $w_i$ to qubits we use binary weights $w_i \in \{0, 1\}$. It should be straightforward to generalize our approach to a higher resolution version of real-valued $w_i$ using multiple qubits per weight.

Let $\vec{w} = (w_1, \ldots, w_N) \in \{0, 1\}^N$, and let

$$R_{\vec{w}}(\vec{x}) \equiv \vec{w} \cdot \vec{h}(\vec{x}) = \sum_{i=1}^{N} w_i h_i(\vec{x}) \in [-1, 1]. \tag{2.14}$$

This range is a direct result of the normalization $h_i \in [-1/N, 1/N]$ introduced above.

We now define the weight-dependent “strong classifier”

$$Q_{\vec{w}}(\vec{x}) \equiv \mathrm{sign}\left[ R_{\vec{w}}(\vec{x}) \right], \tag{2.15}$$

and use it as follows:

Definition 6 Strong classification of $\vec{x} \in V$.
Strongly classified correct (SCC): a vector $\vec{x}$ is SCC if $Q_{\vec{w}}(\vec{x}) = +1$.
Strongly classified erroneous (SCE): a vector $\vec{x}$ is SCE if $Q_{\vec{w}}(\vec{x}) = -1$.

There is a fundamental difference between the “opinions” of the strong classifier, as expressed in Definition 6, and the actual erroneousness/correctness of a given input-output pair. The strong classifier associates an erroneous/correct label with a given input-output pair according to a weighted average of the weak classifiers. This opinion may or may not be correct. For the training set we actually know whether a given input-output pair is erroneous or correct. This presents us with an opportunity to compare the strong classifier to the training data.
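The comparison between the strong classifier and the training labels can be sketched in a few lines. The training set below is entirely hypothetical: it contains $N = 3$ weak classifiers, each taking values $\pm 1/3$ and each wrong on exactly one of four training pairs. Returning 0 from the sign function on an exact tie is an assumed convention, since Eq. (2.15) leaves that case unspecified, and the function names are illustrative only.

```python
from fractions import Fraction

T = Fraction(1, 3)  # 1/N for N = 3 hypothetical weak classifiers

# Hypothetical training set: (weak classifier values h(x_s), label y_s).
# Each weak classifier has the wrong sign on exactly one of the four pairs.
TRAINING = [
    ((T, T, -T), +1),
    ((T, -T, T), +1),
    ((-T, -T, -T), -1),
    ((T, -T, -T), -1),
]


def strong_score(w, h_values):
    """R_w(x) = sum_i w_i h_i(x), as in Eq. (2.14)."""
    return sum(w_i * h_i for w_i, h_i in zip(w, h_values))


def strong_classify(w, h_values):
    """Q_w(x) = sign[R_w(x)], as in Eq. (2.15); 0 on a tie is an assumed convention."""
    score = strong_score(w, h_values)
    return (score > 0) - (score < 0)


def agreements(w, training):
    """Count the training pairs for which y_s * Q_w(x_s) = +1."""
    return sum(1 for h_values, y in training
               if y * strong_classify(w, h_values) == 1)


print(agreements((1, 0, 0), TRAINING))  # a single weak classifier on its own
print(agreements((1, 1, 1), TRAINING))  # all three weak classifiers combined
```

In this contrived case the unweighted combination agrees with every label even though no single weak classifier does, which is the effect boosting aims for.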
Namely, if $y_s Q_{\vec{w}}(\vec{x}_s) = -1$ then $Q_{\vec{w}}(\vec{x}_s)$ and $y_s$ have opposite sign, i.e., disagree, which means that $Q_{\vec{w}}(\vec{x}_s)$ mistakenly classified $\vec{x}_s$ as a correct input-output pair while in fact it was erroneous, or vice versa. On the other hand, if $y_s Q_{\vec{w}}(\vec{x}_s) = +1$ then $Q_{\vec{w}}(\vec{x}_s)$ and $y_s$ agree, which means that $Q_{\vec{w}}(\vec{x}_s)$ is correct. Formally,

$$y_s Q_{\vec{w}}(\vec{x}_s) = +1 \iff \begin{cases} (\vec{x}_s \text{ is SCC}) = \text{true} \\ \text{or} \\ (\vec{x}_s \text{ is SCE}) = \text{true} \end{cases} \tag{2.16a}$$

$$y_s Q_{\vec{w}}(\vec{x}_s) = -1 \iff \begin{cases} (\vec{x}_s \text{ is SCC}) = \text{false} \\ \text{or} \\ (\vec{x}_s \text{ is SCE}) = \text{false} \end{cases} \tag{2.16b}$$

The higher the number of true instances is relative to the number of false instances, the better the strong classifier performance over the training set. The challenge is, of course, to construct a strong classifier that performs well also beyond the training set. To do so we must first solve the problem of finding the optimal set of binary weights $\vec{w}$.

2.3.3 The formal weight optimization problem

Let $H[z]$ denote the Heaviside step function, i.e., $H[z] = 0$ if $z < 0$ and $H[z] = 1$ if $z > 0$. Thus $H[-y_s Q_{\vec{w}}(\vec{x}_s)] = 1$ if the classification of $\vec{x}_s$ is wrong, but $H[-y_s Q_{\vec{w}}(\vec{x}_s)] = 0$ if the classification of $\vec{x}_s$ is correct. In this manner $H[-y_s Q_{\vec{w}}(\vec{x}_s)]$ assigns a penalty of one unit for each incorrectly classified input-output pair. Consider

$$L(\vec{w}) \equiv \sum_{s=1}^{S} H\left[ -y_s Q_{\vec{w}}(\vec{x}_s) \right]. \tag{2.17}$$

This counts the total number of incorrect classifications. Therefore minimization of $L(\vec{w})$ for a given training set $\{\vec{x}_s, y_s\}_{s=1}^{S}$ will yield the optimal set of weights $\vec{w}^{opt} = \{w_i^{opt}\}_{i=1}^{N}$.

However, it is important not to overtrain the classifier. Overtraining means that the strong classifier has poor generalization performance, i.e., it does not classify accurately outside of the training set [BEHW87, CYHH07]. To prevent overtraining we can add a penalty proportional to the Hamming weight of $\vec{w}$, i.e., to the number of non-zero weights $\|\vec{w}\|_0 = \sum_{i=1}^{N} w_i$.
In this manner an optimal balance is sought between the accuracy of the strong classifier and the number of weak classifiers comprising the strong classifier. The formal weight optimization problem is then to solve

$$\vec{w}^{\prime\,\mathrm{opt}} = \arg\min_{\vec{w}} \left[ L(\vec{w}) + \lambda \|\vec{w}\|_0 \right], \tag{2.18}$$

where $\lambda > 0$ can be tuned to decide the relative importance of the penalty.

2.3.4 Relaxed weight optimization problem

Unfortunately, the formulation of (2.18) is unsuitable for adiabatic quantum computation because of its discrete nature. In particular, the evaluation of the Heaviside function is not amenable to a straightforward implementation in AQC. Therefore, following [NDRMa], we now relax it by introducing a quadratic error measure, which will be implementable in AQC.

Let $\vec{y} = (y_1, \dots, y_S) \in \{-1, 1\}^S$ and $\vec{R}_{\vec{w}} = (R_{\vec{w}}(\vec{x}_1), \dots, R_{\vec{w}}(\vec{x}_S)) \in [-1, 1]^S$. The vector $\vec{y}$ is the ordered label set of correct/erroneous input-output pairs. The components $R_{\vec{w}}(\vec{x})$ of the vector $\vec{R}_{\vec{w}}$ already appeared in the strong classifier (2.15). There we were interested only in their signs, and in Eq. (2.16) we observed that if $y_s R_{\vec{w}}(\vec{x}_s) < 0$ then $\vec{x}_s$ was incorrectly classified, while if $y_s R_{\vec{w}}(\vec{x}_s) > 0$ then $\vec{x}_s$ was correctly classified.

We can consider a relaxation of the formal optimization problem (2.18) by replacing the counting of incorrect classifications by a sum of the values of $y_s R_{\vec{w}}(\vec{x}_s)$ over the training set. This makes sense since we have normalized the weak classifiers so that $R_{\vec{w}}(\vec{x}) \in [-1, 1]$, while each label $y_s \in \{-1, 1\}$, so that all the terms $y_s R_{\vec{w}}(\vec{x}_s)$ are in principle equally important. In other words, the inner product $\vec{y} \cdot \vec{R}_{\vec{w}} = \sum_{s=1}^{S} y_s R_{\vec{w}}(\vec{x}_s)$ is also a measure of the success of the classification, and maximizing it (making $\vec{y}$ and $\vec{R}_{\vec{w}}$ as parallel as possible) should result in a good training set.
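As a small illustration of why the inner product is a reasonable surrogate, the following Python sketch enumerates all weight vectors of a toy problem and compares the misclassification count of Eq. (2.17) with the inner product of the labels and the strong-classifier outputs. The training labels, the three weak classifiers, and the convention that a zero vote counts as an error are assumptions made for this example only; none of them comes from the text.

```python
from itertools import product

# Hypothetical toy training set: S = 4 labels and N = 3 weak classifiers.
# H[s][i] = h_i(x_s), normalized to +-1/N as in the text.
N = 3
y = [1, 1, -1, -1]
signs = [
    [1, 1, -1],    # votes of h_1, h_2, h_3 on x_1
    [1, -1, 1],    # ... on x_2
    [-1, -1, -1],  # ... on x_3
    [1, -1, -1],   # ... on x_4
]
H = [[v / N for v in row] for row in signs]


def margin(w, row):
    """R_w(x) = sum_i w_i h_i(x), Eq. (2.14), for one training vector."""
    return sum(wi * hi for wi, hi in zip(w, row))


def misclassifications(w):
    """L(w) of Eq. (2.17); a zero margin is counted as an error (assumption)."""
    return sum(1 for row, ys in zip(H, y) if ys * margin(w, row) <= 0)


def inner_product(w):
    """Inner product of the label vector with the vector of margins."""
    return sum(ys * margin(w, row) for row, ys in zip(H, y))


all_w = list(product((0, 1), repeat=N))
best_formal = min(all_w, key=misclassifications)   # minimizes the error count
best_relaxed = max(all_w, key=inner_product)       # maximizes the inner product
```

In this particular example both criteria select the same weight vector; that agreement is a property of the toy data and is not guaranteed in general.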
Equivalently, we can consider the distance between the vectors $\vec{y}$ and $\vec{R}_{\vec{w}}$ and minimize it by finding the optimal weight vector $\vec{w}^{\mathrm{opt}}$, in general different from that in Eq. (2.18). Namely, consider the Euclidean distance

$$\delta(\vec{w}) = \|\vec{y} - \vec{R}_{\vec{w}}\|^2 = \sum_{s=1}^{S} \left| y_s - \sum_{i=1}^{N} w_i h_i(\vec{x}_s) \right|^2 = \|\vec{y}\|^2 + \sum_{i,j=1}^{N} C'_{ij} w_i w_j - 2 \sum_{i=1}^{N} C'_{iy} w_i, \tag{2.19}$$

where $\vec{h}_i = (h_i(\vec{x}_1), \dots, h_i(\vec{x}_S)) \in [-1/N, 1/N]^S$ and where

$$C'_{ij} = \vec{h}_i \cdot \vec{h}_j = \sum_{s=1}^{S} h_i(\vec{x}_s) h_j(\vec{x}_s), \tag{2.20}$$

$$C'_{iy} = \vec{h}_i \cdot \vec{y} = \sum_{s=1}^{S} h_i(\vec{x}_s) y_s \tag{2.21}$$

can be thought of as correlation functions. Note that they are symmetric: $C'_{ij} = C'_{ji}$ and $C'_{iy} = C'_{yi}$. The term $\|\vec{y}\|^2 = S$ is a constant offset so can be dropped from the minimization.

If we wish to introduce a sparsity penalty as above, we can do so again, and thus ask for the optimal weight in the following sense:

$$\vec{w}^{\mathrm{opt}} = \arg\min_{\vec{w}} \left[ \delta(\vec{w}) + \lambda' \|\vec{w}\|_0 \right] = \arg\min_{\vec{w}} \left[ \sum_{i,j=1}^{N} C'_{ij} w_i w_j + 2 \sum_{i=1}^{N} (\lambda - C'_{iy}) w_i \right], \tag{2.22}$$

where $\lambda' = 2\lambda$.

2.3.5 From QUBO to the Ising Hamiltonian

Equation (2.22) is a quadratic binary optimization (QUBO) problem [NRM]. One more step is needed before we can map it to qubits, since we need to work with optimization variables whose range is $\{-1, 1\}$, not $\{0, 1\}$. Define new variables $q_i = 2(w_i - 1/2) \in \{-1, 1\}$. In terms of these new variables the minimization problem is

$$\begin{aligned} \vec{q}^{\,\mathrm{opt}} &= \arg\min_{\vec{q}} \left[ \frac{1}{4} \sum_{i,j=1}^{N} C'_{ij} (q_i + 1)(q_j + 1) + \sum_{i=1}^{N} (\lambda - C'_{iy})(q_i + 1) \right] \\ &= \arg\min_{\vec{q}} \left[ \sum_{i,j=1}^{N} C_{ij} q_i q_j + \sum_{i=1}^{N} (\lambda - C_{iy}) q_i \right], \end{aligned} \tag{2.23}$$

where in the second line we dropped the constant terms $\frac{1}{4} \sum_{i,j=1}^{N} C'_{ij}$ and $\sum_{i=1}^{N} (\lambda - C'_{iy})$, used the symmetry of $C'_{ij}$ for $\sum_{i=1}^{N} q_i \sum_{j=1}^{N} C'_{ij} = \sum_{i,j=1}^{N} C'_{ij} q_j$, and where we defined

$$C_{ij} = \frac{1}{4} C'_{ij}, \qquad C_{iy} = C'_{iy} - \frac{1}{2} \sum_{j=1}^{N} C'_{ij}. \tag{2.24}$$

Thus, the final AQC Hamiltonian for the quantum weight-learning problem is

$$H_F = \sum_{i,j=1}^{N} C_{ij} Z_i Z_j + \sum_{i=1}^{N} (\lambda - C_{iy}) Z_i, \tag{2.25}$$

where $Z_i$ is the Pauli spin-matrix $\sigma^z$ acting on the $i$th qubit.
This represents Ising spin-spin interactions with coupling matrix $C_{ij}$, and an inhomogeneous magnetic field $\lambda - C_{iy}$ acting on each spin. Note how $H_F$ encodes the training data $\{h_i(\vec{x}_s), y_s\}_{i,s}$ via the coupling matrix $C_{ij} = \frac{1}{4} \sum_{s=1}^{S} h_i(\vec{x}_s) h_j(\vec{x}_s)$ and the local magnetic field $\lambda - C_{iy} = \lambda - \sum_{s=1}^{S} h_i(\vec{x}_s) y_s + \frac{1}{2} \sum_{s=1}^{S} h_i(\vec{x}_s) \sum_{j=1}^{N} h_j(\vec{x}_s)$. Thus, in order to generate $H_F$ one must first calculate the training data using the chosen set of weak classifiers.

In this final form [Eq. (2.25)], involving only one- and two-qubit $Z_i$ terms, the problem is now suitable for implementation on devices such as D-Wave’s adiabatic quantum optimization processor [BCMR, Cho08]. It is not known whether this problem is amenable to a quantum speedup. A study of the gap dependence of our Hamiltonian $H(t)$ on $N$, which is beyond the scope of the present work, will help to determine whether such a speedup is to be expected also in the problem at hand. A related image processing problem has been shown numerically to require fewer weak classifiers than comparable classical algorithms, which gives the strong classifier a lower Vapnik-Chervonenkis dimension and therefore a lower generalization error [NDRMb]. Quantum boosting applied to a different task, 30-dimensional clustering, showed an accuracy advantage over the classical AdaBoost algorithm that increased as the overlap between the two clusters grew [NDRMa]. More generally, numerical simulations of quantum adiabatic implementations of related hard optimization problems (such as Exact Cover) have shown promising scaling results for $N$ values of up to 128 [FGG+01, YKS08a, KDH+11]. We shall thus proceed here with the requisite cautious optimism.

In Section 2.4.4 we shall formulate an alternative weight optimization problem, based on a methodology we develop in Section 2.4 for pairing weak classifiers to guarantee the correctness of the strong classifier.
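The change of variables from the QUBO of Eq. (2.22) to the Ising form of Eqs. (2.23)-(2.25) can be checked numerically on a small instance. The sketch below is only an illustration: the training data, the value of the sparsity penalty, and all function names are assumptions, and the Ising objective is evaluated classically on spin configurations rather than on quantum hardware.

```python
from itertools import product

# Hypothetical toy data (not from the text): H[s][i] = h_i(x_s) in {-1/N, +1/N}.
N = 3
y = [1, 1, -1, -1]
H = [[v / N for v in row]
     for row in ([1, 1, -1], [1, -1, 1], [-1, -1, -1], [1, -1, -1])]
lam = 0.1  # sparsity penalty, chosen arbitrarily for the example

# Correlations of Eqs. (2.20) and (2.21).
Cp = [[sum(r[i] * r[j] for r in H) for j in range(N)] for i in range(N)]
Cpy = [sum(r[i] * ys for r, ys in zip(H, y)) for i in range(N)]


def qubo_energy(w):
    """Objective of Eq. (2.22) for w in {0,1}^N."""
    quad = sum(Cp[i][j] * w[i] * w[j] for i in range(N) for j in range(N))
    return quad + 2 * sum((lam - Cpy[i]) * w[i] for i in range(N))


# Ising couplings of Eq. (2.24) and the local fields appearing in Eq. (2.25).
C = [[Cp[i][j] / 4 for j in range(N)] for i in range(N)]
field = [lam - (Cpy[i] - 0.5 * sum(Cp[i])) for i in range(N)]


def ising_energy(q):
    """Objective of Eq. (2.23) for q in {-1,+1}^N."""
    quad = sum(C[i][j] * q[i] * q[j] for i in range(N) for j in range(N))
    return quad + sum(field[i] * q[i] for i in range(N))


def to_spins(w):
    """q_i = 2 (w_i - 1/2)."""
    return tuple(2 * wi - 1 for wi in w)


# Constant dropped between the two lines of Eq. (2.23).
offset = sum(map(sum, Cp)) / 4 + sum(lam - c for c in Cpy)

all_w = list(product((0, 1), repeat=N))
max_gap = max(abs(qubo_energy(w) - ising_energy(to_spins(w)) - offset)
              for w in all_w)
best_w = min(all_w, key=qubo_energy)
best_q = min((to_spins(w) for w in all_w), key=ising_energy)
```

Because the two objectives differ only by the constant `offset`, they share the same minimizer, which is what allows the constant terms to be dropped.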
2.4 Achievable strong classifier accuracy

We shall show in this section that it is theoretically possible to construct a perfect, 100% accurate majority-vote strong classifier from a set of weak classifiers that are more than 50% accurate, provided those weak classifiers relate to each other in exactly the right way. Our construction in this section is analytical and exact; we shall specify a set of conditions weak classifiers should satisfy for perfect accuracy of the strong classifier they comprise. We shall also show how to construct an imperfect strong classifier, with bounded error probability, by a relaxation of the conditions we shall impose on the weak classifiers. We expect the quantum algorithm to find a close approximation to this result.

Consider a strong classifier with a general binary weight vector $\vec{w} \in \{0,1\}^N$, as defined in Eq. (2.14). Our approach will be to show that the strong classifier in Eq. (2.14) is completely accurate if a set of three conditions is met. The conditions work by using pairs of weak classifiers which both classify some $\vec{x}$ correctly and which disagree for all other $\vec{x}$. An accurate strong classifier can be constructed by covering the entire space $\mathcal{V}$ with the correctly classifying portions of such weak classifier pairs.

To start, every vector $\vec{x} \in \mathcal{V}$ has a correct classification, as determined by the specification set:

$$\vec{x} \in \hat{\mathcal{S}} \iff y(\vec{x}) = +1, \tag{2.26a}$$

$$\vec{x} \notin \hat{\mathcal{S}} \iff y(\vec{x}) = -1. \tag{2.26b}$$

A strong classifier is perfect if

$$Q_{\vec{w}}(\vec{x}) = y(\vec{x}) \quad \forall \vec{x} \in \mathcal{V}. \tag{2.27}$$

The weak classifiers either agree or disagree with this correct classification. We define the correctness value of a weak classifier for a given input $\vec{x}$:

$$c_i(\vec{x}) = h_i(\vec{x}) y(\vec{x}) = \begin{cases} +1 & h_i(\vec{x}) = y(\vec{x}) \\ -1 & h_i(\vec{x}) \neq y(\vec{x}) \end{cases} \tag{2.28}$$

Thus, similarly to the strong classifier case [Eq.
(2.16)] we have, formally,

$$c_i(\vec{x}) = +1 \iff \begin{cases} (\vec{x} \text{ is WCC}) = \text{true} \\ \text{or} \\ (\vec{x} \text{ is WCE}) = \text{true} \end{cases} \tag{2.29a}$$

$$c_i(\vec{x}) = -1 \iff \begin{cases} (\vec{x} \text{ is WCC}) = \text{false} \\ \text{or} \\ (\vec{x} \text{ is WCE}) = \text{false} \end{cases} \tag{2.29b}$$

where WCC and WCE stand for weakly classified correct and weakly classified erroneous, respectively (Definition 5).

A given input-output vector $\vec{x}$ receives either a true or false vote from each weak classifier comprising the strong classifier. Let us denote the index set of the weak classifiers comprising a given strong classifier by $\mathcal{I}$. If the majority of the votes given by the weak classifiers in $\mathcal{I}$ are true then the vector receives a strong classification that is true. Let us loosely denote by $\vec{w} \in \mathcal{I}$ the set of weak classifiers whose indices all belong to $\mathcal{I}$. Thus

$$\sum_{i \in \mathcal{I}} c_i(\vec{x}) > 0 \implies Q_{\vec{w}}(\vec{x}) = y(\vec{x}) \quad \text{if } \vec{w} \in \mathcal{I}. \tag{2.30}$$

It follows from Eq. (2.27) that if we can find a set of weak classifiers for which $\sum_{i \in \mathcal{I}} c_i(\vec{x}) > 0$ for all input-output vectors $\vec{x}$, then the corresponding strong classifier is perfect. This is what we shall set out to do in the next subsection.

2.4.1 Conditions for complete classification accuracy

First, we limit our working set to those weak classifiers with greater than 50% accuracy. This is a prerequisite for the feasibility of the other conditions. To ensure that at least half the initial dictionary of weak classifiers is more than 50% accurate, we include each potential weak classifier in the dictionary, as well as its opposite. The opposite classifier follows the same rule as its counterpart, but makes the opposite binary decision every time, making each right where the other is wrong and ensuring that at least one of them will have 50% or greater accuracy. Condition 1, therefore, defines the set $\mathcal{A}$,

$$\mathcal{A} \subseteq \mathcal{D} \equiv \{1, \dots, N\}, \tag{2.31}$$

of sufficiently accurate weak classifiers, where $\mathcal{D}$ is the set of all possible values of the index $i$ of weak classifiers in Eq. (2.14).
Condition 1 For an input-output vector $\vec{x} \in \mathcal{V}$ selected uniformly at random,

$$\mathcal{A} = \{ i : P[c_i(\vec{x}) = 1] > 1/2 \}. \tag{2.32}$$

$P[\omega]$ denotes the probability of event $\omega$. We use a probabilistic formulation for our conditions since we imagine the input-output space $\mathcal{V}$ to be very large and accessed by random sampling.

Conditions 2 and 3 (or 3a) specify the index set

$$\mathcal{J} \subseteq \mathcal{A} \times \mathcal{A}, \tag{2.33}$$

labeling pairs of weak classifiers which will make up the final strong classifier. Condition 2 groups the weak classifiers into pairs which classify the minimal number of vectors $\vec{x}$ correctly at the same time and give opposite classifications on all other vectors. Condition 3 completes the specification of the index set $\mathcal{J}$: it states that the subsets of vectors $\vec{x}$ that are classified correctly by the classifier pairs in $\mathcal{J}$ must cover the entire space $\mathcal{V}$.

Condition 2 If $(j, j') \in \mathcal{J}$ then

$$P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] = P[c_j(\vec{x}) = 1] + P[c_{j'}(\vec{x}) = 1] - 1 \tag{2.34}$$

for an input-output vector $\vec{x} \in \mathcal{V}$ selected uniformly at random.

This condition has the following simple interpretation, illustrated in Fig. 2.2. Suppose the entire input-output space $\mathcal{V}$ is sorted lexicographically (e.g., according to the binary values of the vectors $\vec{x} \in \mathcal{V}$) so that the $j$th weak classifier is correct on the first $N_j$ vectors but erroneous on the rest, while the $j'$th weak classifier is correct on the last $N_{j'}$ vectors but erroneous on the rest. Thus the fraction of vectors correctly classified by the $j$th classifier is $(1 - \eta_j) = N_j / |\mathcal{V}|$, the fraction of vectors correctly classified by the $j'$th classifier is $(1 - \eta_{j'}) = N_{j'} / |\mathcal{V}|$, and they overlap on a fraction of $1 - \eta_j - \eta_{j'}$ vectors (all vectors minus each classifier’s fraction of incorrectly classified vectors), as illustrated in the top part of Fig. 2.2. By “pushing classifier $j'$ to the left”, as illustrated in the bottom part of Fig. 2.2, the overlap grows and is no longer minimal. This is what is expressed by Eq. (2.34).
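The interpretation above can be made concrete on a small finite space. The following sketch, which assumes a toy space of ten lexicographically ordered vectors and uses hypothetical helper names, checks Eq. (2.34) exactly with rational arithmetic for a pair that is correct on opposite ends of the space and for a pair in which one classifier has been "pushed to the left".

```python
from fractions import Fraction


def p_correct(c):
    """P[c(x) = 1] for x drawn uniformly from the (finite) space."""
    return Fraction(c.count(1), len(c))


def p_both_correct(cj, ck):
    """P[(c_j(x) = 1) and (c_j'(x) = 1)]."""
    return Fraction(sum(1 for a, b in zip(cj, ck) if a == b == 1), len(cj))


def satisfies_condition_2(cj, ck):
    """True if the pair has the minimal joint-correctness overlap of Eq. (2.34)."""
    return p_both_correct(cj, ck) == p_correct(cj) + p_correct(ck) - 1


# Correctness values over a toy space of 10 vectors (assumed example data).
c_left = [1] * 6 + [-1] * 4               # correct on the first 6 vectors
c_right = [-1] * 4 + [1] * 6              # correct on the last 6 vectors
c_shifted = [-1] * 2 + [1] * 6 + [-1] * 2  # same accuracy, pushed to the left

minimal_pair_ok = satisfies_condition_2(c_left, c_right)
shifted_pair_ok = satisfies_condition_2(c_left, c_shifted)
```

The first pair overlaps on exactly two of the ten vectors, the minimum possible for two 60% accurate classifiers, while the shifted pair overlaps on four and leaves two vectors on which both classifiers are wrong.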
Condition 2 considers only one pair of weak classifiers at a time, which does not suffice to cover all of $\mathcal{V}$. Consider a set of weak classifier pairs each satisfying Condition 2 which, together, do cover all of $\mathcal{V}$. Such a set would satisfy $\sum_{(j,j') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] = 1$ for a randomly chosen $\vec{x} \in \mathcal{V}$. This is illustrated in Fig. 2.3. However, it is also possible for two or more pairs to overlap, a situation we would like to avoid as much as possible, i.e., we shall impose minimal overlap similarly to Condition 2. Thus we arrive at:

Condition 3

$$\sum_{(j,j') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] - \sum_{(j,j') \neq (k,k') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \cap (c_k(\vec{x}) = 1) \cap (c_{k'}(\vec{x}) = 1)] = 1, \tag{2.35}$$

where the overlap between two pairs of weak classifiers with labels $(j, j')$ and $(k, k')$ is given by the subtracted terms. Condition 3 is illustrated in Fig. 2.4.

Figure 2.2: Illustration of Condition 2. Two pairs of classifiers showing regions of correct (green) and incorrect (red) classification along a line representing a lexicographical ordering of all vectors within $\mathcal{V}$. The top pair, compliant with Condition 2, provides two correct classifications for the minimum possible number of vectors, voting once correctly and once incorrectly on all other vectors. The bottom pair, violating Condition 2, provides two correct votes for more vectors than does the top pair, but also undesirably provides two incorrect votes for some vectors; this is why paired weak classifiers must coincide in their classifications on as few vectors as possible.

Figure 2.3: Illustration of Condition 3 without the subtracted term. Five pairs of 60% accurate weak classifiers combine to form a completely accurate majority-vote strong classifier.
Moving from top to bottom through the pairs and from left to right along the vectors in the classification space, each pair of weak classifiers provides two correct votes for 20% of the vector space and neutral votes otherwise. This means that the majority vote is correct for the entire space because no two pairs vote correctly at once.

Figure 2.4: Illustration of Condition 3 with the subtracted term. Three pairs of 70% accurate weak classifiers combined to form a completely accurate majority-vote strong classifier. In this case, each pair votes twice correctly on 40% of the vector space, which makes it necessary for the correct portions of the second and third pairs from the top to overlap. Because they only overlap by the minimum amount necessary, $\mathcal{V}$ as a whole is still covered by a correct majority vote.

It is possible to substitute a similar Condition 3a for the above Condition 3 to create a different, yet also sufficient set of conditions for a completely accurate strong classifier. The number of weak classifiers required to satisfy the alternate set of conditions is expected to be smaller than the number required to satisfy the original three conditions. This is due to the fact that the modified conditions make use of one standalone weak classifier to cover a larger portion of the space correctly than is possible with a pair of weak classifiers.

Condition 3a

$$\begin{aligned} &\sum_{(j,j') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] + P[c_a(\vec{x}) = 1] \\ &\quad - \sum_{(j,j') \neq (k,k') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \cap (c_k(\vec{x}) = 1) \cap (c_{k'}(\vec{x}) = 1)] \\ &\quad - \sum_{(j,j') \in \mathcal{J}} P[(c_a(\vec{x}) = 1) \cap (c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] = 1 \end{aligned} \tag{2.36}$$

This condition is illustrated in Fig. 2.5. Its interpretation is similar to that of Condition 3, except that the standalone classifier with the subscript $a$ is added to the other classifier pairs, and its overlap with them is subtracted separately in the last line.

Figure 2.5: Illustration of Condition 3a.
Two pairs and one single weak classifier form a completely accurate majority-vote strong classifier. The two pairs cover 40% of the vector space with correct votes, and the single weak classifier (the first element of the fourth pair in Fig. 2.3; the faded-out classifiers in the third, fourth, and fifth pairs are omitted from this strong classifier) provides an extra correct vote to tip the balance in the remaining 60% to a correct overall classification.

The perfect strong classifier can now be constructed from the weak classifiers in the set $\mathcal{J}$ defined by the conditions above. Define $\mathcal{J}_L$ as the set of all $j$ from pairs $(j, j') \in \mathcal{J}$. Similarly, define $\mathcal{J}_R$ as the set of all $j'$ from pairs $(j, j') \in \mathcal{J}$. Note that, since any pair for which $j = j'$ would not have minimum correctness overlap and therefore could not be in $\mathcal{J}$, it follows that $j \neq j'$ for all pairs $(j, j')$, i.e., $\mathcal{J}_L \cap \mathcal{J}_R = \emptyset$. The strong classifier is then (2.14) with each $w_i$ being one of the elements of a pair, i.e.,

$$w_i = \begin{cases} 1 & i \in (\mathcal{J}_L \cup \mathcal{J}_R) \\ 0 & \text{otherwise} \end{cases} \tag{2.37}$$

2.4.2 Perfect strong classifier theorem

We will now prove that any strong classifier satisfying Conditions 1-3, or 1-3a, is completely accurate.

Lemma 1 Assume Condition 1 and $(j, j') \in \mathcal{J}$. Then the sum of the correctness values of the corresponding weak classifiers is nonnegative everywhere with probability 1, namely

$$P[c_j(\vec{x}) + c_{j'}(\vec{x}) \geq 0] = 1 \tag{2.38}$$

for an input-output vector $\vec{x} \in \mathcal{V}$ selected uniformly at random.

Proof. For any pair $(j, j') \in \mathcal{J}$ we have

$$P[(c_j(\vec{x}) = 1) \cup (c_{j'}(\vec{x}) = 1)] = P[c_j(\vec{x}) = 1] + P[c_{j'}(\vec{x}) = 1] - P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] = 1 \tag{2.39}$$

by Condition 2. Eq. (2.39) means that at least one of the two weak classifiers evaluates to 1.
Since by definition $c_i(\vec{x}) \in \{-1, 1\}\ \forall i$, the sum is 2 or 0 with probability 1, i.e.,

$$P[c_j(\vec{x}) + c_{j'}(\vec{x}) \in \{0, 2\}] = 1. \tag{2.40}$$

Recall that if the majority of the votes given by the weak classifiers comprising a given strong classifier is true then the input-output vector being voted on receives a strong classification that is true [Eq. (2.30)], and that if this is the case for all input-output vectors then the strong classifier is perfect [Eq. (2.27)]. We are now in a position to state that this is the case with certainty provided the weak classifiers belong to the set $\mathcal{J}$ defined by the conditions given above.

Theorem 1 A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1-3 is perfect.

Proof. It suffices to show that the correctness sum is at least 2 with probability 1 when Conditions 1-3 are met, namely that

$$P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) \geq 2 \right] = 1. \tag{2.41}$$

Now,

$$P\left[ \bigcup_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x}) = 2) \right] = P\left[ \bigcup_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \right] \tag{2.42a}$$

$$\geq \sum_{(j,j') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] - \sum_{(j,j') \neq (k,k') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \cap (c_k(\vec{x}) = 1) \cap (c_{k'}(\vec{x}) = 1)] \tag{2.42b}$$

$$= 1 \quad \text{by Cond. 3}, \tag{2.42c}$$

where equality (2.42c) turns the inequality (2.42b) (see footnote 3) into an equality because the probability of an event cannot be greater than 1. Thus, for any randomly selected vector $\vec{x} \in \mathcal{V}$, the correctness sum of at least one of the pairs is 2, i.e.,

$$P[\exists (j, j') \in \mathcal{J} : (c_j(\vec{x}) + c_{j'}(\vec{x}) = 2)] = 1. \tag{2.43}$$

Lemma 1 tells us that the correctness sum of each pair of weak classifiers is nonnegative, while Eq. (2.43) states that for at least one pair this sum is not just nonnegative but equal to 2. Therefore the correctness sum of all weak classifiers in $\mathcal{J}$ is at least 2, which is Eq. (2.41).

Footnote 3: This inequality reflects the fact that for $n$ overlapping sets, $P[\bigcup_{i=1}^{n} s_i] = \sum_{i=1}^{n} P[s_i] - \sum_{i \neq j} P[s_i \cap s_j] + \sum_{i \neq j \neq k} P[s_i \cap s_j \cap s_k] - \sum_{i \neq j \neq k \neq m} P[s_i \cap s_j \cap s_k \cap s_m] + \dots$ Each term is larger than the next in the series; $n + 1$ sets cannot intersect where $n$ sets do not. Our truncation of the series is less than or equal to the full value because we stop after a subtracted term.

Theorem 2 A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1, 2, and 3a is perfect.

Proof. It suffices to show that the correctness sum is at least 1 with probability 1 when Conditions 1, 2, and 3a are met, namely that

$$P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) + c_a(\vec{x}) \geq 1 \right] = 1. \tag{2.44}$$

We proceed similarly to the proof of Theorem 1.

$$\begin{aligned} &P\left[ \bigcup_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x}) = 2) \cup (c_a(\vec{x}) = 1) \right] \\ &\quad = P\left[ \bigcup_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \cup (c_a(\vec{x}) = 1) \right] \\ &\quad \geq \sum_{(j,j') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] + P[c_a(\vec{x}) = 1] \\ &\qquad - \sum_{(j,j') \neq (k,k') \in \mathcal{J}} P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1) \cap (c_k(\vec{x}) = 1) \cap (c_{k'}(\vec{x}) = 1)] \\ &\qquad - \sum_{(j,j') \in \mathcal{J}} P[(c_a(\vec{x}) = 1) \cap (c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] \\ &\quad = 1 \quad \text{by Cond. 3a}. \end{aligned} \tag{2.45}$$

As in Theorem 1, the lower bound equals 1 and a probability cannot exceed 1, so the probability on the first line is 1. Thus the correctness sum of at least one of the pairs together with the singled-out weak classifier is greater than or equal to 1, i.e.,

$$P[\exists (j, j') \in \mathcal{J} : (c_j(\vec{x}) + c_{j'}(\vec{x}) = 2) \cup (c_a(\vec{x}) = 1)] = 1. \tag{2.46}$$

This result, together with Lemma 1, implies the correctness sum of all weak classifiers in $\mathcal{J}$ is at least 1, which is Eq. (2.44).

2.4.3 Imperfect strong classifier theorem

Because the three conditions on the set $\mathcal{J}$ of weak classifiers guarantee a completely accurate strong classifier, errors in the strong classifier must mean that the conditions are violated in some way. For instance, Condition 2 could be replaced by a weaker condition which allows for more than minimum overlap of vectors $\vec{x}$ categorized correctly by both weak classifiers in a pair.

Condition 2a If $(j, j') \in \mathcal{J}$ then

$$P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] = P[c_j(\vec{x}) = 1] + P[c_{j'}(\vec{x}) = 1] - 1 + \epsilon_{jj'} \tag{2.47}$$

for an input-output vector $\vec{x} \in \mathcal{V}$ selected uniformly at random.
The quantity $\epsilon_{jj'}$ is a measure of the “overlap error”. We can use it to prove relaxed versions of Lemma 1 and Theorem 1.

Lemma 1a Assume Condition 1 and $(j, j') \in \mathcal{J}$. Then the sum of the correctness values of the corresponding weak classifiers is nonnegative everywhere with probability $1 - \epsilon_{jj'}$, namely

$$P[c_j(\vec{x}) + c_{j'}(\vec{x}) \geq 0] = 1 - \epsilon_{jj'} \tag{2.48}$$

for an input-output vector $\vec{x} \in \mathcal{V}$ selected uniformly at random.

Proof. The proof closely mimics that of Lemma 1.

$$\begin{aligned} P[(c_j(\vec{x}) = 1) \cup (c_{j'}(\vec{x}) = 1)] &= P[c_j(\vec{x}) = 1] + P[c_{j'}(\vec{x}) = 1] - P[(c_j(\vec{x}) = 1) \cap (c_{j'}(\vec{x}) = 1)] \\ &= P[c_j(\vec{x}) = 1] + P[c_{j'}(\vec{x}) = 1] - P[c_j(\vec{x}) = 1] - P[c_{j'}(\vec{x}) = 1] + 1 - \epsilon_{jj'} \\ &= 1 - \epsilon_{jj'} \end{aligned} \tag{2.49}$$

by Condition 2a. As in the proof of Lemma 1, this implies

$$P[c_j(\vec{x}) + c_{j'}(\vec{x}) \in \{0, 2\}] = 1 - \epsilon_{jj'}. \tag{2.50}$$

We can now replace Theorem 1 by a lower bound on the success probability when Condition 2 is replaced by the weaker Condition 2a. Let us first define an imperfect strong classifier as follows:

Definition 7 A strong classifier is $\epsilon$-perfect if, for $\vec{x} \in \mathcal{V}$ chosen uniformly at random, it correctly classifies $\vec{x}$ [i.e., $Q_{\vec{w}}(\vec{x}) = y(\vec{x})$] with probability at least $1 - \epsilon$.

Theorem 3 A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1, 2a and 3 is $\epsilon$-perfect, where $\epsilon = \sum_{(j,j') \in \mathcal{J}} \epsilon_{jj'}$.

Proof. It suffices to show that the correctness sum is positive with probability 1 minus the sum of the overlap errors when Conditions 1, 2a and 3 are satisfied, namely

$$P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) > 0 \right] \geq 1 - \sum_{(j,j') \in \mathcal{J}} \epsilon_{jj'}. \tag{2.51}$$

Now, by definition $c_j(\vec{x}) + c_{j'}(\vec{x}) \in \{-2, 0, 2\}$, and the correctness sum of at least one of the pairs must be negative in order for the correctness sum over all weak classifiers in $\mathcal{J}$ to be negative, so that

$$P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) < 0 \right] \tag{2.52a}$$

$$\leq P[\exists (j, j') \in \mathcal{J} : c_j(\vec{x}) + c_{j'}(\vec{x}) = -2]. \tag{2.52b}$$

However, we also need to exclude the case of all weak classifier pairs summing to zero (otherwise the strong classifier can be inconclusive).
This case is partially excluded by virtue of Condition 3, which tells us that $\mathcal{V}$ as a whole is always covered by a correct majority vote. Formally,

$$P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) = 0 \right] = P\left[ \bigcap_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) = 0 \right] = 1 - P[\exists (j, j') \in \mathcal{J} : c_j(\vec{x}) + c_{j'}(\vec{x}) > 0] = 0, \tag{2.53}$$

where in the last equality we invoked the calculation leading from Eq. (2.42c) to Eq. (2.43), which only required Condition 3. Alternatively, we could use Condition 3a to prove that $P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) + c_a(\vec{x}) = 0 \right] = 0$. There is another way for the classifier to return an inconclusive result: if one weak classifier pair has a correctness sum of 2 and another weak classifier pair has a correctness sum of $-2$. This case is included in the bound in Eq. (2.52b) because one of the weak classifier pairs in this scenario has a negative correctness sum. We can thus conclude that the strict inequality in Eq. (2.52a) can be replaced by $\leq$.

Now, the probability of there being one weak classifier pair such as in Eq. (2.52b) cannot be greater than the probability of at least one of the pairs having a negative correctness sum, which in turn, by the union bound, cannot be greater than the sum of such probabilities:

$$\text{Eq. (2.52b)} \leq P\left[ \bigcup_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x}) = -2) \right] \leq \sum_{(j,j') \in \mathcal{J}} P[c_j(\vec{x}) + c_{j'}(\vec{x}) = -2] = \sum_{(j,j') \in \mathcal{J}} \epsilon_{jj'}, \tag{2.54}$$

where the last equality follows from Lemma 1a. This proves Eq. (2.51).

It is interesting to note that, as alluded to in this proof, if we were to drop Conditions 3 and 3a, then Eq. (2.51) would become $P\left[ \sum_{(j,j') \in \mathcal{J}} (c_j(\vec{x}) + c_{j'}(\vec{x})) \geq 0 \right] \geq 1 - \sum_{(j,j') \in \mathcal{J}} \epsilon_{jj'}$ (note the change from $>$ to $\geq$), so that Theorem 3 would change to a statement about inconclusive $\epsilon$-perfect strong classifiers, which can, with finite probability, yield a “don’t-know” answer. This may be a useful tradeoff if it turns out to be difficult to construct a set of weak classifiers satisfying Condition 3 or 3a.
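The bound of Theorem 3 can be checked by direct enumeration on a toy space. The sketch below is an illustration under stated assumptions, not a construction from the text: it builds five pairs of 60% accurate classifiers over 20 vectors in the spirit of Fig. 2.3, then spoils one pair so that both of its members are wrong on a single vector, which introduces an overlap error of 1/20 while leaving Conditions 1 and 3 intact. All names and sizes are hypothetical.

```python
from fractions import Fraction

N_VECTORS = 20   # size of the toy space V (assumption)
BLOCK = 4        # vectors on which a pair is jointly correct (assumption)


def ideal_pair(block):
    """Correctness values (+1/-1) of a pair obeying Condition 2: both members
    are right on one block and split their votes on every other vector."""
    cj, ck = [], []
    toggle = 0
    for x in range(N_VECTORS):
        if x // BLOCK == block:
            cj.append(1)
            ck.append(1)
        else:
            first_is_right = toggle % 2 == 0
            cj.append(1 if first_is_right else -1)
            ck.append(-1 if first_is_right else 1)
            toggle += 1
    return cj, ck


def overlap_error(cj, ck):
    """Overlap error of a pair, solved from Eq. (2.47)."""
    both = Fraction(sum(1 for a, b in zip(cj, ck) if a == b == 1), N_VECTORS)
    return (both - Fraction(cj.count(1), N_VECTORS)
            - Fraction(ck.count(1), N_VECTORS) + 1)


def success_probability(pairs):
    """Fraction of V on which the total correctness sum is strictly positive."""
    good = sum(1 for x in range(N_VECTORS)
               if sum(cj[x] + ck[x] for cj, ck in pairs) > 0)
    return Fraction(good, N_VECTORS)


ideal = [ideal_pair(b) for b in range(N_VECTORS // BLOCK)]

# Relax Condition 2 to Condition 2a for the first pair: make its first member
# wrong on vector 4, where its partner is already wrong.
noisy = [(list(cj), list(ck)) for cj, ck in ideal]
noisy[0][0][4] = -1

bound = 1 - sum(overlap_error(cj, ck) for cj, ck in noisy)
```

In this example the bound is tight: the spoiled vector receives a total correctness sum of zero (an inconclusive vote), so the success probability equals one minus the single overlap error.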
2.4.4 An alternate weight optimization problem

The conditions and results established in the previous subsection for correctness of the strong classifier suggest the creation of an alternate weight optimization problem to select the weak classifiers that will be included in the final majority vote, replacing the optimization problem of Section 2.3.4. The new optimization problem is defined over the space of pairs of weak classifiers, rather than singles, which can be constructed using elements of the set $\mathcal{A} \times \mathcal{A}$, with $\mathcal{A}$ as defined in Condition 1. We define the ideal pair weight as

$$\tilde{w}_{ij} = \begin{cases} 1 & (i, j) \in \mathcal{J} \\ 0 & \text{otherwise} \end{cases}. \tag{2.55}$$

Since we do not know the set $\mathcal{J}$ a priori, we shall define a QUBO whose solutions $w_{ij} \in \{0, 1\}$, with $(i, j) \in \mathcal{A} \times \mathcal{A}$, will be an approximation to the ideal pair weights $\tilde{w}_{ij}$. In the process, we shall map the pair weight bits $w_{ij}$ to qubits. Each $w_{ij}$ determines whether its corresponding pair of weak classifiers, $h_i$ and $h_j$, will be included in the new strong classifier, which can thus be written as:

$$Q^{\mathrm{pair}}(\vec{x}) = \mathrm{sign}\left[ R_{\vec{w}^{\mathrm{pair}}}(\vec{x}) \right] = \mathrm{sign}\left[ \sum_{(i,j) \in \mathcal{A} \times \mathcal{A}} w_{ij} \left( h_i(\vec{x}) + h_j(\vec{x}) \right) \right] \tag{2.56}$$

Recall that we do not know the $w_{ij}$ a priori; they are found in our approach via the solution of a QUBO, which we set up as follows:

$$\vec{w}^{\mathrm{opt}}_{\mathrm{pair}} = \arg\min_{\vec{w}} \left[ \sum_{(i,j) \in \mathcal{A} \times \mathcal{A}} \alpha_{ij} w_{ij} + \sum_{(i,j) \neq (k,l) \in \mathcal{A} \times \mathcal{A}} J_{ijkl} w_{ij} w_{kl} \right], \tag{2.57}$$

where the second term is a double sum over all sets of unequal pairs. The solution of this QUBO will provide us with an approximation to the set $\mathcal{J}$, which yields the desired set of weak classifiers as in Eq. (2.37). Sparsity can be enforced as in Eq. (2.22) by replacing $\alpha_{ij}$ with $\alpha_{ij} + \lambda$, where $\lambda > 0$, i.e., by including a penalty proportional to $\|\vec{w}\|_0$.

The terms $\alpha_{ij}$ and $J_{ijkl}$ reward compliance with Conditions 2 and 3, respectively.
To define $\alpha_{ij}$, we first define the modified correctness function $c'_i : \mathcal{T} \mapsto \{0, 1\}$, where $\mathcal{T}$ is the training set (2.13):

$$c'_i(\vec{x}_s, y_s) = \frac{1}{2} \left( h_i(\vec{x}_s) y_s + 1 \right) = \begin{cases} 1 & h_i(\vec{x}_s) = y_s \\ 0 & h_i(\vec{x}_s) \neq y_s \end{cases} \tag{2.58}$$

Below we write $c'_i(s)$ in place of $c'_i(\vec{x}_s, y_s)$ for notational simplicity.

The term $\alpha_{ij}$ rewards the pairing of weak classifiers which classify the minimal number of vectors $\vec{x}$ incorrectly at the same time, as specified by Condition 2. Each pair included gains negative weight for the training set vectors its members classify correctly, but is also given a positive penalty for any vectors classified incorrectly by both weak classifiers at once:

$$\alpha_{ij} = -\frac{1}{S} \sum_{s=1}^{S} \left[ c'_i(s) + c'_j(s) - (1 - c'_i(s))(1 - c'_j(s)) \right] \tag{2.59}$$

The term $J_{ijkl}$ penalizes the inclusion of pairs that are too similar to each other, as codified in Condition 3. This is accomplished by assigning a positive weight for each vector that is classified correctly by two pairs at once:

$$J_{ijkl} = \frac{1}{S} \sum_{s=1}^{S} c'_i(s) c'_j(s) c'_k(s) c'_l(s) \tag{2.60}$$

We now have a QUBO for the alternate weight optimization problem. This can be translated to the Ising Hamiltonian as with the original optimization problem in Section 2.3.5. We again map from our QUBO variables $w_{ij}$ to variables $q_{ij} = 2(w_{ij} - 1/2)$, yielding the following optimization function:

$$\vec{q}^{\,\mathrm{opt}}_{\mathrm{pair}} = \arg\min_{\vec{q}} \left[ \frac{1}{2} \sum_{(i,j) \in \mathcal{A} \times \mathcal{A}} \beta_{ij} q_{ij} + \frac{1}{4} \sum_{(i,j) \neq (k,l) \in \mathcal{A} \times \mathcal{A}} J_{ijkl} q_{ij} q_{kl} \right], \tag{2.61}$$

where

$$\beta_{ij} = \alpha_{ij} + \frac{1}{2} \left( \sum_{(k,l) \in \mathcal{A} \times \mathcal{A},\, (k,l) \neq (i,j)} J_{ijkl} + J_{klij} \right). \tag{2.62}$$

Constant terms were omitted because they have no bearing on the minimization. This optimization function is now suitable for direct translation to the final Hamiltonian for an AQC:

$$H_F = \frac{1}{2} \sum_{(i,j) \in \mathcal{A} \times \mathcal{A}} \beta_{ij} Z_{ij} + \frac{1}{4} \sum_{(i,j) \neq (k,l) \in \mathcal{A} \times \mathcal{A}} J_{ijkl} Z_{ij} Z_{kl}. \tag{2.63}$$

The qubits now represent weights on pairs rather than on an individual classifier. $Z_{ij}$ is therefore the Pauli $\sigma^z$ operator on the qubit assigned to the pair $(i, j) \in \mathcal{A} \times \mathcal{A}$.
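The pair coefficients of Eqs. (2.58)-(2.60) are straightforward to compute from a labeled training set. The sketch below is illustrative only: the labels and weak-classifier votes are made up, the votes are taken in {-1, +1} rather than normalized, and the signs of the linear term follow the prose description (a reward for correct votes, a penalty where both members of the pair are wrong), which is an assumption about the intended form of Eq. (2.59). The function names are hypothetical.

```python
# Hypothetical training labels and weak-classifier votes in {-1, +1}
# (unnormalized for readability; this is an assumption of the sketch).
y = [1, 1, -1, -1]
votes = {
    1: [1, 1, -1, 1],
    2: [1, -1, -1, -1],
    3: [-1, 1, -1, -1],
}


def modified_correctness(h, labels):
    """Eq. (2.58): 1 where the weak classifier agrees with the label, else 0."""
    return [(hs * ys + 1) // 2 for hs, ys in zip(h, labels)]


c = {i: modified_correctness(h, y) for i, h in votes.items()}


def pair_linear_term(i, j):
    """Linear QUBO coefficient of the pair (i, j), following Eq. (2.59):
    negative weight for correct votes, positive penalty where both are wrong."""
    S = len(y)
    return -sum(a + b - (1 - a) * (1 - b) for a, b in zip(c[i], c[j])) / S


def pair_coupling(i, j, k, l):
    """Quadratic QUBO coefficient of Eq. (2.60): fraction of training vectors
    classified correctly by all four weak classifiers at once."""
    S = len(y)
    return sum(a * b * p * q
               for a, b, p, q in zip(c[i], c[j], c[k], c[l])) / S


complementary = pair_linear_term(1, 2)   # members never wrong together
redundant = pair_linear_term(1, 1)       # members always wrong together
shared = pair_coupling(1, 2, 1, 3)       # joint-correctness overlap of two pairs
```

In this toy data the complementary pair receives a more negative linear term than the redundant one, so the minimization favors it, in line with Condition 2.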
Using $|\mathcal{A}|^2$ qubits, this approach will give the optimal combination of weak classifier pairs over the training set according to the conditions set forth previously.

2.5 Using strong classifiers in quantum-parallel

Now let us suppose that we have already trained our strong classifier and found the optimal weight vector $\vec{w}^{\mathrm{opt}}$ or $\vec{w}^{\mathrm{opt}}_{\mathrm{pair}}$. For simplicity we shall henceforth limit our discussion to $\vec{w}^{\mathrm{opt}}$. We can use the trained classifier to classify new input-output pairs $\vec{x} \notin \mathcal{T}$ to decide whether they are correct or erroneous. In this section we shall address the question of how we can further obtain a quantum speedup in exhaustively testing all exponentially many ($2^{N_{\mathrm{in}} + N_{\mathrm{out}}}$) input-output pairs $\vec{x}$. The key observation in this regard is that if we can formulate software error testing as a minimization problem over the space $\mathcal{V}$ of all input-output pairs $\vec{x}$, then an AQC algorithm will indeed perform a quantum-parallel search over this entire space, returning as the ground state an erroneous state.

2.5.1 Using two strong binary classifiers to detect errors

Recall that we are concerned with the detection of vectors $\vec{x} \in \mathcal{V}$ that are erroneous and implemented [Eq. (2.7)]. To accomplish this, we use two strong classifiers. The specification classifier is the binary classifier developed in Section 2.3. Ideally, it behaves as follows:

$$Q_{\vec{w}}(\vec{x}) = \begin{cases} 1 & \vec{x} \in \hat{\mathcal{S}} \\ -1 & \vec{x} \notin \hat{\mathcal{S}} \end{cases} \tag{2.64}$$

The second classifier, which we will call the implementation classifier, determines whether or not an input-output vector is in the program as implemented. It is constructed in the same way as $Q_{\vec{w}}(\vec{x})$, but with its own appropriate training set. Ideally, it behaves as follows:

$$T_{\vec{z}}(\vec{x}) = \begin{cases} 1 & \vec{x} \notin \mathcal{S} \\ -1 & \vec{x} \in \mathcal{S} \end{cases} \tag{2.65}$$

The four possible combinations represented by Eqs. (2.64) and (2.65) correspond to the four cases covered by Definitions 1-4.
The worrisome input-output vectors, those that are erroneous and implemented, cause both classifiers to evaluate to $-1$.

2.5.2 Formal criterion

As a first step, suppose we use the optimal weights vector in the original strong specification classifier. We then have, from (2.15),

$$Q^{\mathrm{opt}}(\vec{x}) = \mathrm{sign}\left[ R_{\vec{w}^{\mathrm{opt}}}(\vec{x}) \right] = \mathrm{sign}\left[ \sum_{i=1}^{N} w_i^{\mathrm{opt}} h_i(\vec{x}) \right] \tag{2.66}$$

This, of course, is imprecise since our adiabatic algorithm solves a relaxed optimization problem (i.e., returns $\vec{w}^{\mathrm{opt}}$, not $\vec{w}^{\prime\,\mathrm{opt}}$), but we shall assume that the replacement is sufficiently close to the true optimum for our purposes. With this caveat, Eq. (2.66) is the optimal strong specification classifier for a given input-output vector $\vec{x}$, with the classification of $\vec{x}$ as erroneous if $Q^{\mathrm{opt}}(\vec{x}) = -1$ or as correct if $Q^{\mathrm{opt}}(\vec{x}) = +1$.

The strong implementation classifier is constructed similarly to the specification classifier:

$$T^{\mathrm{opt}}(\vec{x}) = \mathrm{sign}\left[ U_{\vec{z}^{\mathrm{opt}}}(\vec{x}) \right] = \mathrm{sign}\left[ \sum_{i=1}^{N} z_i^{\mathrm{opt}} h_i(\vec{x}) \right] \tag{2.67}$$

Here, $h_i$ are the same weak classifiers as those used to train the specification classifier, but $T^{\mathrm{opt}}$ is constructed independently from a training set $\mathcal{T}'$ which may or may not overlap with $\mathcal{T}$. This training set is labeled according to the possibility or impossibility of producing the input-output pairs in $\mathcal{T}'$ from the implemented program. The result of this optimization is the weight vector $\vec{z}^{\mathrm{opt}}$.

Given the results of the classifiers $Q^{\mathrm{opt}}(\vec{x})$ and $T^{\mathrm{opt}}(\vec{x})$ for any vector $\vec{x}$, the V&V task of identifying whether or not $\vec{x} \in (\mathcal{S} \cap \neg\hat{\mathcal{S}})$ reduces to the following. Any vector $\vec{x}$ is flagged as erroneous and implemented if $Q^{\mathrm{opt}}(\vec{x}) + T^{\mathrm{opt}}(\vec{x}) = -2$. We stress once more that, due to our use of the relaxed optimization to solve for $\vec{w}^{\mathrm{opt}}$ and $\vec{z}^{\mathrm{opt}}$, a flagged $\vec{x}$ may in fact be neither erroneous nor implemented, i.e., our procedure is susceptible to both false positives and false negatives.

2.5.3 Relaxed criterion

As was the case with Eq.
(2.18), $Q^{\mathrm{opt}} + T^{\mathrm{opt}}$ is unfortunately not directly implementable in AQC, but a simple relaxation is. The trick is again to remove the sign function, this time from (2.66) and (2.67), and consider the sum of the two classifiers' majority vote functions directly as an energy function:

$$C^{\mathrm{opt}}(\vec{x}) = R_{\vec{w}^{\mathrm{opt}}}(\vec{x}) + U_{\vec{z}^{\mathrm{opt}}}(\vec{x}) \qquad (2.68)$$

The combination of the two classifiers gives different results for vectors falling under each of the Definitions from Section 2.2.4.

Case 1: $\vec{x} \notin \hat{S}$ and $\vec{x} \in S$

The vector $\vec{x}$ is an error implemented in the program and manifests a software error. These vectors gain negative weight from both classifiers $R_{\vec{w}^{\mathrm{opt}}}$ and $U_{\vec{z}^{\mathrm{opt}}}$. Vectors falling under this definition should receive the lowest values of $C^{\mathrm{opt}}$, if any such vectors exist.

Case 2: $\vec{x} \in \hat{S}$ and $\vec{x} \in S$

The vector $\vec{x}$ satisfies the don't-worry condition, that is, it is a correct input-output string, part of the ideal program $\hat{P}$. In this case, $R_{\vec{w}^{\mathrm{opt}}} > 0$ and $U_{\vec{z}^{\mathrm{opt}}} < 0$. In the programs quantum V&V is likely to be used for, with very infrequent, elusive errors, the specification and implementation will be similar, and the negative weight of $U_{\vec{z}^{\mathrm{opt}}} < 0$ should be moderated enough by the positive influence of $R_{\vec{w}^{\mathrm{opt}}} > 0$ that don't-worry vectors should not populate the lowest-lying states.

Case 3: $\vec{x} \in \hat{S}$ and $\vec{x} \notin S$

The input portion of the vector $\vec{x}$ is a don't-care condition. It does not violate any program specifications, but is not important enough to be specifically addressed in the implementation. This vector will gain positive weight from both $R_{\vec{w}^{\mathrm{opt}}}$ and $U_{\vec{z}^{\mathrm{opt}}}$ and should therefore never be misidentified as an error.

Case 4: $\vec{x} \notin \hat{S}$ and $\vec{x} \notin S$

The vectors $\vec{x}$ in this category would be seen as erroneous by the program specification, if they ever occurred. Because they fall outside the program implementation $S$, they are not the errors we are trying to find.
This case is similar to the don't-worry situation in that the two strong classifiers will have opposite signs, in this case $R_{\vec{w}^{\mathrm{opt}}} < 0$ and $U_{\vec{z}^{\mathrm{opt}}} > 0$. By the same argument as Definition 2, Definition 4 vectors should not receive more negative values of $C^{\mathrm{opt}}$ than the targeted errors.

Having examined the values of $C^{\mathrm{opt}}(\vec{x})$ for the relevant categories of $\vec{x}$, we can formulate error detection as the following minimization problem:

$$\vec{x}_e = \arg\min_{\vec{x}} C^{\mathrm{opt}}(\vec{x}). \qquad (2.69)$$

Suppose the algorithm returns a solution $\vec{x}_e$ (e for "error"). We then need to test that it is indeed an error, which amounts to checking that it behaves incorrectly when considered as an input-output pair in the program implementation $P$. Note that testing that $R_{\vec{w}^{\mathrm{opt}}}(\vec{x}_e) < 0$ is insufficient, since our procedure involved a sequence of relaxations.

2.5.4 Adiabatic implementation of the relaxed criterion

In order to implement the error identification strategy (2.69) we need to consider

$$C^{\mathrm{opt}}(\vec{x}) = \sum_{i=1}^{N} (w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}})\, h_i(\vec{x}) \qquad (2.70)$$

as an energy function. We then consider $C^{\mathrm{opt}}(\vec{x})$ as the final Hamiltonian $H_F$ for an AQC, with Hilbert space spanned by the basis $\{|\vec{x}\rangle\}$. The AQC will then find the state which minimizes $C^{\mathrm{opt}}(\vec{x})$ out of all $2^{N_{\mathrm{in}}+N_{\mathrm{out}}}$ basis states and thus identify an error candidate. Because the AQC always returns some error candidate, our procedure never generates false negatives. However, Cases 2 and 4 would correspond to false positives, if an input-output vector satisfying either one of these cases is found as the AQC output.

We can rely on the fact that the AQC actually returns a (close approximation to the) Boltzmann distribution

$$\Pr[\vec{x}] = \frac{1}{Z} \exp\left[-C^{\mathrm{opt}}(\vec{x})/(k_B T)\right], \qquad (2.71)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $Z = \sum_{\vec{x}} \exp\left[-C^{\mathrm{opt}}(\vec{x})/(k_B T)\right]$ is the partition function.
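As a purely classical illustration of Eqs. (2.69) to (2.71), the following minimal sketch evaluates an energy of the form $\sum_i (w_i + z_i) h_i(\vec{x})$ over a tiny input-output space, picks a minimizer, and computes the corresponding Boltzmann probabilities. The three weak classifiers, the combined weights, and the value of `kT` are hypothetical choices made only for this example; they are not taken from the sample problem.

```python
import itertools
import math

def energy(x, weights, classifiers):
    """C(x) = sum_i (w_i + z_i) * h_i(x), in the form of Eq. (2.70)."""
    return sum(c * h(x) for c, h in zip(weights, classifiers))

def boltzmann(energies, kT):
    """Probabilities proportional to exp(-E / kT), in the form of Eq. (2.71)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(factors)       # partition function (after the shift)
    return [f / z for f in factors]

# Hypothetical toy setup: 3-bit vectors and three +/-1-valued weak classifiers.
classifiers = [
    lambda x: 1 if x[2] == (x[0] & x[1]) else -1,
    lambda x: 1 if x[2] == (x[0] | x[1]) else -1,
    lambda x: 1 if x[2] == x[0] else -1,
]
combined_weights = [2, 1, 1]  # stands in for w_i + z_i; values are made up

states = list(itertools.product((0, 1), repeat=3))
energies = [energy(x, combined_weights, classifiers) for x in states]

# Eq. (2.69): the error candidate is a minimizer of the energy over all states.
x_e = states[energies.index(min(energies))]
probs = boltzmann(energies, kT=1.0)

for x, e, p in zip(states, energies, probs):
    print(x, e, round(p, 4))
print("error candidate:", x_e)
```

A brute-force scan like this scales as $2^{N_{\mathrm{in}}+N_{\mathrm{out}}}$, which is exactly the cost the adiabatic search is meant to avoid; the sketch is only useful for checking small instances by hand.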
For sufficiently low temperature this probability distribution is sharply peaked around the ground state, with contributions from the first few excited states. Thus we can expect that even if there is a low-lying state that has been pushed there by only one of the two binary classifiers $Q^{\mathrm{opt}}$ or $T^{\mathrm{opt}}$, the AQC will return a nearby state which is both erroneous and implemented some of the time, and an error will still be detected. Even if the undesirable state [$\vec{x} \in \hat{S}$ and $\vec{x} \in S$, or $\vec{x} \notin \hat{S}$ and $\vec{x} \notin S$] is the ground state, and hence all erroneous states [$\vec{x} \notin \hat{S}$ and $\vec{x} \in S$] are excited states, their lowest energy member will be found with a probability that is $e^{-\Delta(t_F)/(k_B T)}$ smaller than the unlooked-for state, where $\Delta(t_F)$ is the energy gap to the first excited state at the end of the computation. Provided $k_B T$ and $\Delta(t_F)$ are of the same order, this probability will be appreciable.

To ensure that errors which are members of the training set are never identified as ground states, we construct the training set $\mathcal{T}$ so that it only includes correct states, i.e., $y_s = +1\ \forall s$. This has the potential drawback that the classifier never trains directly on errors. It is in principle possible to include errors in the training set ($y_s = -1$) by adding another penalty term to the strong classifier which directly penalizes such training set members, but whether this can be done without introducing many-body interactions in $H_F$ is a problem that is beyond the scope of this work.

2.5.5 Choosing the weak classifiers

Written in the form $\sum_{i=1}^{N} (w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}})\, h_i(\vec{x})$, the energy function $C^{\mathrm{opt}}(\vec{x})$ is too general, since we have not yet specified the weak classifiers $h_i(\vec{x})$. However, we are free to choose these so as to mold $C^{\mathrm{opt}}(\vec{x})$ into a Hamiltonian that is physically implementable in AQC.
Suppose, e.g., that $h_i(\vec{x})$ measures a Boolean relationship defined by a function $f_i : \{0,1\}^{\ell} \mapsto \{0,1\}$ between several bits of the input-output vector; $x_k = \mathrm{bit}_k(\vec{x})$ is the $k$th bit of $\vec{x} \in \mathcal{V}$. For example,

$$h_i(\vec{x}) = \left(x_{i_3} == f_i(x_{i_1}, x_{i_2})\right), \qquad (2.72)$$

where "$a == b$" evaluates to 1 if $a = b$ or to 0 if $a \neq b$. Here $i_1$ and $i_2$ are the positions of two bits from the input vector $\vec{x}_{\mathrm{in}}$ and $i_3$ is the position of a bit from the output vector $\vec{x}_{\mathrm{out}}$, so that $h_i$ measures a correlation between inputs and outputs. The choice of this particular form for the weak classifiers is physically motivated, as it corresponds to at most three-body interactions between qubits, which can all be reduced to two-body interactions by the addition of ancilla qubits (see below).

Let us enumerate these weak classifiers. The number of different Boolean functions $f_i$ is $2^{2^{\ell}}$ [?].$^4$ Much more efficient representations are possible under reasonable assumptions [?], but for the time being we shall not concern ourselves with these. In the example of the classifier (2.72) there are $N_{\mathrm{in}}(N_{\mathrm{in}} - 1)$ input bit combinations for each of the $N_{\mathrm{out}}$ output bits. The number of different Boolean functions in this example, where $\ell = 2$, is $2^{2^2} = 16$. Thus the dimension of the "dictionary" of weak classifiers is

$$N = 16\, N_{\mathrm{in}} (N_{\mathrm{in}} - 1)\, N_{\mathrm{out}} \qquad (2.73)$$

for the case of Eq. (2.72).

$^4$ Any Boolean function of $\ell$ variables can be uniquely expanded in the form $f_i(x_1, \ldots, x_\ell) = \sum_{\alpha=0}^{2^\ell - 1} \epsilon_{i\alpha} s_\alpha$, where $\epsilon_{i\alpha} \in \{0,1\}$ and $s_\alpha$ are the $2^\ell$ "simple" Boolean functions $s_0 = x_1 x_2 \cdots x_\ell$, $s_1 = \bar{x}_1 x_2 \cdots x_\ell$, $\ldots$, $s_{2^\ell - 1} = \bar{x}_1 \bar{x}_2 \cdots \bar{x}_\ell$, where $\bar{x}$ denotes the negation of the bit $x$. Since each $\epsilon_{i\alpha}$ can assume one of two values, there are $2^{2^\ell}$ different Boolean functions.

| Function # | Boolean Logic | Intermediate Form | Implementation Form |
|---|---|---|---|
| $i = 0$ | $x_{i_3} == 0$ | not applicable | $-Z_{i_3}$ |
| $i = 1$ | $x_{i_3} == \overline{(x_{i_1} \vee x_{i_2})}$ | $4(x_{i_1}x_{i_2}x_{i_3} - x_{i_1}x_{i_3} - x_{i_2}x_{i_3}) - 2(x_{i_1}x_{i_2} - x_{i_1} - x_{i_2} - x_{i_3}) - 1$ | $Z_a Z_{i_3} - Z_{i_1}Z_{i_3} - Z_{i_2}Z_{i_3}$ |
| $i = 2$ | $x_{i_3} == \bar{x}_{i_1} \wedge x_{i_2}$ | $4(-x_{i_1}x_{i_2}x_{i_3} + x_{i_2}x_{i_3}) + 2(-x_{i_3} + x_{i_1}x_{i_2} - x_{i_2}) + 1$ | $-Z_a Z_{i_3} + Z_{i_2}Z_{i_3} - Z_{i_3}$ |
| $i = 3$ | $x_{i_3} == \bar{x}_{i_1}$ | not applicable | $-Z_{i_3}Z_{i_1}$ |
| $i = 4$ | $x_{i_3} == x_{i_1} \wedge \bar{x}_{i_2}$ | $4(x_{i_1}x_{i_3} - x_{i_1}x_{i_2}x_{i_3}) - 2(x_{i_1} - x_{i_1}x_{i_2} + x_{i_3}) + 1$ | $Z_{i_1}Z_{i_3} - Z_a Z_{i_3} - Z_{i_3}$ |
| $i = 5$ | $x_{i_3} == \bar{x}_{i_2}$ | not applicable | $-Z_{i_3}Z_{i_2}$ |
| $i = 6$ | $x_{i_3} == x_{i_1} \oplus x_{i_2}$ | $-8x_{i_1}x_{i_2}x_{i_3} + 4(x_{i_1}x_{i_3} + x_{i_2}x_{i_3} + x_{i_1}x_{i_2}) - 2(x_{i_1} + x_{i_2} + x_{i_3}) + 1$ | $-2Z_a Z_{i_3} + Z_{i_1}Z_{i_3} + Z_{i_2}Z_{i_3} - Z_{i_3}$ |
| $i = 7$ | $x_{i_3} == \overline{(x_{i_1} \wedge x_{i_2})}$ | $-4x_{i_1}x_{i_2}x_{i_3} + 2(x_{i_3} + x_{i_1}x_{i_2}) - 1$ | $-Z_a Z_{i_3}$ |
| $i = 8$ | $x_{i_3} == x_{i_1} \wedge x_{i_2}$ | $4x_{i_1}x_{i_2}x_{i_3} - 2(x_{i_3} + x_{i_1}x_{i_2}) + 1$ | $Z_a Z_{i_3}$ |
| $i = 9$ | $x_{i_3} == \overline{(x_{i_1} \oplus x_{i_2})}$ | $8x_{i_1}x_{i_2}x_{i_3} - 4(x_{i_1}x_{i_3} + x_{i_2}x_{i_3} + x_{i_1}x_{i_2}) + 2(x_{i_1} + x_{i_2} + x_{i_3}) - 1$ | $2Z_a Z_{i_3} - Z_{i_1}Z_{i_3} - Z_{i_2}Z_{i_3} + Z_{i_3}$ |
| $i = 10$ | $x_{i_3} == x_{i_2}$ | not applicable | $Z_{i_3}Z_{i_2}$ |
| $i = 11$ | $x_{i_3} == \bar{x}_{i_1} \vee x_{i_2}$ | $-4(x_{i_1}x_{i_3} - x_{i_1}x_{i_2}x_{i_3}) + 2(x_{i_1} - x_{i_1}x_{i_2} + x_{i_3}) - 1$ | $-Z_{i_1}Z_{i_3} + Z_a Z_{i_3} + Z_{i_3}$ |
| $i = 12$ | $x_{i_3} == x_{i_1}$ | not applicable | $Z_{i_3}Z_{i_1}$ |
| $i = 13$ | $x_{i_3} == x_{i_1} \vee \bar{x}_{i_2}$ | $-4(-x_{i_1}x_{i_2}x_{i_3} + x_{i_2}x_{i_3}) - 2(-x_{i_3} + x_{i_1}x_{i_2} - x_{i_2}) - 1$ | $Z_a Z_{i_3} - Z_{i_2}Z_{i_3} + Z_{i_3}$ |
| $i = 14$ | $x_{i_3} == x_{i_1} \vee x_{i_2}$ | $-4(x_{i_1}x_{i_2}x_{i_3} - x_{i_1}x_{i_3} - x_{i_2}x_{i_3}) + 2(x_{i_1}x_{i_2} - x_{i_1} - x_{i_2} - x_{i_3}) + 1$ | $-Z_a Z_{i_3} + Z_{i_1}Z_{i_3} + Z_{i_2}Z_{i_3}$ |
| $i = 15$ | $x_{i_3} == 1$ | not applicable | $Z_{i_3}$ |

Table 2.1: All 16 Boolean functions $f_i$ of two binary variables, and their implementation form in terms of the Pauli matrices $Z_{i_j}$ acting on single qubits or pairs of qubits, $j \in \{1, 2, 3\}$. The subscript $a$ in the Implementation Form column denotes an ancilla qubit, tied to qubits $i_1$ and $i_2$ via $x_a = x_{i_1} x_{i_2}$, used to reduce all qubit interactions to at most two-body.

We wish to find a two-local quantum implementation for each $h_i(\vec{x})$ in the dictionary. It is possible to find a two-local implementation for any three-local Hamiltonian using so-called "perturbation gadgets", or three ancilla bits for each three-local term included [JF08], but rather than using the general method we rely on a special case which will allow us to use only one ancilla bit per three-local term.

We first devise an intermediate form function using products of the same bits $x_i \in \{0,1\}$ used to define the logical behavior of each weak classifier. This function will have a value of $1$ when the Boolean relationship specified for $h_i(\vec{x})$ is true, and $-1$ otherwise. For example, consider function number 8, $x_{i_3} == x_{i_1} \wedge x_{i_2}$, the AND function. Its intermediate form is $4x_{i_1}x_{i_2}x_{i_3} - 2(x_{i_3} + x_{i_1}x_{i_2}) + 1$. For the bit values $(x_{i_1}, x_{i_2}, x_{i_3}) = (0, 0, 0)$, the value of the intermediate function is $1$, and the Boolean form is true: 0 AND 0 yields 0. If instead we had the bit values $(x_{i_1}, x_{i_2}, x_{i_3}) = (0, 0, 1)$, the intermediate form would yield $-1$, and the Boolean form would be false, because the value for $x_{i_3}$ does not follow from the values for $x_{i_1}$ and $x_{i_2}$.

The two-body implementation form is obtained in two steps from the intermediate form. First, an ancilla bit tied to the product of the two input bits, $x_a = x_{i_1} x_{i_2}$, is substituted into any intermediate form expressions involving three-bit products. This is permissible because such an ancilla can indeed be created by introducing a penalty into the final Hamiltonian for any states in which the ancilla bit is not equal to the product $x_{i_1} x_{i_2}$. We detail this method below.
Then, the modified intermediate expression is translated into a form that uses bits valued as $x'_i \in \{-1, 1\}$ rather than $x_i \in \{0, 1\}$, using the equivalence $x'_i = 2x_i - 1$. The modified intermediate form is now amenable to using the implemented qubits. Note that the Pauli matrix $Z_i$ acts on a basis ket $|\vec{x}\rangle$ as

$$Z_i |\vec{x}\rangle = (-1)^{\mathrm{bit}_i(\vec{x})} |\vec{x}\rangle. \qquad (2.74)$$

This means that we can substitute $Z_i$ for $x'_i$ and $Z_i Z_j$ for $x'_i x'_j$ in the intermediate form, resulting in the implementation form given in Column 4 of Table 2.1. Some weak classifiers do not involve three-bit interactions. Their implementation forms were devised directly, a simple process when there is no need for inclusion of an ancilla.

We have reduced the dictionary functions from three-bit to two-bit interactions by adding an ancilla bit to represent the product of the two input bits involved in the function. Therefore, the maximum number of qubits needed to implement this set of weak classifiers on a quantum processor is $Q = N_{\mathrm{in}} + N_{\mathrm{out}} + N_{\mathrm{in}}^2$. In practice, it is likely to be significantly less because not every three-bit correlation will be relevant to a given classification problem.

Let us now discuss how the penalty function is introduced. For example, consider again the implementation of weak classifier function $i = 8$, whose intermediate form involves three-qubit products, which we reduced to two-qubit interactions by including $x_a$. We ensure that $x_a$ does indeed represent the product it is intended to by making the function a sum of two terms: the product of the ancilla qubit and the remaining qubit from the original product, and a term that adds a penalty if the ancilla is not in fact equal to the product of the two qubits it is meant to represent, in this case $f_{\mathrm{penalty}} = x_{i_1}x_{i_2} - 2(x_{i_1} + x_{i_2})x_a + 3x_a$. In the case where $(x_{i_1}, x_{i_2}, x_a) = (1, 0, 0)$, $f_{\mathrm{penalty}} = 0$, but in a case where $x_a$ does not represent the intended product, such as $(x_{i_1}, x_{i_2}, x_a) = (1, 0, 1)$, $f_{\mathrm{penalty}} = 1$.
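The two worked values of $f_{\mathrm{penalty}}$ quoted above can be extended to all eight assignments of $(x_{i_1}, x_{i_2}, x_a)$. The short sketch below does this by brute force and also checks that substituting a consistent ancilla into the AND intermediate form leaves its value unchanged. The function names are illustrative only and do not come from the text.

```python
import itertools

def f_penalty(x1, x2, xa):
    """Penalty from the text: x1*x2 - 2*(x1 + x2)*xa + 3*xa."""
    return x1 * x2 - 2 * (x1 + x2) * xa + 3 * xa

def and_intermediate(x1, x2, x3):
    """Three-bit intermediate form of function 8 (x3 == x1 AND x2)."""
    return 4 * x1 * x2 * x3 - 2 * (x3 + x1 * x2) + 1

def and_with_ancilla(xa, x3):
    """The same form after substituting the ancilla xa for the product x1*x2."""
    return 4 * xa * x3 - 2 * (x3 + xa) + 1

# The penalty is zero exactly when the ancilla equals the product, positive otherwise.
for x1, x2, xa in itertools.product((0, 1), repeat=3):
    p = f_penalty(x1, x2, xa)
    assert p >= 0
    assert (p == 0) == (xa == x1 * x2)
    print((x1, x2, xa), "penalty =", p)

# With a consistent ancilla the two-bit form reproduces the three-bit form,
# which is +1 when the Boolean relation holds and -1 otherwise.
for x1, x2, x3 in itertools.product((0, 1), repeat=3):
    value = and_intermediate(x1, x2, x3)
    assert value == and_with_ancilla(x1 * x2, x3)
    assert value == (1 if x3 == (x1 & x2) else -1)
```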
In fact, the penalty function behaves as follows:

$$f_{\mathrm{penalty}} = \begin{cases} 0 & x_a = x_{i_1} x_{i_2} \\ \text{positive} & \text{otherwise} \end{cases} \qquad (2.75)$$

In the end, we have the modified intermediate form $f_8 = 4x_a x_{i_3} - 2(x_{i_3} + x_a) + 1 + f_{\mathrm{penalty}}$, which involves only two-qubit interactions. This would be implemented on the quantum computer as the sum of two Hamiltonian terms:

$$H_8 = Z_a Z_{i_3}, \qquad (2.76)$$

from the implementation column of Table 2.1, and

$$H_{\mathrm{penalty}}(i_1, i_2) = \frac{1}{4} Z_{i_1} Z_{i_2} - \frac{1}{2} Z_{i_1} Z_a - \frac{1}{2} Z_{i_2} Z_a - \frac{1}{4} Z_{i_1} - \frac{1}{4} Z_{i_2} + \frac{1}{2} Z_a + \frac{3}{4}, \qquad (2.77)$$

the implementation form of $f_{\mathrm{penalty}}$, so a Hamiltonian to find input-output vectors classified negatively by this weak classifier would be

$$H_{\mathrm{weak}} = H_8 + H_{\mathrm{penalty}}(i_1, i_2). \qquad (2.78)$$

When the strong classifier is implemented as a whole, multiple weak classifiers with weight 1 may use the same two input bits, and therefore share an ancilla bit that is the product of those input bits. When this is the case, it is sufficient to add the penalty function to the final Hamiltonian once, though the ancilla is used multiple times.

The inclusion of ancilla qubits tied to products of other qubits and their associated penalties need not interfere with the solution of the V&V problem, although the ancilla penalty terms must appear in the same final Hamiltonian as this optimization. If the ancilla penalty terms are made reasonably large, they will put any states in which the ancillas do not represent their intended products (states which are in fact outside of $\mathcal{V}$) far above the levels at which errors are found. For instance, consider an efficient, nearly optimal strong classifier closely approximating the conditions set forth in Section 2.4. Such a classifier makes its decision on the strength of two simultaneously true votes. If two such classifiers are added together, as in the verification problem, the lowest energy levels will have an energy near $-4$.
If the penalty on a forbidden ancilla state is more than a reasonable 4 units, such a state should be well clear of the region where errors are found.

This varied yet correlation-limited set of weak classifiers fits nicely with the idea of tracking intermediate spaces [Eq. (2.11)], where we can use an intermediate space $I_j$ to construct a set of weak classifiers feeding into the next intermediate space $I_{j+1}$. This is further related to an obvious objection to the above classifiers, which is that they ignore any correlations involving four or more bits, without one-, two-, or three-bit correlations. By building a hierarchy of weak classifiers for intermediate spaces, such correlations can hopefully be accounted for as they build up, by keeping track instead of one-, two-, and three-bit terms as the program runs.

2.5.6 QUBO-AQC quantum parallel testing

With the choice of Boolean functions for the weak classifiers, the quantum implementation of the energy function $C^{\mathrm{opt}}(\vec{x})$ [Eq. (2.70)] becomes

$$H_F^{\mathrm{test}} = \sum_{i=1}^{N} (w_i^{\mathrm{opt}} + z_i^{\mathrm{opt}})\, H_i + \sum_{j \neq k} H_{\mathrm{penalty}}(j, k), \qquad (2.79)$$

where $H_i$ denotes the implemented form given in the Implementation Form column of Table 2.1, and the indices $j, k \in \{1, \ldots, N_{\mathrm{in}}\}$ denote all possible pairings of input qubits tied to ancillas. The ground state of $H_F^{\mathrm{test}}$, which corresponds to the optimal weight sets $w_i^{\mathrm{opt}}$ and $z_i^{\mathrm{opt}}$ derived from the set of weak classifiers detailed in Subsection 2.5.5, is an erroneous state, which, by construction, is not a member of the training set $\mathcal{T}$.

How do we construct the AQC such that all input-output pairs $\vec{x}$ are tested in parallel? This is a consequence of the adiabatic interpolation Hamiltonian (1.4), and in particular the initial Hamiltonian $H_I$ of the type given in Eq. (1.5). The ground state of this positive semi-definite $H_I$ is an equal superposition over all input-output vectors, i.e., $H_I \sum_{\vec{x} \in \mathcal{V}} |\vec{x}\rangle = 0$, and hence when we implement the AQC every possible $\vec{x}$ starts out as a candidate for the ground state.
The final (Boltzmann) distribution of observed states strongly favors the manifold of low energy states, and by design these will be implemented erroneous states, if they exist.

2.6 Sample problem implementation

In order to explore the practicality of our two-step adiabatic quantum approach to finding software errors, we have applied the algorithm to a program of limited size containing a logical error. We did this by calculating the results of the algorithm assuming perfect adiabatic quantum optimization steps on a processor with few ($N < 30$) available qubits. Preliminary characterizations of the accuracy achievable using such an algorithm given a set of weak classifiers with certain characteristics are also presented.

2.6.1 The Triplex Monitor Miscompare problem

The problem we chose to implement is a toy model of program design practices used in mission critical software systems.$^5$ This program monitors a set of three redundant variables $\{A_t, B_t, C_t\}$ for internal consistency. The variables could represent, e.g., sensor inputs, control signals, or particularly important internal program values. If one value is different from the other two over a predetermined number of snapshots in time $t$, a problem in the system is indicated and the value of the two consistent redundant variables is propagated as correct. Thus the program is supposed to implement a simple majority-vote error-detection code.

We consider only the simplest case of two time snapshots, i.e., $t = 1, 2$. As just explained, a correct implementation of the monitoring routine should fail a redundant variable $A$, $B$, or $C$ if that same variable miscompares with both of the other variables in each of the two time frames.
The erroneous implemented program we shall consider has the logical error that, due to a mishandled internal implementation of the miscompare tracking over multiple time frames, it fails a redundant variable any time there has been a miscompare in both time frames, even if the miscompare implicated a different variable in each time frame.

In order to facilitate quantum V&V using the smallest possible number of qubits, we assume the use of classical preprocessing to reduce the program to its essential structure. The quantum algorithm does not look at the values of the three redundant variables in each time frame. Instead, it sees three logical bits per snapshot, telling it whether each pair of variables is equal. This strategy is also reflected in the program outputs, which are three logical bits indicating whether or not each redundant variable is deemed correct by the monitoring routine. Thus there are nine logical bits, as specified in Table 2.2.

$^5$ We are grateful to Greg Tallant from the Lockheed Martin Corporation for providing us with this problem as an example of interest in flight control systems.

| Bit | Significance |
|---|---|
| $x_1$ | $A_1 \neq B_1$ |
| $x_2$ | $B_1 \neq C_1$ |
| $x_3$ | $A_1 \neq C_1$ |
| $x_4$ | $A_2 \neq B_2$ |
| $x_5$ | $B_2 \neq C_2$ |
| $x_6$ | $A_2 \neq C_2$ |
| $x_7$ | $A$ failed |
| $x_8$ | $B$ failed |
| $x_9$ | $C$ failed |

Table 2.2: Logical bits and their significance in terms of variable comparison in the Triplex Miscompare problem.

In terms of Boolean logic, the two behaviors are as follows:

Program Specification

$$x_7 = x_1 \wedge x_3 \wedge x_4 \wedge x_6, \qquad (2.80a)$$
$$x_8 = x_1 \wedge x_2 \wedge x_4 \wedge x_5, \qquad (2.80b)$$
$$x_9 = x_2 \wedge x_3 \wedge x_5 \wedge x_6, \qquad (2.80c)$$

i.e., a variable is flagged as incorrect if and only if it has miscompared with all other variables in all time frames.
Erroneous Program Implementation

$$x_7 = \left((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\right) \wedge x_4 \wedge x_6, \qquad (2.81a)$$
$$x_8 = \left((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\right) \wedge x_4 \wedge x_5, \qquad (2.81b)$$
$$x_9 = \left((x_1 \wedge x_2) \vee (x_2 \wedge x_3) \vee (x_1 \wedge x_3)\right) \wedge x_5 \wedge x_6, \qquad (2.81c)$$

i.e., a variable is flagged as incorrect if it miscompares with the other variables in the final time frame and if any variable has miscompared with the others in the previous time frame.

2.6.2 Implemented algorithm

Figure 2.6: Error fractions in 16-member specification classifier calculations, plotted against $\lambda$; Left: average over 50. Right: best of 50.

Figure 2.7: Error fractions in 16-member implementation classifier calculations, plotted against $\lambda$; Left: average over 50. Right: best of 50.

The challenges before us are to train classifiers to recognize the behavior of both the program specification and the erroneous implementation, and then to use those classifiers to find the errors. These objectives have been programmed into a hybrid quantum-classical algorithm using the quantum techniques described in Sections 2.3 and 2.5 and classical strategy refinements based on characteristics of available resources (for example, the accuracy of the set of available weak classifiers). The performance of this algorithm has been tested through computational studies using a classical optimization routine in place of adiabatic quantum optimization calls.

The algorithm takes as its inputs two training sets, one for the specification classifier and one for the implementation classifier.
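To make the difference between Eqs. (2.80) and (2.81) concrete, the following sketch encodes both sets of Boolean equations and lists the input patterns on which their outputs disagree. It is a direct transcription of the equations, not the training-set generator used for the simulations, and it enumerates all 64 input bit patterns without asking whether each pattern can arise from actual variable values.

```python
import itertools

def specification(x1, x2, x3, x4, x5, x6):
    """Eqs. (2.80): a variable fails only if it miscompares with both
    other variables in both time frames."""
    return (x1 & x3 & x4 & x6,   # x7: A failed
            x1 & x2 & x4 & x5,   # x8: B failed
            x2 & x3 & x5 & x6)   # x9: C failed

def implementation(x1, x2, x3, x4, x5, x6):
    """Eqs. (2.81): the first-frame test accepts any two miscompares,
    regardless of which variable they implicate."""
    any_two = (x1 & x2) | (x2 & x3) | (x1 & x3)
    return (any_two & x4 & x6,
            any_two & x4 & x5,
            any_two & x5 & x6)

# Input patterns (x1..x6) where the implemented outputs differ from the specified ones.
disagreements = [
    bits for bits in itertools.product((0, 1), repeat=6)
    if specification(*bits) != implementation(*bits)
]

for bits in disagreements:
    print(bits, "spec:", specification(*bits), "impl:", implementation(*bits))
print(len(disagreements), "of 64 input patterns disagree")
```

Each listed pattern, together with the implemented output, is an input-output vector that is implemented but violates the specification, i.e., the kind of vector the quantum-parallel search of Section 2.5 is meant to find.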
The two strong classifiers are constructed using the same method, one after the other, consulting the appropriate training set.

When constructing a strong classifier, the algorithm first evaluates the performance of each weak classifier in the dictionary over the training set. Weak classifiers with poor performance, typically those with over 40% error, are discarded. The resulting, more accurate dictionary is fed piecewise into the quantum optimization algorithm.

Ideally, the adiabatic quantum optimization using the final Hamiltonian (2.25) would take place over the set of all weak classifiers in the modified, more accurate dictionary. However, the reality of quantum computation for some time to come is that the number of qubits available for processing will be smaller than the number of weak classifiers in the accurate dictionary. This problem is addressed by selecting random groups of $Q$ classifiers (the number of available qubits) to be optimized together. An initial random group of $Q$ classifiers is selected, the optimal weight vector $\vec{q}^{\,\mathrm{opt}}$ is calculated by classically finding the ground state of $H_F$, and the weak classifiers which receive weight 0 are discarded. The resulting spaces are filled in with weak classifiers randomly selected from the set of those which have not yet been considered, until all $Q$ classifiers included in the optimization return a weight of 1. This procedure is repeated until all weak classifiers in the accurate dictionary have been considered, at which time the most accurate group of $Q$ generated in this manner is accepted as the strong classifier for the training set in question. Clearly, alternative strategies for combining subsets of $Q$ weak classifiers could be considered, such as genetic algorithms, but this was not attempted here.
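The group-selection procedure just described can be sketched classically as follows. This is an illustrative reading of the text, not the code used for the simulations. The cost inside `optimize_weights` is an assumed stand-in (a squared loss on the normalized vote plus a sparsity term weighted by `lam`) for the ground-state search of $H_F$; ties in the majority vote are counted as not correct purely for simplicity; and all function names and the toy dictionary are hypothetical.

```python
import itertools
import random

def weak_error(h, data):
    """Fraction of labeled examples (x, y) that weak classifier h gets wrong."""
    return sum(1 for x, y in data if h(x) != y) / len(data)

def strong_error(group, data):
    """Error fraction of the unweighted majority vote (a tie counts as not correct)."""
    return sum(1 for x, y in data if y * sum(h(x) for h in group) <= 0) / len(data)

def optimize_weights(group, data, lam):
    """Brute-force stand-in for the adiabatic step: best binary weights for one group."""
    n = len(group)
    def cost(w):
        loss = sum((sum(wi * h(x) for wi, h in zip(w, group)) / n - y) ** 2
                   for x, y in data)
        return loss + lam * sum(w)
    return min(itertools.product((0, 1), repeat=n), key=cost)

def train_strong_classifier(dictionary, data, q, lam=0.0, max_error=0.4, seed=0):
    rng = random.Random(seed)
    # Discard weak classifiers with poor performance on the training set.
    pool = [h for h in dictionary if weak_error(h, data) < max_error]
    rng.shuffle(pool)
    candidates = []
    while pool:
        group = [pool.pop() for _ in range(min(q, len(pool)))]
        while True:
            w = optimize_weights(group, data, lam)
            kept = [h for wi, h in zip(w, group) if wi == 1]
            if len(kept) == len(group) or not pool:
                group = kept
                break
            # Refill the freed slots with classifiers not yet considered.
            refill = [pool.pop() for _ in range(min(q - len(kept), len(pool)))]
            group = kept + refill
        if group:
            candidates.append(group)
    # Keep the most accurate group found.
    return min(candidates, key=lambda g: strong_error(g, data), default=[])

# Toy usage: 3-bit inputs labeled by majority; weak classifiers read single bits.
def bit_classifier(k, sign):
    return lambda x: sign * (1 if x[k] else -1)

data = [(x, 1 if sum(x) >= 2 else -1) for x in itertools.product((0, 1), repeat=3)]
dictionary = [bit_classifier(k, s) for k in range(3) for s in (1, -1)]
best = train_strong_classifier(dictionary, data, q=3)
print(len(best), strong_error(best, data))
```

The brute-force search over $2^Q$ weight vectors in `optimize_weights` is the step that the adiabatic optimization is intended to replace.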
Both the specification and implementation strong classifiers are generated in this way, resulting in

$$R_{\vec{w}^{Q}}(\vec{x}) = \sum_{i=1}^{N} w_i^{Q} h_i(\vec{x}) \qquad (2.82)$$

$$U_{\vec{z}^{Q}}(\vec{x}) = \sum_{i=1}^{N} z_i^{Q} h_i(\vec{x}) \qquad (2.83)$$

where $w_i^{Q}$ and $z_i^{Q}$ take the value 1 if the corresponding weak classifier $h_i(\vec{x})$ is selected using the iterative procedure described in the preceding paragraph, and are zero otherwise. This is the same structure as that seen in Eqs. (2.66) and (2.67), but with different vectors $\vec{w}$ and $\vec{z}$ due to the lack of available qubits to perform a global optimization over the accurate dictionary.

The two strong classifiers of Eqs. (2.82) and (2.83) are summed as in Eq. (2.68) to create a final energy function that will push errors to the bottom part of the spectrum. This is translated to a final Hamiltonian $H_F$ as in Eq. (2.79), and the result of the optimization (i.e., the ground state of this $H_F$) is returned as the error candidate. This portion of the algorithm makes it crucial to employ intelligent classical preprocessing in order to keep the length of the input and output vectors as small as possible, because each bit in the input-output vector corresponds to a qubit, and the classical cost of finding the ground state of $H_F$ grows exponentially with the number of qubits.

Figure 2.8: Error fractions of specification (left) and implementation (right) classifiers, for an increasing number of qubits.

2.6.3 Simulation results

Our simulation efforts have focused on achieving better accuracy from the two strong classifiers. If the strong classifiers are not highly accurate, the second part of the algorithm, the quantum-parallel use of the classifiers, will not produce useful results.
In the interest of pushing the limits of accuracy of the strong classifiers, some simulations were performed on the miscompare problem in a single time frame. Under this simplification, the program specification and implementation are identical (the error arises over multiple time frames), and indeed the numerical results will show that the results for the two classifiers are the same (see Figs. 2.6 and 2.7, right).

The algorithm described in Subsection 2.6.2 was run 50 times, each time producing two strong classifiers comprising 16 or fewer weak classifier members. The figure of 16 qubits was chosen because it allowed the computations to be performed in a reasonable amount of time on a desktop computer while still allowing for some complexity in the makeup of the strong classifiers. This set of 50 complete algorithmic iterations was performed for 26 values of $\lambda$, the sparsity parameter introduced in Eq. (2.18). The average percentage of error for both strong classifiers was examined, as was the best error fraction achieved in the 50 iterations. These two quantities are defined as follows:

$$\mathrm{err}_{\mathrm{avg}} = \frac{1}{50} \sum_{i=1}^{50} L_i(\vec{w}^{\mathrm{opt}}) \qquad (2.84)$$

$$\mathrm{err}_{\min} = \min_i L_i(\vec{w}^{\mathrm{opt}}), \qquad (2.85)$$

where $L$ is the function that counts the total number of incorrect classifications, Eq. (2.17). The weight vector $\vec{z}^{\mathrm{opt}}$ can be substituted for $\vec{w}^{\mathrm{opt}}$ in Eqs. (2.84) and (2.85) if the strong classifier being analyzed is the implementation rather than the specification classifier.

Both the average and minimum error for the specification and implementation classifiers are plotted in Figs. 2.6 and 2.7, respectively, as a function of $\lambda$. As shown in Figs. 2.6 and 2.7, while the average percent error for both classifiers hovered around 25%, the best percent error was consistently just below 16% for both the specification and implementation classifiers.
The consistency suggests two things: that the randomness of the algorithm can be tamed by looking for the best outcome over a limited number of iterations, and that the sparsity parameter, $\lambda$, did not have much effect on classifier accuracy.

Noting in particular the lack of dependence on $\lambda$, we move forward to examine the results of simulations on more difficult and computationally intensive applications of the algorithm. These results address the triplex monitor miscompare problem exactly as described in Subsection 2.6.1 and increase the number of qubits as far as 26. The error fractions of the best strong classifiers found, defined as

$$\mathrm{err}_{\min}(Q) = \min_i L_i(\vec{w}^{\mathrm{opt}}), \qquad i \in \{1, \ldots, n_{\mathrm{sim}}(Q)\}, \qquad (2.86)$$

where $n_{\mathrm{sim}}(Q)$ is the number of simulations performed at $Q$ qubits, are plotted in Fig. 2.8 as a function of the number of qubits allowed in the simulation.

Figure 2.9: Accuracy of weak classifier dictionary on input-output vector space. White/black pixels represent a weak classifier $h_i(\vec{x})$ (all weak classifiers meeting Condition 1, indexed in order of increasing error $\eta_j$ as in Eq. (2.87), on the vertical axis) categorizing an input-output vector (indexed in lexicographical order on the horizontal axis; there are $2^9$ vectors arising from the 9 Boolean variables in the sample problem) correctly/incorrectly, respectively. These classifications were to determine whether an input-output pair was correct or erroneous, i.e., we are analyzing the performance of the specification classifier.

For $Q = 16$ through $Q = 23$, the error fraction shown is for the best-performing classifier, selected from 26 iterations of the algorithm that were calculated using different values of $\lambda$. The consistently observed lack of dependence on $\lambda$ in these and other simulations (such as the 50-iteration result presented above) justifies this choice.
For $Q = 24$ to $Q = 26$, it was too computationally intensive to run the algorithm multiple times, even on a high performance computing cluster, so the values plotted are from a single iteration with $\lambda$ set to zero. This was still deemed to be useful data given the uniformity of the rest of the simulation results with respect to $\lambda$. The dependence on the parity of the number of qubits is a result of the potential for the strong classifier to return 0 when the number of weak classifiers in the majority vote is even. A vote of zero is not technically a misclassification, since the classifier does not place the vector $\vec{x}$ in the wrong class, but neither does the classifier give the correct class for $\vec{x}$. Rather, we obtain a "don't-know" answer from the classifier, which we do not group with the misclassifications because it is not an outright error in classification. It is a different, less conclusive piece of information about the proper classification of $\vec{x}$ which may in fact be useful for other applications of such classifiers.

The important conclusion to be drawn from the data quantifying strong classifier errors as a function of the number of available qubits is that performance seems to be improving only slightly as the number of available qubits increases. This may indicate that even with only 16 qubits, if the algorithm is iterated a sufficient number of times to compensate for its random nature, the accuracy achieved is close to the limit of what can be done with the current set of weak classifiers. This is encouraging in the context of strong classifier generation and sets a challenge for improving the performance of weak classifiers or breaking the problem into intermediate stages.

2.6.4 Comparison of results with theory

In light of the conditions for an ideal strong classifier developed in Section 2.4, it is reasonable to ask the following questions: How close do the weak classifiers we have for the problem studied here come to satisfying the conditions?
What sort of accuracy can we expect our simulations to yield? Fig. 2.9 and a few related calculations shed some light on the answers. In the figure, each row of pixels represents a single weak classifier in the dictionary and each column represents one vector in the input-output space. Horizontal red lines divide the different levels of performance exhibited by the weak classifiers. White pixels represent a given weak classifier categorizing a given input-output vector correctly. Black pixels represent incorrect classifications.

The problematic aspect of Fig. 2.9 is the vertical bars of white and black exhibited by some of the more accurate classifiers. The method detailed above for constructing a completely accurate strong classifier relies on pairs of classifiers which are correct where others fall short, and which do not both classify the same input-output vector incorrectly. This is impossible to find in the most accurate group of weak classifiers alone, given that there are black bars of erroneous classifications spanning the entire height of the set.

For numerical analysis of the performance of the set of Boolean weak classifiers on the sample problem, we relate the statistics of the dictionary on the input-output vector space $\mathcal{V}$ to Conditions 2 and 2a. Three quantities will be useful for this analysis. The first is the error fraction of an individual weak classifier

$$\eta_j = 1 - \frac{1}{S} \sum_{s=1}^{S} H\left[y_s h_j(\vec{x}_s)\right], \qquad (2.87)$$

that is, the fraction of the training set incorrectly classified by the weak classifier $h_j(\vec{x})$. We use the Heaviside step function $H$ to count the number of vectors correctly classified. Next is the minimum possible overlap of correctly classified vectors for a pair of weak classifiers over $\mathcal{V}$:

$$\phi_{jj'} = 1 - \eta_j - \eta_{j'} \qquad (2.88)$$

In Eq. (2.88), we add the correctness fraction $(1 - \eta_j)$ of each weak classifier, then subtract 1 to arrive at the fraction of vectors that must be classified correctly by both weak classifiers at once.
The next definition we shall require is that of the actual overlap of correct classifications:

$$\gamma_{jj'} = \frac{1}{S}\sum_{s=1}^{S} H\left[y_s\left(h_j(\vec{x}_s) + h_{j'}(\vec{x}_s)\right)\right] \equiv \phi_{jj'} + \epsilon_{jj'}. \tag{2.89}$$

In Eq. (2.89), we count the number of vectors that are actually classified correctly by both weak classifiers.

If the minimum possible and actual overlaps are the same, i.e., $\epsilon_{jj'} = 0$, then Condition 2 holds, and the weak classifier pair has minimum correctness overlap. Otherwise, if $\gamma_{jj'} \neq \phi_{jj'}$, only the weaker Condition 2a is satisfied, so the weak classifier pair has a greater than minimal correctness overlap and a forced overlap of incorrect classifications $\epsilon_{jj'} > 0$ (see Fig. 2.2) that could cancel out the correct votes of a different weak classifier pair and cause the strong classifier to be either incorrect or inconclusive.

Our numerical analysis of the weak classifiers satisfying Condition 1 (having $\eta_j < 0.5$) showed that the average correctness overlap $\gamma_{jj'}$ between any two weak classifiers was 0.3194. The maximum correctness overlap for any pair of weak classifiers was $\gamma_{jj'} = 0.6094$. The minimum was $\gamma_{jj'} = 0.1563$, between two weak classifiers with respective error fractions (amount of the training set misclassified by each individual weak classifier) of $\eta_j = 0.4844$ and $\eta_{j'} = 0.4531$. Compare this to the minimum possible overlap with two such classifiers, $\phi_{jj'} = 0.0625$, and it becomes apparent that this set of weak classifiers falls short of ideal, given that $\epsilon_{jj'} = 0.0938$ for the weak classifier pair with minimum overlap.

When only the most accurate weak classifiers ($\eta_j = 0.3906$; above the top red horizontal line in Fig. 2.9) were included, the average correctness overlap was $\gamma_{jj'} = 0.4389$, the maximum was $\gamma_{jj'} = 0.6094$, and the minimum was $\gamma_{jj'} = 0.3594$. In order to come up with a generous estimate for the accuracy achievable with this group of weak classifiers, we focus on the minimum observed correctness overlap.
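The three statistics of Eqs. (2.87)-(2.89) (error fraction, minimum possible correctness overlap, and actual correctness overlap) can be computed directly from the votes of two weak classifiers. The following Python sketch is only an illustration: the function names and the eight-vector toy training set are hypothetical, and it assumes the convention $H(0) = 0$ so that a split vote does not count as a joint correct classification.

```python
def heaviside(x):
    """Heaviside step with H(0) = 0 (assumed convention): ties are not counted."""
    return 1 if x > 0 else 0

def error_fraction(labels, votes):
    """Eq. (2.87): fraction of the training set a weak classifier gets wrong."""
    correct = sum(heaviside(y * h) for y, h in zip(labels, votes))
    return 1 - correct / len(labels)

def min_overlap(err_a, err_b):
    """Eq. (2.88): smallest possible fraction classified correctly by both."""
    return 1 - err_a - err_b

def actual_overlap(labels, votes_a, votes_b):
    """Eq. (2.89): fraction actually classified correctly by both classifiers."""
    both = sum(heaviside(y * (a + b)) for y, a, b in zip(labels, votes_a, votes_b))
    return both / len(labels)

# Hypothetical toy data: labels and weak-classifier votes take values +1 or -1.
labels  = [+1, +1, -1, -1, +1, -1, +1, -1]
votes_j = [+1, +1, -1, +1, -1, -1, +1, +1]  # wrong on 3 of 8 vectors
votes_k = [+1, -1, -1, -1, +1, +1, +1, -1]  # wrong on 2 of 8 vectors

err_j = error_fraction(labels, votes_j)
err_k = error_fraction(labels, votes_k)
phi = min_overlap(err_j, err_k)
gamma = actual_overlap(labels, votes_j, votes_k)
excess = gamma - phi  # zero means the pair has minimum correctness overlap

print(err_j, err_k, phi, gamma, excess)

# Arithmetic check of the values quoted in the text for the minimum-overlap pair.
quoted_min = min_overlap(0.4844, 0.4531)
quoted_excess = 0.1563 - quoted_min
print(round(quoted_min, 4), round(quoted_excess, 4))
```

In the toy data the two classifiers are never wrong on the same vector, so the excess overlap vanishes and the pair would satisfy Condition 2; the quoted pair from the text does not.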
The minimum possible correctness overlap for two classifiers with $\eta_j = 0.3906$ is $\phi_{jj'} = 0.2188$. With an ideal set of weak classifiers of error $\eta_j = 0.3906$ and correctness overlap $\phi_{jj'} = 0.2188$, it would take seven weak classifiers to construct a completely accurate strong classifier: three pairs of two classifiers each to cover a fraction 0.6564 of the solution space with a correctness overlap from one of the pairs, and one more weak classifier to provide the extra correct vote on the remaining 0.3436 fraction of the space. Assuming that three pairs of weak classifiers with minimum overlap and optimal relationships to the other weak classifier pairs could be found, there will still be a significant error due to the overlap fractions of the pairs being larger than ideal. In fact, each pair of weak classifiers yields an error contribution of $\epsilon_{jj'} = 0.1406$, guaranteeing that a fraction $3\epsilon_{jj'} = 0.4218$ of the input-output vectors will be classified incorrectly by the resulting strong classifier. This is not far from the simulation results for odd-qubit strong classifiers (Fig. 2.8, left), which suggests that the algorithm currently in use is producing near-optimal results for the dictionary of weak classifiers it has access to.

2.7 Conclusions

We have developed a quantum adiabatic machine learning approach and applied it to the problem of training a quantum software error classifier. We have also shown how to use this classifier in quantum-parallel on the space of all possible input-output pairs of a given implemented software program $P$. The training procedure involves selecting a set of weak classifiers, which are linearly combined, with binary weights, into two strong classifiers.

The first quantum aspect of our approach is an adiabatic quantum algorithm which finds the optimal set of binary weights as the ground state of a certain Hamiltonian. We presented two alternatives for this algorithm.
The first, inspired by [NRM, NDRMa], gives weight to single weak classifiers to find an optimal set. The second algorithm for weak classifier selection chooses pairs of weak classifiers to form the optimal set and is based on a set of sufficient conditions for a completely accurate strong classifier that we have developed.

The second quantum aspect of our approach is an explicit procedure for using the optimal strong classifiers in order to search the entire space of input-output pairs in quantum-parallel for the existence of an error in $P$. Such an error is identified by performing an adiabatic quantum evolution, whose manifold of low-energy final states favors erroneous states.

A possible improvement of our approach involves adding intermediate training spaces, which track intermediate program execution states. This has the potential to fine-tune the weak classifiers, and overcome a limitation imposed by the desire to restrict our Hamiltonians to low-order interactions, yet still account for high-order correlations between bits in the input-output states. An additional improvement involves finding optimal interpolation paths $s(t)$ [Eq. (1.4)] from the initial to the final Hamiltonian [RKH+09, RALZ10], for both the classifier training and classifier implementation problems.

We have applied our quantum adiabatic machine learning approach to a problem with real-world applications in flight control systems, which has facilitated both algorithmic development and characterization of the success of training strong classifiers using a set of weak classifiers involving minimal bit correlations.

Chapter 3
Quantum Annealing Correction

3.1 Quantum annealing and computational errors

Hard problems in quantum annealing are characterized by gaps that close polynomially or even exponentially with increasing problem size [FGG+01, YKS08b].
If the gap is too small, then both non-adiabatic transitions and thermal excitations can result in computational errors, manifested in the appearance of excited states at $t_f$. While the non-adiabatic transition rate can in principle be suppressed to an arbitrarily high degree by enforcing a smoothness condition on the annealing functions $A(t)$ and $B(t)$ [LRH09], thermal excitations will cause errors at any non-zero temperature. Additionally, even if the gap $\Delta$ is large enough, inaccuracies in the implementation of $H_{\rm Ising}$ may result in the evolution ending up in the “wrong” ground state. Overcoming such errors requires error correction.

Numerous experiments have demonstrated the utility of quantum error correction in gate-model quantum computing with up to 9 qubits using, e.g., NMR [CPM+98, ZLS12], trapped ions [CLS+04, SBM+11], optical systems [LGZ+08, ATK+09], and superconducting circuits [RDN+12]. However, such demonstrations require far more control than is available in QA. Likewise, methods developed for adiabatic quantum computing [JFS06, Lid08, QL12, YSBK13] require operations which are not included in the QA repertoire. Here we show how QA can nevertheless be error corrected. We provide an experimental demonstration using up to 344 superconducting flux qubits in D-Wave processors [JAG+11, HJB+10], which have recently been shown to physically implement QA [DJA+13, BAS+13, BRI+14]. The qubit connectivity graph of these processors is depicted in Fig. 3.1a.
[Figure 3.1, panels (a)-(c): qubit graph drawings; only the caption is reproduced here.]

Figure 3.1: Unit cell and encoded graph. (a) Schematic of one of the 64 unit cells of the DW2 processor (see Section 3.8). Unit cells are arranged in an $8 \times 8$ array forming a “Chimera” graph between qubits. Each circle represents a physical qubit, and each line a programmable Ising coupling $\sigma^z_i \sigma^z_j$. Lines on the right (left) couple to the corresponding qubits in the neighboring unit cell to the right (above). (b) Two “logical qubits” ($i$, red and $j$, blue) embedded within a single unit cell. Qubits labeled 1-3 are the “problem qubits”; the opposing qubit of the same color labeled $P$ is the “penalty qubit”. Problem qubits couple via the black lines with tunable strength $\alpha$ both inter- and intra-unit cell. Light blue lines of magnitude $\beta$ are ferromagnetic couplings between the problem qubits and their penalty qubit. (c) Encoded processor graph obtained from the Chimera graph by replacing each logical qubit by a circle. This is a non-planar graph (see Section 3.9 for a proof) with couplings of strength $3\alpha$. Green circles represent complete logical qubits. Orange circles represent logical qubits lacking their penalty qubit (see Fig. 3.7). Red lines are groups of couplers that cannot all be simultaneously activated.
3.2 Quantum annealing correction

We devise a strategy we call “quantum annealing correction” (QAC), comprising the introduction of an energy penalty along with encoding and error correction. Our main tool is the ability to independently control pairwise Ising interactions, which can be viewed as the generators of the bit-flip stabilizer code [Gai08]. We first encode $H_{\rm Ising}$, replacing each $\sigma^z_i$ term by its encoded counterpart $\overline{\sigma^z_i} = \sum_{\ell=1}^{n} \sigma^z_{i_\ell}$ and each $\sigma^z_i \sigma^z_j$ by $\overline{\sigma^z_i \sigma^z_j} = \sum_{\ell=1}^{n} \sigma^z_{i_\ell} \sigma^z_{j_\ell}$, where the subindices $\ell$ refer to the problem qubits as depicted in Fig. 3.1b. After these replacements we obtain an encoded Ising Hamiltonian

$$\overline{H}_{\rm Ising} = \sum_{i=1}^{\bar{N}} h_i \overline{\sigma^z_i} + \sum_{i<j}^{\bar{N}} J_{ij} \overline{\sigma^z_i \sigma^z_j}, \tag{3.1}$$

where $\bar{N}$ is the number of encoded qubits. The “code states” $|\bar{0}_i\rangle = |0_{i_1} \cdots 0_{i_n}\rangle$ and $|\bar{1}_i\rangle = |1_{i_1} \cdots 1_{i_n}\rangle$ are eigenstates of $\overline{\sigma^z_i}$ with eigenvalues $n$ and $-n$ respectively. “Non-code states” are the remaining $2^n - 2$ eigenstates, having at least one bit-flip error. The states $|\bar{0}_i \bar{0}_j\rangle, |\bar{1}_i \bar{1}_j\rangle$ and $|\bar{0}_i \bar{1}_j\rangle, |\bar{1}_i \bar{0}_j\rangle$ are eigenstates of $\overline{\sigma^z_i \sigma^z_j}$, also with eigenvalues $n$ and $-n$ respectively. Therefore the ground state of $\overline{H}_{\rm Ising}$ is identical, in terms of the code states, to that of the original unencoded Ising Hamiltonian, with $\bar{N} = N$.

[Figure 3.2, panels (a)-(d): probability of correct answer versus antiferromagnetic chain length $N$ or $\bar{N}$ for the Unprotected (U), Classical (C), Encoded with penalty (EP), and Complete (QAC) strategies at (a) $\alpha = 1$, (b) $\alpha = 0.6$, (c) $\alpha = 0.3$; (d) U and QAC versus problem energy scale $\alpha$ at $N = \bar{N} = 86$.]

Figure 3.2: Success probabilities of the different strategies. Panels (a)-(c) show the results for antiferromagnetic chains as a function of chain length. The solid blue lines in the U case are best fits to $1/(1 + pN^2)$ (Lorentzian), yielding $p = 1.94 \times 10^{-4}, 5.31 \times 10^{-4}, 3.41 \times 10^{-3}$ for $\alpha = 1, 0.6, 0.3$ respectively. Panel (d) compares the U and QAC strategies at $N = \bar{N} = 86$ and $\alpha \in \{0.1, 0.2, \ldots, 1.0\}$. Chains shown in Panel (a) respectively depict the U (i), C (ii), and EP and QAC (iii) cases. In (ii) and (iii), respectively, vertically aligned and coupled physical qubits of the same color form a logical qubit. Error bars in all our plots were calculated over the set of embeddings and express the standard error of the mean $\sigma/\sqrt{S}$, where $\sigma^2 = \frac{1}{S}\sum_{i=1}^{S} (x_i - \bar{x})^2$ is the sample variance and $S$ is the number of samples. Additional details, including from experiments on a previous generation of the processor (the D-Wave One (DW1) “Rainier”), are given in Section 3.11.

This encoding allows for protection against bit-flip errors in two ways. First, the overall problem energy scale is increased by a factor of $n$, where $n = 3$ in our implementation on the D-Wave processors. Note that since we cannot also encode $H_X$ (this would require $n$-body interactions), it does not directly follow that the gap energy scale also increases; we later present numerical evidence that this is the case (see Fig. 3.3a), so that thermal excitations will be suppressed. Second, the excited state spectrum has been labeled in a manner which can be decoded by performing a post-readout majority-vote on each set of $n$ problem qubits, thereby error-correcting non-code states into code states in order to recover some of the excited state population. The $(n, 1)$ repetition code has minimum Hamming distance $n$, i.e., a non-code state with more than $\lfloor n/2 \rfloor$ bit-flip errors will be incorrectly decoded; we call such states “undecodable”, while “decodable states” are those excited states that are decoded via majority-vote to the correct code state.
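The post-readout majority vote on each set of $n$ problem qubits can be sketched in a few lines of Python. This is a minimal illustration rather than the decoding software used in the experiments: the function names are hypothetical, and the readout is assumed to be a flat list of 0/1 values in which each consecutive group of $n$ entries holds the problem qubits of one logical qubit (penalty qubits are assumed to have been dropped beforehand).

```python
def decode_logical_qubit(bits):
    """Majority vote over the n problem-qubit readouts (0 or 1) of one logical qubit."""
    n = len(bits)
    if n % 2 == 0:
        raise ValueError("majority vote needs an odd number of problem qubits")
    return 1 if sum(bits) > n // 2 else 0

def decode_chain(readout, n=3):
    """Decode a flat readout, assumed grouped as n problem qubits per logical qubit."""
    if len(readout) % n != 0:
        raise ValueError("readout length must be a multiple of n")
    return [decode_logical_qubit(readout[k:k + n]) for k in range(0, len(readout), n)]

# Three logical qubits with n = 3:
#   [0, 0, 1] -> one bit-flip on logical 0, decoded back to 0
#   [1, 1, 1] -> code state for logical 1
#   [0, 1, 1] -> decoded as 1; if logical 0 was intended, the two flips exceed
#                floor(n/2) = 1 and the state is undecodable
decoded = decode_chain([0, 0, 1, 1, 1, 1, 0, 1, 1])
print(decoded)
```

With $n = 3$, a single flipped problem qubit is corrected, while two or three flips within one logical qubit decode to the wrong code state, matching the distinction between decodable and undecodable states above.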
To generate additional protection we next introduce a ferromagnetic penalty term

$$H_P = -\sum_{i=1}^{\bar{N}} \left(\sigma^z_{i_1} + \cdots + \sigma^z_{i_n}\right) \sigma^z_{i_P}, \tag{3.2}$$

the sum of stabilizer generators of the $(n+1)$-qubit repetition code, which together detect and energetically penalize [JFS06] all bit-flip errors except the full logical qubit flip. The role of $H_P$ can also be understood as locking the problem qubits into agreement with the penalty qubit, reducing the probability of excitations from the code space into non-code states; see Fig. 3.1b for the D-Wave processor implementation of this penalty. The encoded graph thus obtained in our experimental implementation is depicted in Fig. 3.1c.

Including the penalty term, the total encoded Hamiltonian we implement is

$$\overline{H}(t) = A(t) H_X + B(t) \overline{H}_{\rm Ising,P}(\alpha, \beta), \tag{3.3}$$

where $\overline{H}_{\rm Ising,P}(\alpha, \beta) := \alpha \overline{H}_{\rm Ising} + \beta H_P$, and the two controllable parameters $\alpha$ and $\beta$ are the “problem scale” and “penalty scale”, respectively, which we can tune between 0 and 1 in our experiments and optimize. Note that our scheme, as embodied in Eq. (3.3), implements quantum annealing correction: $H_P$ energetically penalizes every error $E$ it does not commute with, e.g., every single-qubit error $E \in U(2)$ such that $E \not\propto \sigma^z_i$.

3.3 Benchmarking using antiferromagnetic chains

Having specified the general scheme, which in particular is applicable to any problem that is embeddable on the encoded graph shown in Fig. 3.1c, we now focus on antiferromagnetic chains. In this case, the classical ground states at $t = t_f$ are the trivial doubly degenerate states of nearest-neighbor spins pointing in opposite directions. This allows us to benchmark our QAC strategy while focusing on the role of the controllable parameters, instead of the complications associated with the ground states of frustrated Ising models [BRI+14, SQSL13]. Moreover, chains are dominated by domain wall errors [JAG+11], which as we explain below are a particularly challenging scenario for our QAC strategy.

[Figure 3.3, panels (a)-(b): (a) gap $\Delta$ versus $t/t_f$ for $\beta = 0.1, 0.5, 1.0$, with inset showing the probabilities $P_{GS}$ and $P_S$ versus $\beta$; (b) three error configurations with degeneracies $\times 6$, $\times 12$, $\times 2$, correctly decoded “Yes, Yes, No”, and Ising gaps $\Delta_I = 4\beta$, $\Delta_I = 2\alpha + 2\beta$, $\Delta_I = 6\alpha$ plotted versus $\beta$.]

Figure 3.3: Effect of varying the penalty strength. Panel (a) shows the numerically calculated gap to the lowest relevant excited state for two antiferromagnetically coupled logical qubits for $\alpha = 0.3$ and different values of $\beta$. Inset: undecoded (decoded) ground state probability $P_{GS}$ ($P_S$). Panel (b) (top) shows three configurations of two antiferromagnetically coupled logical qubits. Physical qubits denoted by heavy arrows point in the wrong direction. In the left configuration both logical qubits have a bit-flip error, in the middle configuration only one logical qubit has a single bit-flip error, and in the right configuration one logical qubit is completely flipped. The corresponding degeneracies and gaps ($\Delta_I$) from the final ground state are indicated, and the gaps plotted (bottom).

As a reference problem we implemented an $N$-qubit antiferromagnetic chain with $H_{\rm Ising}(\alpha) = \alpha \sum_{i=1}^{N-1} \sigma^z_i \sigma^z_{i+1}$. We call this problem “unprotected” (U) since it involves no encoding or penalty. As a second reference problem we implemented three unpenalized parallel $\bar{N}$-qubit chains: $\overline{H}_{\rm Ising}(\alpha) = \alpha \sum_{j=1}^{3} \sum_{i=1}^{\bar{N}-1} \sigma^z_{i_j} \sigma^z_{(i+1)_j}$. We call this problem “classical” (C) since this results in a purely classical repetition code, whereby each triple of bits $\{i_j\}_{j=1}^{3}$ forms a logical bit $\bar{i}$, decoded via majority-vote. As a third reference problem we implemented a chain of $\bar{N}$ encoded qubits with an energy penalty (EP): $\overline{H}_{\rm Ising,P}(\alpha, \beta) = \alpha \sum_{i=1}^{\bar{N}-1} \overline{\sigma^z_i \sigma^z_{i+1}} + \beta H_P$. When we add majority-vote decoding to the EP strategy we have our complete QAC strategy. Comparing the probability of finding the ground state in the U, C, EP and QAC cases allows us to isolate the effects of the various components of the error correction strategy. Because antiferromagnetic chains have two degenerate ground states, below we consider the ground state for any given experimentally measured state to be that with which the majority of the decoded qubits align.

[Figure 3.4, panels (a)-(f): color density plots of success probability versus penalty strength $\beta$ and antiferromagnetic chain length $\bar{N}$; (a) EP, $\alpha = 0.3$; (b) EP, $\alpha = 0.6$; (c) EP, $\alpha = 1$; (d) QAC, $\alpha = 0.3$; (e) QAC, $\alpha = 0.6$; (f) QAC, $\alpha = 1$.]

Figure 3.4: Experimental optimization of the penalty strength $\beta$. The top (bottom) row shows color density plots of the experimental success probability of the EP (QAC) strategy as a function of $\beta \in \{0.1, 0.2, \ldots, 1.0\}$ and $\bar{N} \in \{2, 3, \ldots, 86\}$, at $\alpha = 0.3$ (left), 0.6 (middle) and 1 (right). The optimal $\beta$ values are indicated by the white dots.
3.4 Key experimental results – success probabilities

The performance of the different strategies is shown in Fig. 3.2. Our key finding is the high success probability of the complete QAC strategy for $\alpha = 1$ (Fig. 3.2a), improving significantly over the three other strategies, and resulting in a fidelity $> 90\%$ for all chain lengths. The relative improvement is highest for low values of $\alpha$, as seen in Fig. 3.2d. The C strategy is competitive with QAC for relatively short chains, but drops eventually. The EP probability is initially intermediate between the U and C cases, but always catches up with the C data for sufficiently long chains. This shows that the energy penalty strategy by itself is insufficient, and must be supplemented by decoding as in the complete QAC strategy.

Since $\alpha$ sets the overall problem energy scale, it is inversely related to the effective noise strength. This is clearly visible in Fig. 3.2a-3.2d (see also Fig. 3.4), where the overall success probability improves significantly over a range of $\alpha$ values. The unprotected chains are reasonably well fit by a Lorentzian, whereas a classical model of independent errors (see Section 3.10) fails to describe the data as it predicts an exponential dependence on $N$. We turn next to an analysis and explanation of our results.

3.5 Optimizing the penalty scale $\beta$

To obtain the performance of the EP and QAC strategies shown in Fig. 3.2 we optimized $\beta$ separately for each strategy and for each setting of $\alpha$ and $\bar{N}$. In order to understand the role of $\beta$, consider first how increasing $\beta$ affects the size and position of the gap $\Delta$. The excitations relevant to our error correction procedure are to the second excited state and above, since the ground state becomes degenerate at $t_f$. In Fig. 3.3a we show that the relevant gap grows with increasing $\beta$, as desired. The gap position also shifts to the left, which is advantageous since it leaves less time for thermal excitations to act while the transverse field dominates.

However, the role of $\beta$ is more subtle than would be suggested by considering only the gap. When $\beta \ll \alpha$ the penalty has no effect, and when $\beta \gg \alpha$ the penalty dominates the problem scale and the chains effectively comprise decoupled logical qubits. Thus there should be an optimal $\beta$ for each $(\bar{N}, \alpha)$ pair, which we denote $\beta_{\rm opt}$. Without decoding we expect $\beta_{\rm opt} \sim \alpha$ based on the argument above, which is confirmed in Fig. 3.4 (top row). Note that when $\beta = 0.1$ the penalty is too small to be beneficial, hence the poor performance for that value in the EP case.

[Figure 3.5, panels (a)-(b): (a) percentage of observations versus Hamming distance from the nearest degenerate ground state, with inset; (b) percentage of observations versus Hamming distance from the logical qubit ground state, with logical qubit position as color scale and inset.]

Figure 3.5: Hamming distance histograms. Observed errors in encoded 86 qubit antiferromagnetic chains, at $\alpha = 1$ and the near-optimal $\beta = 0.2$. Panel (a) is a histogram of Hamming distances from the nearest of the two degenerate ground states, measured in terms of physical qubits. Inset: in terms of encoded qubits. The peaks at Hamming distance zero are cut off and extend to 63.6% (88.3%) for the physical (encoded) case. Panel (b) is a histogram of the errors as a function of logical qubit position (color scale) within the chain. Errors on encoded problem qubits are at Hamming distance 1, 2 or 3. Flipped penalty qubits are shown in the inset. The mirror symmetry is due to averaging over the two equivalent chain directions.

In the QAC case another effect occurs: the spectrum is reordered so that undecodable states become lower in energy than decodable states. This is explained in Fig. 3.3b. Consider the three configurations shown.
While the left and middle configurations are decodable, the right-side configuration is not. For sufficiently large $\alpha$ the undecodable state is always the highest of the three indicated excited states. The graph at the bottom of panel (b) shows the Ising gap as a function of $\beta$ for $\alpha = 0.3$. While for sufficiently small $\beta$ such that $4\beta, 2\alpha + 2\beta < 6\alpha$ both decodable states are lower in energy than the undecodable state, the undecodable state becomes the first excited state for sufficiently large $\beta$. This adversely affects the success probability after decoding, as is verified numerically in the inset of panel (a), which shows the results of an adiabatic master equation [ABLZ12] calculation for the same problem, yielding the undecoded ground state probability $P_{GS}$ and the decoded ground state probability $P_S$ (for model details and parameters see Section 3.12). While for $\beta < 0.6$ decoding helps, this is no longer true for $\beta > 0.6$, when the undecodable state becomes the first relevant excited state. Consequently we again expect there to be an optimal value of $\beta$ for the QAC strategy that differs from $\beta_{\rm opt}$ for EP.

These expectations are borne out in our experiments: Fig. 3.4 (bottom row) shows that $\beta_{\rm opt}$ is significantly lower than in the EP case, which differs only via the absence of the decoding step. The decrease in $\beta_{\rm opt}$ with increasing $\alpha$ and chain length can be understood in terms of domain wall errors (see below), which tend to flip entire logical qubits, thus resulting in a growing number of undecodable errors. For additional insight into the roles of the penalty qubits, $\alpha$, and $\beta$, see Sections 3.13 and 3.14.

3.6 Error mechanisms

Solving for the ground state of an antiferromagnetic Ising chain is an “easy” problem, so why do we observe decreasing success probabilities? As alluded to earlier, domain walls are the dominant form of errors for antiferromagnetic chains, and we show next how they account for the shrinking success probability. We analyze the errors on the problem vs the penalty qubits and their distribution along the chain. Figure 3.5a is a histogram of
We analyze the errors on the problem vs the penalty qubits and their distribution along the chain. Figure 3.5a is a histogram of 79 the observed decoded states at a given Hamming distanced from the ground state of the N = 86 chains. The large peak neard = 0 shows that most states are either correctly decoded or have just a few flipped bits. The quasi-periodic structure seen emerging at d 20 can be understood in terms of domain walls. The period is four, the number of physical qubits per logical qubit, so this periodicity reflects the flipping of an integer multiple of logical qubits, as in Fig. 3.3b. Once an entire logical qubit has flipped and violates the antiferromagnetic coupling to, say, its left (thus creating a kink), it becomes energetically preferable for the nearest neighbor logical qubit to its right to flip as well, setting off a cascade of logical qubit flips all the way to the end of the chain. The inset is the logical Hamming distance histogram, which looks like a condensed version of the physical Hamming distance histogram because it is dominated by these domain wall dynamics. Rather than considering the entire final state, Fig. 3.5b integrates the data in Fig. 3.5a and displays the observed occurrence rates of the various classes of errors per logical qubit in N = 86 chains. The histograms for one, two, and three problem qubits flipping in each location are shown separately. Flipped penalty qubits are shown in the inset and are essentially perfectly correlated withd = 3 errors, meaning that a penalty qubit flip will nearly always occur in conjunction with all problem qubits flipping as well. Thus the penalty qubits function to lock the problem qubits into agreement, as they should (further analysis of the role of the penalty qubit in error suppression is presented in Section 3.13). The overwhelming majority of errors are one or more domain walls between logical qubits. 
The domains occur with higher probability the closer they are to the ends of the chain, since kink creation costs half the energy at the chain boundaries. The same low barrier to flipping a qubit at the chain ends also explains the large peaks at $d = 1$.

[Figure 3.6: color density plot of the fraction of decodable states; horizontal axis: physical Hamming distance from ground state; vertical axis: energy of observed state relative to ground state.]

Figure 3.6: Decodability analysis. The fraction of decodable states out of all states (color scale) observed at a given Hamming distance from the nearest degenerate ground state (measured in physical qubits), and given energy above the ground state (in units of $J_{ij} = 1$), for $\bar{N} = 86$ and $\alpha = 1$.

Our majority-vote decoding strategy correctly decodes errors with $d = 1$, incorrectly decodes the much less frequent $d = 2$ errors, and is oblivious to the dominant $d = 3$ domain wall errors, which present as logical errors. Therefore the preponderance of domain wall errors at large $\bar{N}$ is largely responsible for the drop seen in the QAC data in Fig. 3.2.

The two-qubit problem analyzed in Fig. 3.3a and 3.3b suggests that logical errors can dominate the low energy spectrum. We observe this phenomenon in Fig. 3.6, which shows that decodable and undecodable states separate cleanly by Hamming distance but not by energy, with many low energy states being undecodable states. In this sense the problem of chains we are studying here is in fact unfavorable for our QAC scheme, and we can expect better performance for computationally hard problems involving frustration.

Figure 3.6 lends itself to another interesting interpretation. Quantum annealing is normally understood as an optimization scheme that succeeds by evolving in the ground state, but how much does the energy of the final state matter when we implement error correction?
Figure 3.6 shows that a small Hamming distance is much more strongly correlated with decodability than the final state energy: the latter can be quite high while the state remains decodable. Thus the decoding strategy tolerates relatively high energy final states.

3.7 Conclusion and outlook

This work demonstrates that QAC can significantly improve the performance of programmable quantum annealing even for the relatively unfavorable problem of antiferromagnetic chains, which are dominated by logical qubit errors manifested as domain walls. We have shown that (i) increasing the problem energy scale via encoding into logical qubits, (ii) introducing an optimum penalty strength $\beta$ to penalize errors that do not commute with the penalty term, and (iii) decoding the excited states, together reduce the overall error rate relative to any strategy that omits one of these three steps, which comprise the complete QAC strategy.

The next step is to extend QAC to problems where the correct solution is not known in advance, and is in fact the object of running the quantum annealer. Optimization of the decoding scheme would then be desirable. For example, detected errors could be corrected by solving a local optimization problem, whereby the values of a small cluster of logical qubits that were flagged as erroneous and their neighbors are used to find the lowest energy solution possible. Other decoding schemes could be devised as needed, drawing, e.g., on recent developments in optimal decoding of surface codes [FWH12].

Another important avenue for future studies is the development of more efficient QAC-compatible codes capable of handling larger weight errors. Ultimately, the scalability of quantum annealing depends on the incorporation of fault-tolerant error correction techniques, which we hope this work will help to inspire.
3.8 Hardware parameters of the DW1 and DW2

The experiments described in the main text were performed on the D-Wave Two (DW2) “Vesuvius” processor at the Information Sciences Institute of the University of Southern California. The device has been described in detail elsewhere [HJB+10, HJL+10, BJB+10]. The annealing functions $A(t)$ and $B(t)$ are specified below. All our results were averaged over 24 embeddings of the chains on the processor (except U in Fig. 3.2d, which used 188 embeddings), where an embedding assigns a specific set of physical qubits to a given chain. After programming the couplings, the device was cooled for 10 ms, and then 5000 annealing runs per embedding were performed using an annealing time of $t_f = 20\,\mu$s for every problem size $N, \bar{N} \in \{2, 3, \ldots, 86\}$, $\alpha \in \{0.1, 0.2, \ldots, 1.0\}$ and $\beta \in \{0.1, 0.2, \ldots, 1.0\}$. Annealing was performed at a temperature of 17 mK ($\approx 2.2$ GHz), with an initial transverse field starting at $A(0) \approx 33.8$ GHz, going to zero during the annealing, while the couplings are ramped up from near zero to $B(t_f) \approx 20.5$ GHz.

The D-Wave processors are organized into unit cells consisting of eight qubits arranged in a complete, balanced bipartite graph, with each side of the graph connecting to a neighboring unit cell, as seen in Fig. 3.7; the resulting structure is known as the “Chimera” graph [Cho08, Cho11]. The D-Wave One (DW1) “Rainier” processor is the predecessor of the DW2, and was used in our early experiments. The minimum DW1 annealing time is $5\,\mu$s, compared to $20\,\mu$s for the DW2. The analog part of the DW2 circuitry is very similar to that of the DW1 processors, other than improved qubit parameters (lower inductance and capacitance, shorter qubit length, higher critical current). The DAC (digital to analog) and readout technologies were also improved (non-dissipative readout scheme instead of dc SQUIDs).
The DW2 has an XYZ addressing scheme (to eliminate static power dissipation when programming), and a smaller spread on coupling strength of key on-chip control transformers, allowing better synchronization between the $h$ and $J$ parameters [Lan13]. The reduction in control noise sources is noticeable in our experimental data, as shown in Sec. 3.11. Figure 3.8 shows the encoded hardware graph used in our DW1 experiments. The annealing schedules for the DW1 and DW2 are shown in Fig. 3.9.

Details for the DW1 experiments are as follows (for the analogous DW2 details see Methods in the main text). After programming the couplings, the device was cooled for 1.5 s and then 20,000 annealing runs per embedding were performed using an annealing time of $t_f = 5\,\mu$s for every problem size $N, \bar{N} \in \{2, 3, \ldots, 16\}$. Only a single embedding was used. Error bars in all our DW1 plots were calculated over the set of $S = 20{,}000$ annealing runs and express the standard error of the mean $\sigma/\sqrt{S}$, where $\sigma^2 = \frac{1}{S}\sum_{i=1}^{S} (x_i - \bar{x})^2$ is the sample variance.

3.9 Proof that the encoded graph is non-planar

Finding the ground state of the Ising model over the encoded graph of the processor, shown in Fig. 3.1 in the main text, is an NP-hard problem, just as the same problem over the original hardware graph is NP-hard. The key lies in the three-dimensional nature of both graphs; finding the ground state of Ising spin glasses over non-planar lattices is an NP-hard problem [Bar82].
[Figure 3.7 panels: (a) DW1, qubits labeled 0 to 127; (b) DW2, qubits labeled 0 to 511.]

Figure 3.7: The connectivity graph of the D-Wave One (DW1) “Rainier” processor shown on the left consists of $4 \times 4$ unit cells of eight qubits (denoted by circles), connected by programmable inductive couplers (lines). The 108 green (red) circles denote functional (inactive) qubits. Most qubits connect to six other qubits. The D-Wave Two (DW2) “Vesuvius” processor shown on the right consists of $8 \times 8$ unit cells. The 503 green (red) circles denote functional (inactive) qubits. In the ideal case, where all qubits are functional and all couplers are present, one obtains the non-planar “Chimera” connectivity graph.

We provide a graphical proof of non-planarity for the encoded graph here. The existence of a subgraph homeomorphic to the $K_{3,3}$ complete bipartite graph with three vertices on each side is sufficient to prove that a given graph is non-planar [BM04]. This subgraph may take as its edges paths within the graph being studied. We take a section of the encoded graph and, by performing a series of allowed moves of condensing paths to edges, show that the section is indeed homeomorphic to $K_{3,3}$.

Figure 3.8: The encoded DW1 graph. Symbols are the same as Fig. 3.1(c) in the main text for the encoded DW2 graph. In addition, grey circles represent logical qubits with defective data qubits that were not used in our experiments; the corresponding grey lines represent couplings that were not used.
[Figure 3.9 panels: (a) DW1, (b) DW2. Horizontal axis: $t/t_f$; vertical axis: annealing schedules (GHz); curves: $A(t)$, $B(t)$.]

Figure 3.9: The DW1 (a) and DW2 (b) annealing schedules. The functions $A$ and $B$ are the ones appearing in Eqs. (1.4) and (4.5). The solid horizontal black line is the operating temperature energy.

We begin with an 18-qubit section of the regular encoded graph, shown in Fig. 3.10(a). This encoded graph is then condensed along its paths by repeatedly removing two edges and a vertex and replacing them with a single edge representing the path. A sequence of these moves is shown in Fig. 3.10(b)-(d). The studied subgraph has now been condensed into the form of the desired $K_{3,3}$ graph, as is made clear by labeling and rearranging the vertices as in Fig. 3.11. The encoded graph is thereby proved non-planar.

Figure 3.10: (a) A portion of the encoded graph over logical qubits. (b)-(d) Contraction of paths in the original graph into edges. Paths consisting of two edges and a vertex are selected (represented in the figure as dotted lines), then contracted into a single edge connecting the ends of the chosen path (shown as a new solid line).

Figure 3.11: The condensed graph (a) is isomorphic to the standard representation of the $K_{3,3}$ bipartite graph (b).

3.10 A classical independent errors model

It is tempting to explain the decay of success probability seen in Fig. 3.2 in the main text in terms of a classical model of uncorrelated errors. To this end, consider an antiferromagnetic chain of $N$ spins, with Hamiltonian $H_{\mathrm{Ising}} = \alpha\sum_{i=1}^{N-1}\sigma^z_i\sigma^z_{i+1}$. Its excitations are kinks, resulting in domain walls. A kink is a single nearest-neighbor pair that violates the antiferromagnetic order, i.e., two adjacent spins that are aligned. There are $N-1$ possible kink locations since there are $N-1$ nearest-neighbor spin pairs. Each kink costs an energy of $2\alpha$.
The ground state energy is $-\alpha(N-1)$ and so the energy of a state with $k$ kinks is $E_N(k) = \alpha(-N + 1 + 2k)$. The degeneracy $d_N(k)$ of a state with $k$ kinks is the number of ways of placing $k$ kinks in $N-1$ slots, i.e., $d_N(k) = \binom{N-1}{k}$. Assuming a classical system in thermal equilibrium at temperature $T$, the probability of a state with $k$ kinks is

$$P_N(k) = \frac{1}{Z_N}\, d_N(k)\, e^{-E_N(k)/k_B T}, \tag{3.4}$$

where $Z_N = \sum_{k=0}^{N-1} d_N(k)\, e^{-E_N(k)/k_B T}$ is the normalization factor. We have

$$Z_N = \sum_{k=0}^{N-1}\binom{N-1}{k} e^{-\alpha(-N+1+2k)/k_B T} \tag{3.5a}$$
$$\phantom{Z_N} = e^{\alpha(N-1)/k_B T}\sum_{k=0}^{N-1}\binom{N-1}{k}\left(e^{-2\alpha/k_B T}\right)^k \tag{3.5b}$$
$$\phantom{Z_N} = \left(2\cosh(\alpha/k_B T)\right)^{N-1}. \tag{3.5c}$$

Thus the probability of no kinks, i.e., no errors in the chain, is

$$P_N(0) = \frac{e^{\alpha(N-1)/k_B T}}{\left(2\cosh(\alpha/k_B T)\right)^{N-1}} = \left(\frac{1}{1 + e^{-2\alpha/k_B T}}\right)^{N-1} \tag{3.6a}$$
$$\phantom{P_N(0)} \equiv \left(1 - p(\alpha)\right)^{N-1}, \tag{3.6b}$$

whence we identify the bit-flip probability from the main text as

$$p(\alpha) = \frac{1}{1 + e^{2\alpha/k_B T}}. \tag{3.7}$$

This purely classical thermal error model predicts an exponential fidelity decay with $N$, which does not agree well with our data. Indeed, the success probability depends quadratically on $N$ for small chain lengths, whence the Lorentzian fits shown in Fig. 3.2 in the main text. This suggests that a different error mechanism is at work, which we can capture using an adiabatic master equation presented in Sec. 3.12.

3.11 Comparison between the DW1 and DW2

In the main text we provided DW2 results at various values of $\alpha$. Figure 3.12 collects these results for the U and QAC cases, at three different values of $\alpha$, to simplify the comparison. It clearly demonstrates the advantage of QAC over the U case at every value of the problem scale $\alpha$, and shows how increasing $\alpha$ increases the success probability.

[Figure 3.12 plot. Horizontal axis: antiferromagnetic chain length $N$ or $\bar{N}$; vertical axis: probability of correct answer; curves: Unprotected (U), Complete (QAC).]

Figure 3.12: Recap of the results shown in Fig. 3.2 in the main text, for the U and QAC cases, for $\alpha = 1$ (top), $\alpha = 0.6$ (middle), and $\alpha = 0.3$ (bottom).
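As a quick numerical check of the independent-errors model of Sec. 3.10, the sketch below evaluates the bit-flip probability of Eq. (3.7) and the zero-kink probability of Eq. (3.6), and compares the closed form with a direct Boltzmann sum over kink numbers. The function names and any sample values of $\alpha$ and $k_B T$ are illustrative assumptions, not values fitted to the data; both energies are simply taken in the same units.

```python
import math


def bit_flip_probability(alpha, kT):
    """Eq. (3.7): p(alpha) = 1 / (1 + exp(2*alpha/kT))."""
    return 1.0 / (1.0 + math.exp(2.0 * alpha / kT))


def no_kink_probability(n_spins, alpha, kT):
    """Eq. (3.6): closed-form probability that an N-spin chain has no kinks."""
    return (1.0 - bit_flip_probability(alpha, kT)) ** (n_spins - 1)


def no_kink_probability_direct(n_spins, alpha, kT):
    """Same quantity from the Boltzmann weights d_N(k) * exp(-E_N(k)/kT)."""
    weights = [
        math.comb(n_spins - 1, k)
        * math.exp(-alpha * (-n_spins + 1 + 2 * k) / kT)
        for k in range(n_spins)  # k = 0, ..., N-1 kinks
    ]
    return weights[0] / sum(weights)
```

The two functions should agree to floating-point precision for any chain length, and the result decays exponentially with $N$, which is the behavior that the experimental data do not follow.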
Figure 3.13 complements the DW2 data with results from the DW1, and an additional DW2 data set at $\alpha = 0.9$, alongside the corresponding DW1 data. The DW1 results are seen to agree overall with those from the DW2, though the absence of penalty qubits in up to three logical qubits in the DW1 (as explained in the caption of Fig. 3.16), and larger control errors, resulted in lower success probabilities overall. Moreover, the DW1 accommodated at most 16 logical qubits, so some of the larger scale trends observed in the DW2 case, such as the eventual decline of the QAC success probability as seen in Fig. 3.2, are not visible in the DW1 data.

[Figure 3.13 panels: (a) DW1, $\alpha = 0.3$; (b) DW1, $\alpha = 0.6$; (c) DW1, $\alpha = 0.9$; (d) DW2, $\alpha = 0.9$ (with inset). Horizontal axis: antiferromagnetic chain length $N$ or $\bar{N}$; vertical axis: probability of correct answer; curves: Unprotected (U), Classical (C), Encoded with penalty (EP), Complete (QAC), and two fits.]

Figure 3.13: Panels (a)-(c) are DW1 results, panel (d) is DW2 results. Shown are success probabilities for the various cases discussed in the main text as a function of chain length $N, \bar{N} \in \{2, \ldots, 16\}$, complementing the results shown in Fig. 3.2. Blue and purple lines are best fits to an independent errors model as described in Sec. 3.10. Panel (d) shows an additional DW2 data set, at $\alpha = 0.9$. The inset shows the same data on the same scale as in panel (c), to emphasize the improved performance of the DW2 over the DW1. The penalty scale $\beta$ was chosen as the experimental optimum for each $\alpha$ and $\bar{N}$ in all cases.

In Fig. 3.14 we show the correlation between Hamming distance, energy, and decodability, for the DW1 and the DW2 for chains of length 16. This complements Fig. 3.6 in the main text.
In the DW2 plot, we see the points representing domain walls spaced at even four-qubit intervals because there are no missing penalty qubits. As can be seen from the color variation, the distribution for the DW1 is less bimodal than that of the DW2. In both cases states with physical Hamming distance less than 5 from the ground state show decodability while states further away do not.

[Figure 3.14 panels: (a) DW1, (b) DW2. Horizontal axis: physical Hamming distance from ground state; vertical axis: energy of observed state relative to ground state; color scale from 0 to 1.]

Figure 3.14: The fraction of decodable states (color scale), for $\bar{N} = 16$, $\alpha = 0.3$ and $\beta = 0.4$ (optimal for DW1), vs the energy (in units of $J_{ij} = 1$) of each observed state relative to the ground state and the Hamming distance from the nearest degenerate ground state, measured in physical qubits. Here we compare the DW1 (a) and the DW2 (b); $\beta_{\mathrm{opt}} = 0.2$ for DW2.

3.12 Adiabatic master equation

In order to derive the adiabatic Markovian master equation used in performing the simulations, we consider a closed system with Hamiltonian

$$H(t) = H_S(t) + H_B + g\sum_{\alpha} A_\alpha \otimes B_\alpha, \tag{3.8}$$

where $H_S(t)$ is the time-dependent system Hamiltonian (which in our case takes the form given in Eq. (1.4)), $H_B$ is the bath Hamiltonian, $\{A_\alpha\}$ are Hermitian system operators, $\{B_\alpha\}$ are Hermitian bath operators, and $g$ is the system-bath interaction strength (with dimensions of energy). Under suitable approximations, a master equation can be derived from first principles [ABLZ12] describing the Markovian evolution of the system. This equation takes the Lindblad form [Lin76]:

$$\frac{d}{dt}\rho(t) = -\frac{i}{\hbar}\left[H_S(t) + H_{LS}(t), \rho(t)\right] + \frac{g^2}{\hbar^2}\sum_{\alpha,\beta}\sum_{\omega}\gamma_{\alpha\beta}(\omega)\left[L_{\beta,\omega}(t)\rho(t)L^\dagger_{\alpha,\omega}(t) - \frac{1}{2}\left\{L^\dagger_{\alpha,\omega}(t)L_{\beta,\omega}(t), \rho(t)\right\}\right], \tag{3.9}$$

where $H_{LS}$ is the Lamb shift term induced by the interaction with the thermal bath, $\omega$ is a frequency, $\gamma_{\alpha\beta}(\omega)$
is a positive matrix for all values of $\omega$, and $L_{\alpha,\omega}(t)$ are time-dependent Lindblad operators. They are given by

$$L_{\alpha,\omega}(t) = \sum_{a,b}\delta_{\hbar\omega,\Delta_{ba}(t)}\,\langle\varepsilon_a(t)|A_\alpha|\varepsilon_b(t)\rangle\,|\varepsilon_a(t)\rangle\langle\varepsilon_b(t)|, \tag{3.10a}$$
$$\gamma_{\alpha\beta}(\omega) = \int_{-\infty}^{\infty} dt\, e^{i\omega t}\,\langle e^{iH_B t} B_\alpha e^{-iH_B t} B_\beta\rangle, \tag{3.10b}$$
$$H_{LS}(t) = \frac{g^2}{\hbar}\sum_{\alpha,\beta}\sum_{\omega} S_{\alpha\beta}(\omega)\, L^\dagger_{\alpha,\omega}(t) L_{\beta,\omega}(t), \tag{3.10c}$$
$$S_{\alpha\beta}(\omega) = \int_{-\infty}^{\infty} d\omega'\, \gamma_{\alpha\beta}(\omega')\, \mathcal{P}\!\left(\frac{1}{\omega - \omega'}\right), \tag{3.10d}$$

where $\mathcal{P}$ is the Cauchy principal value, $\Delta_{ba}(t) \equiv \varepsilon_b(t) - \varepsilon_a(t)$, and the states $|\varepsilon_a(t)\rangle$ are the instantaneous energy eigenstates of $H_S(t)$ with eigenvalues $\varepsilon_a(t)$ satisfying

$$H_S(t)|\varepsilon_a(t)\rangle = \varepsilon_a(t)|\varepsilon_a(t)\rangle. \tag{3.11}$$

For our simulations we considered independent dephasing harmonic oscillator baths (i.e., each qubit experiences its own thermal bath) such that:

$$\sum_{\alpha} A_\alpha \otimes B_\alpha = \sum_{\alpha}\sigma^z_\alpha \otimes \sum_{k}\left(b_{k,\alpha} + b^\dagger_{k,\alpha}\right), \tag{3.12}$$

where $b_{k,\alpha}$ and $b^\dagger_{k,\alpha}$ are, respectively, lowering and raising operators for the $k$th oscillator of the bath associated with qubit $\alpha$, satisfying $[b_{k,\alpha}, b^\dagger_{k',\alpha}] = \delta_{k,k'}$. Furthermore, we assume an Ohmic spectrum for each bath such that

$$g^2\gamma_{\alpha\beta}(\omega) = \delta_{\alpha,\beta}\,\frac{2\pi g^2\eta\,\omega}{1 - e^{-\beta\hbar\omega}}\,e^{-\omega/\omega_c}, \tag{3.13}$$

where $\beta$ is the inverse temperature, $\eta$ (with units of time squared) characterizes the Ohmic bath, and $\omega_c$ is a UV cut-off. In our simulations, we fix $\omega_c = 8\pi$ GHz in order to satisfy the approximations made in deriving the master equation (see Ref. [?] for more details), and we fix $\beta^{-1}/\hbar \approx 2.2$ GHz to match the operating temperature of 17 mK of the D-Wave device. The only remaining free parameter is the effective system-bath coupling

$$\kappa \equiv g^2\eta/\hbar^2, \tag{3.14}$$

which we vary to find the best agreement with our experimental data.

As an example of the ability of our adiabatic master equation to capture the experimental data, we show in Fig. 3.15 the results of calculations for unprotected antiferromagnetic (AF) chains, along with the corresponding DW2 data. As can be seen, the master equation correctly captures the initial quadratic decay as a function of chain length, which is missed by the classical kinks model presented in Sec. 3.10.
Additional confirmation of the validity of the adiabatic master equation approach is given in Fig. 3.16(a), and earlier work [BAS+13, ALMZ13].

[Figure 3.15 plot. Horizontal axis: $N$; vertical axis: ground state probability.]

Figure 3.15: Experimental (circles) and simulation (crosses) results for an unencoded AF chain with $\alpha = 0.3$ and annealing time $t_f = 20\,\mu$s. The effective system-bath coupling is $\kappa = 6.37 \times 10^{-6}$.

3.13 The role of the penalty qubit

In this section we demonstrate that the penalty qubits are indeed responsible for error suppression. To do so we tested both experimentally and numerically the effect of removing one of the penalty qubits. Figure 3.16(a) clearly shows, using a simulation of two AF coupled logical qubits, that the physical problem qubits that are coupled to the penalty qubit have a smaller probability of flipping than those that are not. Figure 3.16(b) shows experimental data for an $\bar{N} = 16$ chain with three missing penalty qubits; Hamming distance 1 errors at the locations of the missing penalty qubits are greatly enhanced, reversing the dominance of $d = 3$ errors seen in Fig. 3.5b in the main text.

[Figure 3.16 panels: (a) horizontal axis $\beta$, vertical axis single flip probability, curves $P_{\mathrm{NPS}}$ and $P_{\mathrm{PS}}$; inset: horizontal axis Hamming distance, vertical axis probability, with theory and experiment for PQ, NPQ, and Pen. (b) horizontal axis Hamming distance from logical ground state, vertical axis percentage of observations.]

Figure 3.16: Effect of the penalty qubits. (a) Shown are adiabatic master equation simulation results (with parameters $\kappa = 3.18 \times 10^{-4}$ and $t_f = 20\,\mu$s) as a function of $\beta$, with $\alpha = 0.3$, for two antiferromagnetically coupled encoded qubits, one with a penalty qubit and one without (see diagram in upper left corner). Blue diamonds are the probability of a single flip in the problem qubits coupled to the penalty qubit (penalty side, PS), red circles for the uncoupled case (no-penalty side, NPS). Inset: Simulated and experimental probability of Hamming distance $d = 1, 2$ for the encoded qubit with (PQ) or without (NPQ) the penalty qubit; “Pen” denotes the penalty qubit (the probability at $d = 3$ is negligible and is not shown). Good agreement is seen between the master equation and the experimental results. The NPQ case has substantially higher error probability, demonstrating the positive effect of the penalty qubit. (b) Similar to Fig. 3.5a, for an $\bar{N} = 16$ qubit chain (20,000 samples collected on the DW1). The logical qubits numbered 1, 7, and 16 did not have a penalty qubit, which is manifested by large peaks at these positions, at both $d = 1$ and $d = 2$. The inset shows the percentage of errors on the penalty qubits (with numbers 1, 7, 16 missing). The $d = 3$ data illustrates that a flipped penalty qubit increases the probability of all problem qubits flipping even more than an absent penalty qubit.

Let us now consider in some more detail the interplay between the optimal $\beta$ effect and the consequences of an absent penalty qubit. We saw the effects of changing the penalty strength $\beta$ in Figs. 3.3a and 3.3b in the main text, in terms of a model of two coupled logical qubits. For a given problem, there exists an optimal $\beta$ at which the success probability is maximized. In Fig. 3.16(a) we saw that the absence of a penalty qubit increases the bit flip probability of the associated problem qubits. Fig. 3.17 shows the results of simulations using the adiabatic master equation of Sec. 3.12, for the probability of finding the ground state of the physical system versus finding the correct ground state after decoding.
[Figure 3.17 panels: (a) one penalty qubit, (b) two penalty qubits. Horizontal axis: $\beta$; vertical axis: probability; curves: $P_{GS}$, $P_S$.]

Figure 3.17: Adiabatic master equation simulation results for the probability of finding the undecoded ground state $P_{GS}$ and the decoded ground state $P_S$ for two logical qubits. We set $\alpha = 0.3$, $\kappa = 3.18 \times 10^{-4}$, and $t_f = 20\,\mu$s.

For small $\beta$, the two penalty versus one penalty qubit instances exhibit similar behavior, with a ground state probability $P_{GS}$ that grows with increasing $\beta$, whereas the decoded success probability remains more or less constant. This initial growth in $P_{GS}$ can be easily understood in terms of the behavior of the energy gap to the second excited state (recall that the final ground state is degenerate so any population in the first excited state joins the ground state by the end of the evolution). As we showed in the main text, this energy gap grows and shifts to earlier times in the evolution. This reduces the thermal excitation rate out of the ground state. The low lying excited states are decodable to the correct ground state, and therefore, any population excited to them is still recovered by decoding, which explains the constant decoded success probability.

As $\beta$ is further increased, we observe that the two-penalty problem exhibits a peak in $P_{GS}$ as a function of $\beta$, although no such peak is observed in $P_S$. This means that there is significant probability loss to states that are still decoded to the correct ground state. However, we notice that at $\beta = 0.6$, there is a large drop in $P_S$ without an accompanying large drop in $P_{GS}$. This means that there is a large probability lost to a state that is decoded incorrectly. This indeed happens because for $\beta \gtrsim 0.6$, the first excited state is actually the fully ferromagnetic state, which decodes to the incorrect ground state.
Notice however that this large drop does not happen for the single penalty case, because this important identity shift in the excited state spectrum does not happen over the range of $\beta$ explored.

3.14 The role of the problem scale $\alpha$ and the penalty scale $\beta$

In order to understand the role of the problem scale $\alpha$ and the penalty strength $\beta$ on the quantum annealing evolution, we analyze the effect of each separately in terms of analytically tractable models.

3.14.1 The role of $\alpha$

First, let us consider the case of $\beta = 0$, and consider a one dimensional antiferromagnetic transverse Ising model:

$$H = h\sum_{i=1}^{N}\sigma^x_i + \sum_{i=1}^{N}\sigma^z_i\sigma^z_{i+1}. \tag{3.15}$$

It is well established that in the large $N$ limit, the system undergoes a quantum phase transition at $h_c = J$ [LSM61, Pfe70], where $J$ is the Ising coupling strength (unity as Eq. (3.15) is written here). Let us denote the minimum gap for this case as $\Delta_{\min} = \Delta_0$. Now let us consider the time-dependent Hamiltonian:

$$H(t) = A(t)\sum_i\sigma^x_i + \alpha B(t)\sum_i\sigma^z_i\sigma^z_{i+1}, \tag{3.16}$$

which is the case of interest in the main text. The minimum gap would then occur at $A(t_{\min}) = \alpha B(t_{\min})$. For monotonically decreasing $A(t)$ and monotonically increasing $B(t)$, increasing $\alpha$ means $t_{\min}$ decreases. For example, consider linear interpolating functions:

$$A_L(s) = 2A_0(1-s), \qquad B_L(s) = 2A_0 s, \tag{3.17}$$

where $s = t/t_f$. The minimum gap occurs at $s_{\min} = 1/(1+\alpha)$, such that the minimum gap is given by $\Delta_{\min} = 2A_0\alpha\Delta_0/(1+\alpha)$. Therefore, increasing $\alpha$ decreases $s_{\min}$, i.e., shifts the minimum gap to earlier in the evolution, and increases $\Delta_{\min}$. Figure 3.18 shows the results of numerical simulations for an antiferromagnetically coupled chain, verifying both the increase in the gap and its shift to the left.

[Figure 3.18 plot. Horizontal axis: $t/t_f$; vertical axis: gap $\Delta$; curves: $\alpha = 0.1$, $\alpha = 0.5$, $\alpha = 1.0$.]

Figure 3.18: The gap to the lowest relevant (2nd and higher) excited state for an antiferromagnetically coupled 8-qubit chain for $\beta = 0$ and different values of $\alpha$. The gap increases and moves to the left as $\alpha$ is increased (see Fig.
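3.3a in the main text for a similar plot with fixed $\alpha$ and varying $\beta$).

3.14.2 The role of $\beta$

In order to analytically study the effect of $\beta$ we resort to a much simpler model and perturbation theory. Let us consider the annealing Hamiltonian for a single qubit with linear interpolating functions:

$$H(s) = A_0(1-s)\sigma^x + A_0 s\,\frac{1}{2}\omega\sigma^z. \tag{3.18}$$

This has time-dependent eigenvalues given by

$$A_0\varepsilon_\pm(s) = \pm\frac{A_0}{2}\left(4(1-s)^2 + s^2\omega^2\right)^{1/2} \equiv \pm\frac{A_0}{2}\Omega, \tag{3.19}$$

with eigenvectors (in the computational basis):

$$|\varepsilon_-(s)\rangle = \frac{1}{c_-(s)}\left(\frac{s\omega}{2(1-s)} - \frac{\Omega}{2(1-s)},\; 1\right), \tag{3.20a}$$
$$|\varepsilon_+(s)\rangle = \frac{1}{c_+(s)}\left(\frac{s\omega}{2} + \frac{\Omega}{2},\; 1-s\right), \tag{3.20b}$$

where $c_\pm(s)$ are positive normalization coefficients. Let us now consider three decoupled qubits, representing the problem qubits. The ground state of this three qubit system is simply a tensor product of the individual ground states:

$$|0\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_-\rangle_3, \tag{3.21}$$

with energy $\varepsilon_0(s) = 3\varepsilon_-(s)$. The first excited state is triply degenerate:

$$|1\rangle = |\varepsilon_+\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_-\rangle_3, \tag{3.22a}$$
$$|2\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_+\rangle_2 \otimes |\varepsilon_-\rangle_3, \tag{3.22b}$$
$$|3\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_+\rangle_3, \tag{3.22c}$$

with energy $\varepsilon_1(s) = 2\varepsilon_-(s) + \varepsilon_+(s)$.

A single logical qubit

To model a logical qubit which includes three problem qubits and a penalty qubit we now introduce a fourth qubit, with a Hamiltonian similar to Eq. (3.18) but with energy $\omega_0 \ll \omega$, i.e.,

$$H(s) = A_0(1-s)\sum_{i=1}^{4}\sigma^x_i + A_0 s\,\frac{1}{2}\left(\sum_{i=1}^{3}\omega\sigma^z_i + \omega_0\sigma^z_4\right). \tag{3.23}$$

This is chosen to ensure that the ground state and the degenerate first excited states all have the fourth qubit in its ground state $|\lambda_-\rangle_4$.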
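Thus the ground state is

$$|0\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_-\rangle_3 \otimes |\lambda_-\rangle_4, \tag{3.24}$$

and the excited states corresponding to the single bit-flip errors we are interested in are:

$$|1\rangle = |\varepsilon_+\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_-\rangle_3 \otimes |\lambda_-\rangle_4, \tag{3.25a}$$
$$|2\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_+\rangle_2 \otimes |\varepsilon_-\rangle_3 \otimes |\lambda_-\rangle_4, \tag{3.25b}$$
$$|3\rangle = |\varepsilon_-\rangle_1 \otimes |\varepsilon_-\rangle_2 \otimes |\varepsilon_+\rangle_3 \otimes |\lambda_-\rangle_4. \tag{3.25c}$$

Next we introduce the ferromagnetic penalty term of the main text as a perturbation:

$$H_P = -A_0 s\beta\sum_{i=1}^{3}\sigma^z_i\sigma^z_4. \tag{3.26}$$

We can calculate the first order perturbation to the energy states, and from there obtain the gap. This amounts to simply calculating the matrix element $\langle a|\sigma^z_i\sigma^z_4|a\rangle$, where $a = 0, 1, 2, 3$. The $\sigma^z_4$ always gives a multiplicative contribution of $\langle\lambda_-|\sigma^z_4|\lambda_-\rangle$. The remaining matrix elements are given by:

$$\langle 0|\sigma^z_i|0\rangle = \frac{c_-^2 - 2}{c_-^2}, \quad i = 1, 2, 3, \tag{3.27a}$$
$$\langle 1|\sigma^z_2|1\rangle = \langle 3|\sigma^z_2|3\rangle = \langle 1|\sigma^z_3|1\rangle = \langle 2|\sigma^z_3|2\rangle = \frac{c_-^2 - 2}{c_-^2}, \tag{3.27b}$$
$$\langle 1|\sigma^z_1|1\rangle = \langle 2|\sigma^z_2|2\rangle = \langle 3|\sigma^z_3|3\rangle = \frac{c_+^2 - 2(1-s)^2}{c_+^2}. \tag{3.27c}$$

The remaining matrix elements vanish. The degeneracy of the excited states is not broken by this perturbation, and the perturbed energy spectrum is given by:

$$E_0 = 3A_0\varepsilon_-(s) - A_0 s\beta\left(3\,\frac{2 - c_-^2}{c_-^2}\right)\frac{2 - \tilde{c}_-^2}{\tilde{c}_-^2}, \tag{3.28a}$$
$$E_1 = 2A_0\varepsilon_-(s) + A_0\varepsilon_+(s) + A_0 s\beta\left(-2\,\frac{2 - c_-^2}{c_-^2} + \frac{c_+^2 - 2(1-s)^2}{c_+^2}\right)\frac{2 - \tilde{c}_-^2}{\tilde{c}_-^2}, \tag{3.28b}$$

where $\tilde{c}_-$ is the normalization for the ground state of the unperturbed fourth qubit. Therefore the gap is

$$\Delta = E_1 - E_0 = A_0\left[\varepsilon_+(s) - \varepsilon_-(s) + s\beta\left(\frac{c_+^2 - 2(1-s)^2}{c_+^2} + \frac{2 - c_-^2}{c_-^2}\right)\frac{2 - \tilde{c}_-^2}{\tilde{c}_-^2}\right]. \tag{3.29}$$

The contribution to the gap from the perturbation is positive throughout the evolution for finite values of $\omega_0$. Furthermore, as shown in Fig. 3.19, the minimum gap shifts to earlier in the evolution.

[Figure 3.19 plot. Horizontal axis: $s = t/t_f$; vertical axis: gap $\Delta/A_0$.]

Figure 3.19: The gap to the first relevant excited state for the unperturbed (blue curve) Hamiltonian and the perturbed (red curve) Hamiltonian with $\omega = 1$ and $\beta = \omega_0 = 0.1$.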
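The single-qubit spectrum that underlies this subsection can be checked numerically. The sketch below is a minimal check under stated assumptions: energies are in units of $A_0$, the sign convention is $H(s)/A_0 = (1-s)\sigma^x + (s\omega/2)\sigma^z$, and `big_omega` stands for the square root $\left(4(1-s)^2 + s^2\omega^2\right)^{1/2}$. The function names are hypothetical and are not part of the original simulations.

```python
import math


def single_qubit_spectrum(s, omega):
    """Eigenvalues and unnormalized eigenvectors of
    H/A_0 = (1 - s) sigma^x + (s*omega/2) sigma^z, for 0 <= s < 1."""
    big_omega = math.sqrt(4.0 * (1.0 - s) ** 2 + (s * omega) ** 2)
    v_minus = ((s * omega - big_omega) / (2.0 * (1.0 - s)), 1.0)
    v_plus = ((s * omega + big_omega) / 2.0, 1.0 - s)
    return -big_omega / 2.0, big_omega / 2.0, v_minus, v_plus


def apply_h(s, omega, v):
    """Apply the 2x2 matrix [[b, a], [a, -b]] with a = 1-s, b = s*omega/2."""
    a, b = 1.0 - s, s * omega / 2.0
    return (b * v[0] + a * v[1], a * v[0] - b * v[1])


def sigma_z_expectation(v):
    """<v|sigma^z|v> / <v|v> for a real two-component vector."""
    norm_sq = v[0] ** 2 + v[1] ** 2
    return (v[0] ** 2 - v[1] ** 2) / norm_sq
```

For $0 < s < 1$ and $\omega > 0$ the lower eigenstate has a negative $\sigma^z$ expectation value and the upper one a positive value, which is what makes the first-order contribution of a ferromagnetic penalty to the gap positive.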
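Three qubit pairs coupled to a penalty qubit

As a proxy for the case of coupled logical qubits, which is difficult to analyze analytically, we consider three pairs of AF-coupled qubits, with one triple coupled to a penalty qubit. We start from a simple two-site chain, where for simplicity we set the AF-coupling to unity:

$$H(t) = A_0(1-s)\left(\sigma^x_1 + \sigma^x_2\right) + A_0 s\,\sigma^z_1\sigma^z_2. \tag{3.30}$$

[Figure 3.20 plot. Horizontal axis: $s = t/t_f$; vertical axis: gap $\Delta/A_0$.]

Figure 3.20: The gap to the first relevant excited state for the unperturbed (blue curve) Hamiltonian and the perturbed (red curve) Hamiltonian with $\beta = 0.1$ and $\omega_0 = 0.001$.

The energy eigenvalues are given by:

$$\frac{\varepsilon_0}{A_0} = -\Omega, \quad \frac{\varepsilon_1}{A_0} = -s, \quad \frac{\varepsilon_2}{A_0} = s, \quad \frac{\varepsilon_3}{A_0} = \Omega, \tag{3.31}$$

where $\Omega = \sqrt{4(1-s)^2 + s^2}$, and the eigenvectors (in the computational basis $\uparrow\uparrow, \uparrow\downarrow, \downarrow\uparrow, \downarrow\downarrow$) are:

$$|\varepsilon_0(s)\rangle = \frac{1}{c_-(s)}\left(1-s,\; -\frac{s}{2} - \frac{\Omega}{2},\; -\frac{s}{2} - \frac{\Omega}{2},\; 1-s\right), \tag{3.32a}$$
$$|\varepsilon_1(s)\rangle = \frac{1}{\sqrt{2}}\left(0, -1, 1, 0\right), \tag{3.32b}$$
$$|\varepsilon_2(s)\rangle = \frac{1}{\sqrt{2}}\left(1, 0, 0, -1\right), \tag{3.32c}$$
$$|\varepsilon_3(s)\rangle = \frac{1}{c_+(s)}\left(1-s,\; -\frac{s}{2} + \frac{\Omega}{2},\; -\frac{s}{2} + \frac{\Omega}{2},\; 1-s\right), \tag{3.32d}$$

where $c_\pm(s)$ are normalization functions. Note that the Hamiltonian in Eq. (3.30) is invariant under $\sigma^z_i \to -\sigma^z_i$, so that the expectation value of $\sigma^z_i$ under any of the energy eigenstates is zero,

$$\langle\varepsilon_i|\sigma^z_j|\varepsilon_i\rangle = 0, \quad i = 0, 1, 2, 3, \quad j = 1, 2. \tag{3.33}$$

Furthermore, the non-zero matrix elements of $\sigma^z_1$ in the instantaneous energy eigenbasis that we will need are:

$$\langle\varepsilon_0|\sigma^z_1|\varepsilon_2\rangle = \frac{\sqrt{2}(1-s)}{c_-}, \quad \langle\varepsilon_3|\sigma^z_1|\varepsilon_2\rangle = \frac{\sqrt{2}(1-s)}{c_+}, \quad \langle\varepsilon_0|\sigma^z_1|\varepsilon_1\rangle = \frac{s + \Omega}{\sqrt{2}\,c_-}. \tag{3.34}$$

We now proceed as in the single qubit case: we make three copies of the AF chain and introduce a fourth qubit that does not interact with the three AF chains. The ground state of this seven qubit system can be written as follows:

$$|0\rangle = |\varepsilon_0\rangle \otimes |\varepsilon_0\rangle \otimes |\varepsilon_0\rangle \otimes |\lambda_-\rangle, \tag{3.35}$$

with energy

$$E_0 = 3\varepsilon_0 + \lambda_-. \tag{3.36}$$

The relevant excited state is the lowest excited state that does not decode to the correct ground state.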
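This is:

$$|2\rangle = |\varepsilon_2\rangle \otimes |\varepsilon_0\rangle \otimes |\varepsilon_0\rangle \otimes |\lambda_-\rangle, \tag{3.37}$$

with a three-fold degeneracy because the state $|\varepsilon_2\rangle$ can be placed on any of the three chains, with energy

$$E_2 = 2\varepsilon_0 + \varepsilon_2 + \lambda_-. \tag{3.38}$$

We now again introduce the penalty as a perturbation:

$$H_P = -A_0 s\beta\sum_{i=1}^{3}\sigma^z_{1_i}\sigma^z_4, \tag{3.39}$$

where $\sigma^z_{1_i}$ is the Pauli operator acting on the $i$th qubit of the first logical qubit of the AF chain. From Eq. (3.33), we obtain that the energy change vanishes to first order in perturbation theory, so we have to go to second order. The change in the ground state energy is given by:

$$\frac{\delta E_0}{\beta^2 A_0^2 s^2} = 3\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_2\rangle|^2}{\varepsilon_0 - \varepsilon_2}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + 3\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_2\rangle|^2}{\varepsilon_0 - \varepsilon_2 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2$$
$$\qquad + 3\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_1\rangle|^2}{\varepsilon_0 - \varepsilon_1}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + 3\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_1\rangle|^2}{\varepsilon_0 - \varepsilon_1 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2, \tag{3.40}$$

while the change in the relevant excited state energy is:

$$\frac{\delta E_2}{\beta^2 A_0^2 s^2} = \frac{|\langle\varepsilon_2|\sigma^z_1|\varepsilon_0\rangle|^2}{\varepsilon_2 - \varepsilon_0}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + \frac{|\langle\varepsilon_2|\sigma^z_1|\varepsilon_0\rangle|^2}{\varepsilon_2 - \varepsilon_0 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2$$
$$\qquad + 2\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_2\rangle|^2}{\varepsilon_0 - \varepsilon_2}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + 2\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_2\rangle|^2}{\varepsilon_0 - \varepsilon_2 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2$$
$$\qquad + \frac{|\langle\varepsilon_2|\sigma^z_1|\varepsilon_3\rangle|^2}{\varepsilon_2 - \varepsilon_3}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + \frac{|\langle\varepsilon_2|\sigma^z_1|\varepsilon_3\rangle|^2}{\varepsilon_2 - \varepsilon_3 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2$$
$$\qquad + 2\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_1\rangle|^2}{\varepsilon_0 - \varepsilon_1}\,|\langle\lambda_-|\sigma^z_4|\lambda_-\rangle|^2 + 2\,\frac{|\langle\varepsilon_0|\sigma^z_1|\varepsilon_1\rangle|^2}{\varepsilon_0 - \varepsilon_1 + \lambda_- - \lambda_+}\,|\langle\lambda_-|\sigma^z_4|\lambda_+\rangle|^2. \tag{3.41}$$

The perturbed gap is given by:

$$\Delta = E_2 - E_0 + \delta E_2 - \delta E_0, \tag{3.42}$$

which is larger than the unperturbed gap, and moves slightly to the right, as shown in Fig. 3.20. Thus this incomplete model of coupled logical qubits captures the increase in the gap due to the penalty term, but not its shift to earlier time. However, we do observe both the increase in the gap and its shift to the left when we numerically compute the gap for AF chains, as can be seen in Fig. 3.3a in the main text.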
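The two-site spectrum used in the perturbative argument above can be verified directly. The following sketch is a minimal check under stated assumptions: energies are in units of $A_0$, the unperturbed Hamiltonian is taken as $H/A_0 = (1-s)(\sigma^x_1 + \sigma^x_2) + s\,\sigma^z_1\sigma^z_2$ in the basis $\uparrow\uparrow, \uparrow\downarrow, \downarrow\uparrow, \downarrow\downarrow$, and `big_omega` denotes $\sqrt{4(1-s)^2 + s^2}$. The helper names are hypothetical, and the eigenvectors are left unnormalized.

```python
import math


def two_site_hamiltonian(s):
    """4x4 matrix of H/A_0 in the basis (uu, ud, du, dd)."""
    t = 1.0 - s  # transverse-field amplitude connecting single spin flips
    return [
        [s, t, t, 0.0],
        [t, -s, 0.0, t],
        [t, 0.0, -s, t],
        [0.0, t, t, s],
    ]


def two_site_eigenpairs(s):
    """Candidate (eigenvalue, unnormalized eigenvector) pairs."""
    big_omega = math.sqrt(4.0 * (1.0 - s) ** 2 + s ** 2)
    t = 1.0 - s
    low = -(s + big_omega) / 2.0
    high = (big_omega - s) / 2.0
    return [
        (-big_omega, [t, low, low, t]),
        (-s, [0.0, -1.0, 1.0, 0.0]),
        (s, [1.0, 0.0, 0.0, -1.0]),
        (big_omega, [t, high, high, t]),
    ]


def residual(matrix, value, vector):
    """Largest component of |H v - value * v|; zero for an exact eigenpair."""
    return max(
        abs(sum(m * v for m, v in zip(row, vector)) - value * x)
        for row, x in zip(matrix, vector)
    )
```

Calling `residual(two_site_hamiltonian(s), value, vector)` for each pair returned by `two_site_eigenpairs(s)` should give values at the level of floating-point rounding for any $0 \le s \le 1$.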
Chapter 4

Randomized Benchmarking with QAC

4.1 Context

In the previous chapter, we investigated the performance of an error correction scheme for quantum annealing as applied to the problem of antiferromagnetic chains from 2 to 86 qubits in length. We found that our scheme, quantum annealing correction (QAC), outperformed all the alternative methods we studied to enhance the performance of the quantum annealer. This was accomplished through a combination of gap enhancement, spectrum manipulation, and post-readout decoding. The work we present here is an application of QAC to a broader and more complex set of problems in order to further examine the scheme’s potential.

4.2 Quantum annealing correction

The error correction scheme used here is the same as that in [PAL14]. Because it is central to the experiments performed for this paper, we will describe it in detail here as well. We use a repetition code in the computational basis to suppress, detect, and correct bit flip errors. This code is applied to the final Hamiltonian $H_{\mathrm{Ising}}$ only. We first define the encoded operators

$$\overline{\sigma^z_i} = \sum_{\ell=1}^{n}\sigma^z_{i_\ell}, \tag{4.1}$$
$$\overline{\sigma^z_i\sigma^z_j} = \sum_{\ell=1}^{n}\sigma^z_{i_\ell}\sigma^z_{j_\ell}, \tag{4.2}$$

then use them to create an encoded Ising Hamiltonian

$$\overline{H}_{\mathrm{Ising}} = \sum_{i=1}^{\bar{N}} h_i\,\overline{\sigma^z_i} + \sum_{i<j}^{\bar{N}} J_{ij}\,\overline{\sigma^z_i\sigma^z_j}. \tag{4.3}$$

We further define a penalty Hamiltonian

$$H_P = -\sum_{i=1}^{\bar{N}}\left(\sigma^z_{i_1} + \cdots + \sigma^z_{i_n}\right)\sigma^z_{i_P}, \tag{4.4}$$

which implements the stabilizer generators of the computational basis repetition code by providing ferromagnetic couplings between the physical qubits within the logical group of $n + 1$ qubits ($n$ “problem” qubits involved in the encoded logical operators, and one “penalty” qubit which mediates the stabilizer couplings). In this work, $n = 3$, allowing us to embed 2 logical qubits within each unit cell of the processor, as detailed in the previous chapter.
The entire Hamiltonian of the encoded processor, then, is

$$H(t) = A(t)H_X + B(t)\left(\overline{H}_{\mathrm{Ising}} + \beta H_P\right), \tag{4.5}$$

where $\beta$ is a scaling factor we introduce as a degree of freedom to balance $H_P$ against $\overline{H}_{\mathrm{Ising}}$ in order to optimize performance of the code.

Encoding with QAC produces several effects. Errors are suppressed by means of redundancy in the individual bias $h_i$ and coupling $J_{ij}$ terms, the effective strength of which is tripled for $n = 3$ because we use three devices on-chip for these terms where the unprotected problem embedding has access to only one. This amplifies the energy scale of the encoded problem, suppressing logical errors. Individual bit-flip errors are suppressed via the stabilizer terms in $H_P$, which produce an energy penalty when qubits within a logical group have different values. Finally, simple post-readout decoding strategies provide a way to recover the solution to the logical problem from certain excited states. These strategies will be explored in more detail in Section 4.4.

4.3 Problem set

In order to test the QAC strategy on a wide variety of problems that could be implemented on the encoded processor, a randomized benchmark strategy was devised. It is similar to the benchmarking work that has been undertaken for the unprotected D-Wave Two processor in [RWJ+14, BRI+14]. First, we removed any logical groups missing their penalty qubits from the set of encoded qubits used to define problems, along with their counterparts which shared the same unit cell (using these would have precluded the embedding of a classical repetition strategy using the same redundancy in the quantum hardware). Then, we defined problem sizes of 46, 66, 86, and 112 encoded qubits as shown in Figure 4.1.

Each coupling $J_{ij}$ within the specified problem size was set to a value drawn uniformly at random from the set $\{-1, -\frac{5}{6}, \ldots, -\frac{1}{6}, \frac{1}{6}, \ldots, \frac{5}{6}, 1\}$. Individual biases $h_i$ were set to zero.
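The instance generation just described can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments: the edge list passed in is a hypothetical stand-in for the logical graph of Figure 4.1, and the function names are assumptions. Couplings are drawn uniformly from the twelve nonzero multiples of 1/6 in $[-1, 1]$, and all biases are zero, so the logical energy contains only the coupling terms.

```python
import random
from fractions import Fraction

# The twelve allowed coupling values: -1, -5/6, ..., -1/6, 1/6, ..., 5/6, 1.
COUPLING_VALUES = [Fraction(k, 6) for k in range(-6, 7) if k != 0]


def random_instance(edges, seed=None):
    """Assign each edge (i, j) a coupling drawn uniformly from COUPLING_VALUES."""
    rng = random.Random(seed)
    return {edge: rng.choice(COUPLING_VALUES) for edge in edges}


def ising_energy(couplings, spins):
    """Logical Ising energy sum_{i<j} J_ij s_i s_j, with all biases h_i = 0."""
    return sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
```

For example, `random_instance([(0, 1), (1, 2), (0, 2)], seed=1)` returns a reproducible assignment on a three-qubit toy graph; exact `Fraction` values avoid rounding when comparing energies of candidate states.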
In this way, a richly connected non-planar Ising problem was defined over the logical graph. This problem was then embedded twice in the physical processor.

[Figure 4.1 diagram: logical qubit labels 0 to 127 on the encoded graph, with nested boxes marking the problem regions.]

Figure 4.1: Problem sizes studied. We generated random problems over each of the regions shown in the boxes of increasing size above. The smallest rectangle contains 46 logical qubits, the next largest 66, then 86 and 112. These problem sizes were chosen because they consist of square blocks of unit cells, and the treewidth of a planar square lattice (which is the arrangement of the unit cells here) grows as a function of the smallest dimension of any rectangular region chosen [RS84].

First, we embedded four independent copies of the problem on-chip, using the four qubits on each side of the unit cell to instantiate each copy:

$$H_{\mathrm{IsingC}} = \sum_{i<j}^{\bar{N}}\sum_{k=1}^{4} J_{ij}\,\sigma^z_{i_k}\sigma^z_{j_k}. \tag{4.6}$$

This implements a classical repetition strategy over the quantum hardware, which we will refer to as the C strategy. It uses all the extra resources allotted to the QAC strategy to embed the problem multiple times, giving the unprotected quantum annealer four chances to find the ground state instead of one. We also encode the logical problem using the full QAC strategy as described in Section 4.2, and collect readout statistics for both strategies. This procedure was followed for 1000 randomly generated problem instances.

4.4 Decoding Strategies

4.4.1 Two complementary decoding methods

To recover the correct solution to the logical problem from certain excited states observed after application of the QAC strategy, we use two post-readout classical decoding methods.
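The two decoding methods of this section, a majority vote over the three problem qubits of each logical group and a separate check of each of the three physical copies of the problem, can be sketched as follows. This is an illustrative sketch under stated assumptions: readouts are given as $\pm 1$ values, one 3-tuple of problem qubits per logical qubit, and `is_solution` is a hypothetical caller-supplied predicate (for instance, a comparison against a known ground state energy); none of these names come from the original work.

```python
def logical_group_decode(problem_qubits):
    """Majority vote over the three problem qubits of each logical group.

    problem_qubits: list of 3-tuples of +1/-1 readouts, one per logical qubit.
    Returns one +1/-1 logical value per logical qubit.
    """
    return [1 if sum(group) > 0 else -1 for group in problem_qubits]


def problem_group_decode(problem_qubits, is_solution):
    """Check each of the three physical copies of the problem separately.

    Returns the first copy accepted by is_solution, or None if no copy is.
    """
    for k in range(3):
        copy = [group[k] for group in problem_qubits]
        if is_solution(copy):
            return copy
    return None
```

With three $\pm 1$ values per group the vote can never tie, and both routines visit each physical problem qubit once, so the cost of each is linear in the problem size when `is_solution` is itself linear.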
Each method has complexity linear in problem size, and we will see that they are complementary in their action.

The first method, which we will term logical group decoding, is the standard scheme for decoding a repetition code. It consists of taking a majority vote over the three problem qubits within each logical group (this is equivalent to a majority vote over all four physical qubits with any ties broken by the majority of the three qubits that are coupled into the problem terms), which yields a single logical value for each qubit in the original problem we seek to solve.

We also avail ourselves of a second method that we call problem group decoding. To understand this method, it is useful to think of the encoding as an embedding of three copies of the problem on-chip (using the problem qubits), which are tied together at each node of the logical graph by the penalty terms (enforced by the penalty qubit). From this perspective, a simple and natural way to decode is to look at the result provided by each physical copy of the problem separately and determine whether it is a solution. Graphical representations of each decoding method can be found in Figure 4.2.

Figure 4.2: Decoding methods. In logical group decoding, portrayed in panel (a), a majority vote takes place within each yellow or orange grouping. This determines the value of each logical qubit in the decoded answer. For problem group decoding, shown in panel (b), the values of the physical qubits comprising each copy of the problem (boxed in blue or green) are examined as a group to see if any problem copy has the correct solution.

4.4.2 Behavior of a problem instance

In order to more closely examine the effects of the QAC encoding and the capabilities of the two decoding methods, it is instructive to look at results for a representative problem instance. The example instance shown here is one of the more difficult problems generated by our random method; it lies near the 95th percentile in terms of time to solution (i.e., 95% of the other problems we generated of its size were solvable in a shorter time). It is of size $N = 112$, which represents the entire encoded processor minus the logical groups that were not used due to missing physical qubits.

Figure 4.3 shows all of the states that were observed in 1000 annealing cycles of this problem instance, colored by the decoding method they yielded to. Looking deeper, Figures 4.4 through 4.7 show sample states from the decodability categories of the previous figure, illuminating the error mechanisms for which each type of decoding is particularly suited.

Overall, we see that both decoding methods succeed for low Hamming weight thermal errors. While logical group decoding performs well for excited states consisting of a large number of random errors, problem group decoding is better equipped to address errors that are correlated through the problem coupling terms present in the QAC Hamiltonian. Logical (rather than physical) bit-flip errors, by contrast, arise from the energy spectrum of the problem itself; QAC enhances the gap through the repetition and penalty terms, but such errors will always remain. Although these errors are not expected to yield to any trivial-complexity postprocessing decoding method like the ones examined here, it is possible that a strategy involving some kind of local classical optimization may allow even these states to be recovered in the future.

Figure 4.3: Decodability of observed states. The plot shows the energy and Hamming distance relative to the ground state of all observed states in 1000 annealing cycles of one problem instance (one far outlier state has been cropped out). The horizontal axis is the physical Hamming distance from the nearest ground state; the vertical axis is the energy relative to the ground state. States colored red are decodable by both methods, and tend to be low in Hamming distance. States colored light blue (yellow) are decodable by logical (problem) group only, and occupy higher Hamming distances, with logical group decodable states generally higher in energy than problem group decodable states. Dark blue states are undecodable, and cluster in groups which represent sets of logical qubits flipping together (see Figure 4.7).

Figure 4.4: A state decodable by both methods. In this typical example, we see two single bit flips (blue dots) isolated among many logical groups with all their physical qubits aligned with the ground state (green dots). The state is logical group decodable because single bit flips are majority-vote correctable. It is also problem group decodable because there are only two bit flips, which is not enough to ruin all three copies of the problem embedded within the QAC scheme. The magnitude (though not the sign) of the problem couplings is also shown on the diagram: pink lines indicate $|J_{ij}| = \tfrac{1}{6}$, and the shade of the line darkens through gradations to indigo for $|J_{ij}| = 1$.

4.5 Results on the benchmark set

Our main results are presented in Figures 4.8 and 4.9. In Figure 4.8 we show that although the C and QAC strategies are competitive for the smaller problem sizes of $N \in \{46, 66, 86\}$, they separate for $N = 112$. At this size, the QAC strategy outperforms the C strategy in a statistically significant manner for every quantile of problem difficulty, with the separation between the two strategies increasing with the hardness (time to solution) of the problem.
Although the break from the C strategy is sharp, this is likely an artifact of the coarseness of the scale of problem sizes studied (which in turn is due to the limited size of the chip), coupled with the lack of availability of data at optimal annealing times for problems of this size (all data was collected at the minimum 20 µs annealing time; for the importance of optimal annealing times when examining scaling data, see [RWJ+14]). However, because the separation is statistically significant and appears across multiple percentiles, we view this data as very promising for QAC applied to general optimization problems.

Figure 4.9 compares the performance of the two strategies for each of the 1000 instances at $N = 112$. We see better performance for the QAC strategy in 99% of instances studied at this largest problem size.

Figure 4.5: A logical group decodable state. Note the many single bit flips that were read out. These are all decodable via majority vote, but the state is not problem group decodable because each problem group is corrupted by at least one bit flip.

4.6 Robustness to qubit loss

The effect of physical qubit loss on the QAC strategy was also studied. Specifically, we investigated the case where the missing physical qubits are few enough to be embedded as the penalty qubits within logical groups, so that all problem couplings would remain intact.
To this end, we performed quantum annealing on the 1000 random problem instances of each size again, but this time we removed 30%, 60%, then finally 90% of the penalty qubits from each instance at random.

Figure 4.6: A problem group decodable state. This state has two logical qubits in the upper right hand corner that are loosely coupled to the rest of the problem (pink line, $|J_{ij}| = \tfrac{1}{6}$) and which each have three physical qubits flipped from the ground state values (orange dots). In this case, two of the three problem qubits have flipped for those two groups, taking the penalty qubits with them. The problem qubits that flip are correlated between the two groups; they belong to the same copy of the problem because the problem coupling is strong between counterpart problem qubits (purple line). This leaves one copy of the problem fully intact, and the state can be decoded using problem group decoding.

Again, the results here are encouraging. Figure 4.10 shows that the separation between the C and QAC strategies for multiple quantiles of problem difficulty at $N = 112$ persists even when up to 60% of the penalty qubits are removed. This can be explained by examining Figure 4.11, which shows that the experimentally observed optimal value of the penalty scaling factor $\beta$ (see Equation 4.5) grows as more penalty qubits are removed from the implementation. The penalty qubits that remain after some are lost provide a stronger link between the three copies of the problem that QAC embeds.
This keeps the redundant instances coordinated in the face of noise, and maintains the benefits of the QAC strategy.

Figure 4.7: An undecodable state. Examine the cluster of logical qubit flips (red dots) in the lower right hand corner. Here we see that the region is loosely coupled to the rest of the problem; all links going out of it are weak pink couplings. This means that the state with these logical qubits flipped together is a low-lying final excited state of the logical Ising problem, which has been suppressed via the repetition energy scale enhancement portion of QAC but is still observable in the problem instance’s output statistics. This state belongs to the cluster we see near Hamming weight 20 in Figure 4.3.

Another important note concerning Figure 4.11 is the consistency of the experimentally observed optimal $\beta$ value across many random problem instances. When we look at this histogram, we see only a few optimal $\beta$ values for each percentage of available penalty qubits. This is important because the advantages provided by QAC would be diluted if it were necessary to try many values of $\beta$ in the field to solve a relevant problem. In fact, taking into account the broad peaks in performance around the experimentally observed optimal $\beta$ values from the antiferromagnetic chains data in [PAL14], it is likely only necessary to try a single value of $\beta$ chosen to be near the expected optimum to realize a benefit from encoding.
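The time-to-solution metric used in the results (the expected number of annealing cycles necessary to observe a ground state once with 99% probability, as defined in the caption of Figure 4.8) can be related to the single-cycle success probability $p$. If cycles are assumed independent, the probability of at least one success in $R$ cycles is $1 - (1 - p)^R$, which gives $R = \ln(1 - 0.99) / \ln(1 - p)$. The Python sketch below evaluates this expression; the function name and the choice to report at least one cycle are assumptions made here for illustration, since the text states the definition only in words.

```python
import math


def cycles_to_solution(p_success, target=0.99):
    """Cycles needed to see a ground state at least once with probability `target`.

    Assumes independent annealing cycles, each succeeding with probability p_success.
    """
    if p_success <= 0.0:
        return math.inf  # the ground state is never observed
    if p_success >= 1.0:
        return 1.0  # every cycle succeeds
    cycles = math.log(1.0 - target) / math.log(1.0 - p_success)
    return max(1.0, cycles)  # assumption: at least one cycle is always run
```

Because the expression grows quickly as $p$ falls, modest differences in success probability between two strategies translate into large differences in time to solution for the hardest instances.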
Figure 4.8: Scaling of time to solution. We plot time to solution (defined as the expected number of annealing cycles necessary to observe a ground state once with 99% probability) as a function of the square root of problem size for both the C (solid lines) and QAC (dotted lines) strategies. Various percentiles of difficulty (25th, 50th, 75th, 90th, and 95th) are included, as shown in the figure key. We observe better performance for the QAC strategy at all percentiles for the largest problem size.

4.7 Discussion

We have shown that for a problem size of $N = 112$, we observe better performance in terms of expected time to solution from our QAC strategy than from a classical strategy using the same quantum hardware resources. This is consistent with the results observed for the problem of finding the ground state of an antiferromagnetic chain, to which QAC was applied in our previous work [PAL14]. In that work, we observed QAC to outperform the competing C strategy for chains that were long, with the separation in performance increasing as we studied longer (more difficult) chains and higher noise levels. Here, we see separation increase with problem size and difficulty.

Figure 4.9: Results for $N = 112$. Each point represents one of 1000 random problem instances at this size, plotted by C success probability against QAC success probability. QAC outperforms the C strategy for approximately 99% of instances studied, in some cases by a very wide margin.

Extrapolation of these results to larger problem sizes would be inappropriate given the size of the hardware used.
Although it is extremely promising that we observed a separation in performance of the two strategies, the fact that we only observed this separation at the largest problem size the hardware was capable of embedding means that the shape of the curve and the future behavior of the separation are unknown. It would be highly instructive to perform further experiments on a larger processor of the same type.

Figure 4.10: Effect of missing penalty qubits. Panels (a), (b), and (c) show the 50th, 75th, and 95th percentiles, respectively, of time to solution (number of annealing cycles) against the square root of problem size. While the percentiles are broken out into several panels here, the blue (C strategy) and green (QAC strategy) lines in each are the same data that was displayed in Figure 4.8. The new red, teal, and purple series show the effects of randomly removing 30%, 60%, and 90% of the penalty qubits from the QAC Hamiltonians, respectively. We see that the 30% and 60% lines track the original QAC lines closely, suggesting the code is highly resilient to qubit loss, if the inoperative physical qubits can be embedded as penalty qubits in the encoding.

QAC is also shown to be promising because of its ability to compensate for missing qubits in the physical hardware graph of a processor. If the location of the inoperative qubits is such that they can be embedded as penalty qubits in the QAC scheme, many of them can be absorbed before the performance of the encoded problem is affected.

Figure 4.11: Optimal $\beta$ with qubit loss. In this histogram, we display the experimentally observed optimal value of the penalty scaling parameter $\beta$ from the set of values we tried: $\{0.1, 0.2, 0.3, 0.4, 0.5\}$. The true optimal $\beta$ for the perfect QAC scheme lies between 0.1 and 0.2, but for 30% of penalty qubits missing the optimum shifts to 0.2, for 60% missing between 0.2 and 0.3, and for 90% missing to the maximum allowed value. We conclude that the stronger couplings in the remaining intact logical groups resulting from these higher optimal penalty scaling values make up for the missing penalty qubits in other logical groups.

Open problems in the area of practical encoding for physically realized quantum annealing systems abound. Future work could include investigation of larger encoded problems (which would require new hardware), problem categories directly linked to real-world applications, and different embeddings of similar codes.

Chapter 5

Conclusion

In this dissertation, we have presented an application and a multifaceted error correction scheme for adiabatic quantum optimization processors. Both were developed specifically with the D-Wave hardware implementation in mind, but could easily be applied to related systems that may arise, or even extended if future systems prove to have greater capabilities. The anomaly detection algorithm builds on the excellent work that has been done to implement machine learning on these devices to provide a framework for attacking many practical problems beyond the featured application of software verification and validation. All that is necessary is for a minimal solution space encompassing the key features of the problem to be defined, and the problem posed in terms of set membership within the solution space.
The identification of additional applications and fine-tuning of the algorithm to accommodate them (specifically generation of weak classifier libraries well suited for the problem at hand) is the primary direction for future research related to this work. Also of interest is the development of alternate machine learning techniques for the first step of the algorithm, as discussed in Section 2.4.4.

The development of error correction schemes is a topic of great interest to the field because it determines the long-term growth potential of the technology. We have demonstrated a practical error correction scheme for the D-Wave processors which uses several effects together to enhance the probability of solution. The scheme is based on a repetition code, which allows multiple devices (qubits and couplers) to be used where before a single device served, thereby accessing a higher energy scale than is possible without encoding. Quantum stabilizer terms which penalize mismatched values within each logical qubit are included in the final Hamiltonian, providing an energy penalty for states outside the codespace which increases as the computation progresses. After readout, the results are decoded in order to recover the correct logical answer from certain excited states outside the codespace. The QAC scheme was shown to be effective on both specific and general problem classes. In particular, on problems of sufficient size and difficulty it outperformed repeated annealing on hardware comparable to that required to implement QAC, even when a significant number of physical qubits were unusable.

Although we have learned a great deal concerning error correction for AQO systems in the wide-ranging investigations of the QAC scheme presented here, this important field presents many more open challenges to be pursued.
Many of the problems in the randomized benchmark set over the encoded hardware graph turn out to be relatively easy, and valuable information about the performance of QAC could be gained from finding another class of problems that is more difficult for the D-Wave processors. Furthermore, the QAC formulation presented here is not the only way a repetition code can be embedded on the hardware, and the other possibilities bear investigation. Alternate encoding formulations may further illuminate the interplay between the various factors contributing to an encoding scheme like QAC. The conclusions of this and other research will be crucial for future implementations of adiabatic quantum devices.

Bibliography

[A. 99] A. Messiah. Quantum Mechanics. Dover Publication, New York, 1999.
[AAN09] M. H. S. Amin, Dmitri V. Averin, and James A. Nesteroff. Decoherence in adiabatic quantum computation. Phys. Rev. A, 79:022107, Feb 2009.
[ABLZ12] Tameem Albash, Sergio Boixo, Daniel A Lidar, and Paolo Zanardi. Quantum adiabatic markovian master equations. New J. of Phys., 14(12):123016, 2012.
[ALMZ13] Tameem Albash, Daniel A. Lidar, Milad Marvian, and Paolo Zanardi. Fluctuation theorems for quantum processes. Physical Review E, 88(3):032146, 09 2013.
[ALT08] M. H. S. Amin, Peter J. Love, and C. J. S. Truncik. Thermally assisted adiabatic quantum computation. Phys. Rev. Lett., 100:060503, Feb 2008.
[ATK+09] Takao Aoki, Go Takahashi, Tadashi Kajiya, Junichi Yoshikawa, Samuel L. Braunstein, Peter van Loock, and Akira Furusawa. Quantum error correction beyond qubits. Nature Physics, 5:541, 2009.
[AvDK+07] Dorit Aharonov, Wim van Dam, Julia Kempe, Zeph Landau, Seth Lloyd, and Oded Regev. Adiabatic quantum computation is equivalent to standard quantum computation. SIAM J. Comput., 37(1):166–194, January 2007.
[Bar82] F Barahona. On the computational complexity of Ising spin glass models. J. Phys. A: Math. Gen, 15(10):3241, 1982.
[BAS+13] Sergio Boixo, Tameem Albash, Federico M. Spedalieri, Nicholas Chancellor, and Daniel A. Lidar. Experimental signature of programmable quantum annealing. Nat Commun, 4, 06 2013.
[BCMR] Z. Bian, F. Chudak, W. G. Macready, and G. Rose. The Ising model: teaching an old problem new tricks. D-Wave Systems, 2010.
[BEHW87] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam’s razor. Information Processing Letters, 24:377–380, 1987.
[Ben80] Paul Benioff. The computer as a physical system: A microscopic quantum mechanical hamiltonian model of computers as represented by turing machines. Journal of Statistical Physics, 22(5):563–591, 1980.
[BHJ+14] P. I. Bunyk, E. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, and J. Whittaker. Architectural considerations in the design of a superconducting quantum annealing processor. arXiv:1401.5504, 2014.
[BJB+10] A J Berkley, M W Johnson, P Bunyk, R Harris, J Johansson, T Lanting, E Ladizinsky, E Tolkacheva, M H S Amin, and G Rose. A scalable readout system for a superconducting adiabatic quantum optimization system. Superconductor Science and Technology, 23(10):105014, 2010.
[BL08] Jacob D. Biamonte and Peter J. Love. Realizable hamiltonians for universal adiabatic quantum computers. Phys. Rev. A, 78:012352, Jul 2008.
[BM04] John M. Boyer and Wendy J. Myrvold. On the cutting edge: Simplified o(n) planarity by edge addition. Journal of Graph Algorithms and Applications, 8(3):241–273, 2004.
[Bre98] L. Breiman. Arcing classifiers. The Annals of Statistics, 26:801–849, 1998.
[BRI+14] Sergio Boixo, Troels F. Ronnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer. Evidence for quantum annealing with more than one hundred qubits. Nat Phys, 10(3):218–224, 03 2014.
[CBK09] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41:15:1–15:58, July 2009.
[CFP01] Andrew M. Childs, Edward Farhi, and John Preskill. Robustness of adiabatic quantum computation. Phys. Rev. A, 65(1):012322, December 2001.
[Cho08] Vicky Choi. Minor-embedding in adiabatic quantum computation: I. The parameter setting problem. Quant. Inf. Proc., 7(5):193–209, 2008.
[Cho11] Vicky Choi. Minor-embedding in adiabatic quantum computation: II. Minor-universal graph design. Quant. Inf. Proc., 10(3):343–353, 2011.
[CLS+04] J. Chiaverini, D. Leibfried, T. Schaetz, M. D. Barrett, R. B. Blakestad, J. Britton, W. M. Itano, J. D. Jost, E. Knill, C. Langer, R. Ozeri, and D. J. Wineland. Realization of quantum error correction. Nature, 432(7017):602–605, December 2004.
[CPM+98] D. G. Cory, M. D. Price, W. Maas, E. Knill, R. Laflamme, W. H. Zurek, T. F. Havel, and S. S. Somaroo. Experimental quantum error correction. Phys. Rev. Lett., 81:2152–2155, Sep 1998.
[CYHH07] H. Cheng, X. Yan, J. Han, and C.-W. Hsu. Discriminative frequent pattern analysis for effective classification. Data Engineering, International Conference on, 0:716–725, 2007.
[DA11] Neil G. Dickson and M. H. S. Amin. Does adiabatic quantum optimization fail for np-complete problems? Physical Review Letters, 106(5):050502, 02 2011.
[DA12] Neil G. Dickson and Mohammad H. Amin. Algorithmic approach to adiabatic quantum optimization. Phys. Rev. A, 85:032303, Mar 2012.
[DAAS13] Qiang Deng, Dmitri V. Averin, Mohammad H. Amin, and Peter Smith. Decoherence induced deformation of the ground state in adiabatic quantum computation. Sci. Rep., 3:1479, 2013.
[DJA+13] N. G. Dickson, M. W. Johnson, M. H. Amin, R. Harris, F. Altomare, A. J. Berkley, P. Bunyk, J. Cai, E. M. Chapple, P. Chavez, F. Cioata, T. Cirip, P. deBuen, M. Drew-Brook, C. Enderud, S. Gildert, F. Hamze, J. P. Hilton, E. Hoskinson, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Lanting, T. Mahon, R. Neufeld, T. Oh, I. Perminov, C. Petroff, A. Przybysz, C. Rich, P. Spear, A. Tcaciuc, M. C. Thom, E. Tolkacheva, S. Uchaikin, J. Wang, A. B. Wilson, Z. Merali, and G. Rose. Thermally assisted quantum annealing of a 16-qubit problem. Nat. Commun., 4:1903, 2013.
[FGG+01] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda. A quantum adiabatic evolution algorithm applied to random instances of an NP-Complete problem. Science, 292(5516):472–475, April 2001.
[FGG02] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. Quantum adiabatic evolution algorithms with different paths. arXiv:quant-ph/0208135, 2002.
[FGGS00] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser. Quantum computation by adiabatic evolution. ArXiv, 2000.
[FSA99] Y. Freund, R. Schapire, and N. Abe. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14:771–780, 1999.
[FWH12] Austin G. Fowler, Adam C. Whiteside, and Lloyd C. L. Hollenberg. Towards practical classical processing for the surface code: Timing analysis. Physical Review A, 86(4):042313, 10 2012.
[Gai08] F. Gaitan. Quantum Error Correction and Fault Tolerant Quantum Computing. Taylor & Francis Group, Boca Raton, 2008.
[GOY13] Anand Ganti, Uzoma Onunkwo, and Kevin Young. A family of [[6k, 2k, 2]] codes for practical, scalable adiabatic quantum computation, 2013.
[HJB+10] R. Harris, J. Johansson, A. J. Berkley, M. W. Johnson, T. Lanting, Siyuan Han, P. Bunyk, E. Ladizinsky, T. Oh, I. Perminov, E. Tolkacheva, S. Uchaikin, E. M. Chapple, C. Enderud, C. Rich, M. Thom, J. Wang, B. Wilson, and G. Rose. Experimental demonstration of a robust and scalable flux qubit. Phys. Rev. B, 81:134510, Apr 2010.
[HJL+10] R. Harris, M. W. Johnson, T. Lanting, A. J. Berkley, J. Johansson, P. Bunyk, E. Tolkacheva, E. Ladizinsky, N. Ladizinsky, T. Oh, F. Cioata, I. Perminov, P. Spear, C. Enderud, C. Rich, S. Uchaikin, M. C. Thom, E. M. Chapple, J. Wang, B. Wilson, M. H. S. Amin, N. Dickson, K. Karimi, B. Macready, C. J. S. Truncik, and G. Rose. Experimental investigation of an eight-qubit unit cell in a superconducting optimization processor. Phys. Rev. B, 82:024511, Jul 2010.
[JAG+11] M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose. Quantum annealing with manufactured spins. Nature, 473(7346):194–198, 05 2011.
[JF08] S. P. Jordan and E. Farhi. Perturbative gadgets at arbitrary orders. Phys. Rev. A, 77:062329, 2008.
[JFS06] S. P. Jordan, E. Farhi, and P. W. Shor. Error-correcting codes for adiabatic quantum computation. Phys. Rev. A, 74(5):052322, 11 2006.
[KDH+11] K. Karimi, N. Dickson, F. Hamze, M. Amin, M. Drew-Brook, F. Chudak, P. Bunyk, W. Macready, and G. Rose. Investigating the performance of an adiabatic quantum optimization processor. Quantum Information Processing, page 1, 2011.
[KHA14] Helmut G. Katzgraber, Firas Hamze, and Ruben S. Andrist. Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines. arXiv:1401.1546, 2014.
[KLM01] E. Knill, R. Laflamme, and G.J. Milburn. A scheme for efficient quantum computation with linear optics. Nature, 409:46, 2001.
[Kot07] S. B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica, 31:249–268, 2007.
[Lan13] Trevor Lanting. D-Wave Inc., private communications, 2013.
[LGZ+08] Chao-Yang Lu, Wei-Bo Gao, Jin Zhang, Xiao-Qi Zhou, Tao Yang, and Jian-Wei Pan. Experimental quantum coding against qubit loss error. Proceedings of the National Academy of Sciences, 105(32):11050–11054, 2008.
[Lid08] D. A. Lidar. Towards fault tolerant adiabatic quantum computation. Phys. Rev. Lett., 100(16):160506, 04 2008.
[Lin76] G. Lindblad. On the generators of quantum dynamical semigroups. Comm. Math. Phys., 48(2):119–130, 1976.
[LMR13] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411, 2013.
[LPS+14] T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J. Berkley, R. Harris, F. Altomare, S. Boixo, P. Bunyk, N. Dickson, C. Enderud, J. P. Hilton, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, R. Neufeld, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, S. Uchaikin, A. B. Wilson, and G. Rose. Entanglement in a quantum annealing processor. arXiv:1401.3500, 2014.
[LRH09] Daniel A. Lidar, Ali T. Rezakhani, and Alioscia Hamma. Adiabatic approximation with exponential accuracy for many-body systems and quantum computation. Journal of Mathematical Physics, 50(10):–, 2009.
[LSM61] Elliott Lieb, Theodore Schultz, and Daniel Mattis. Two soluble models of an antiferromagnetic chain. Annals of Physics, 16(3):407–466, 12 1961.
[Luc13] A. Lucas. Ising formulations of many NP problems, 2013.
[Mei03] An introduction to boosting and leveraging. In S. Mendelson and A. Smola, editors, Advanced Lectures on Machine Learning, volume 2600 of Lecture Notes in Computer Science, pages 118–183. Springer Berlin / Heidelberg, 2003.
[M.H03] M.H. Freedman, A. Kitaev, M.J. Larsen, and Z. Wang. Topological Quantum Computation. Bull. Amer. Math. Soc., 40:31, 2003. eprint quant-ph/0101025.
[Miz14] Ari Mizel. Fault-tolerant, universal adiabatic quantum computation. arXiv preprint arXiv:1403.7694, 2014.
[MLM07] Ari Mizel, Daniel A. Lidar, and Morgan Mitchell. Simple proof of equivalence between adiabatic quantum computation and the circuit model. Phys. Rev. Lett., 99:070502, Aug 2007.
[MM01] S. Mannor and R. Meir. Geometric bounds for generalization in boosting. In D. Helmbold and B. Williamson, editors, Computational Learning Theory, volume 2111 of Lecture Notes in Computer Science, pages 461–472. Springer Berlin / Heidelberg, 2001.
[M.W11] M.W. Johnson et al. Quantum annealing with manufactured spins. Nature, 473(7346):194–198, May 2011. [NDRMa] H. Neven, V . S. Denchev, G. Rose, and W. G. Macready. Training a binary classifier with the quantum adiabatic algorithm. eprint arXiv:0811.0416. [NDRMb] H. Neven, V . S. Denchev, G. Rose, and W. G. Macready. Training a large scale classifier with the quantum adiabatic algorithm. eprint arXiv:0912.0779. [NRM] H. Neven, G. Rose, and W. G. Macready. Image recognition with an adia- batic quantum computer i. mapping to quadratic unconstrained binary opti- mization. eprint arXiv:0804.4457. [PAL14] Kristen L Pudenz, Tameem Albash, and Daniel A Lidar. Error-corrected quantum annealing with hundreds of qubits. Nat Commun, 5, 02 2014. [Pfe70] Pierre Pfeuty. The one-dimensional Ising model with a transverse field. Annals of Physics, 57(1):79–90, 3 1970. [QL12] G. Quiroz and D. A. Lidar. High-fidelity adiabatic quantum computation via dynamical decoupling. Phys. Rev. A, 86:042333, Oct 2012. [R. 01] R. Raussendorf and H.J. Briegel. A One-Way Quantum Computer. Phys. Rev. Lett., 86:5188, 2001. [RALZ10] A.T. Rezakhani, D.F. Abasto, D.A. Lidar, and P. Zanardi. Intrinsic geom- etry of quantum adiabatic evolution and quantum phase transitions. Phys. Rev. A, 82:012321, 2010. [RDN + 12] M. D. Reed, L. Dicarlo, S. E. Nigg, L. Sun, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf. Realization of three-qubit quantum error correction with superconducting circuits. Nature, 482:382–385, February 2012. [RKH + 09] A. T. Rezakhani, W. J. Kuo, A. Hamma, D. A. Lidar, and P. Zanardi. Quan- tum adiabatic brachistochrone. Physical Review Letters, 103(8):080502–, 08 2009. 129 [R.P82] R.P. Feynman. Simulating Physics with Computers. Intl. J. Theor. Phys., 21:467, 1982. [RS84] Neil Robertson and P.D Seymour. Graph minors. iii. planar tree-width. Journal of Combinatorial Theory, Series B, 36(1):49 – 64, 1984. [RWJ + 14] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V . 
Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer. Defining and detecting quantum speedup. arXiv:1401.2910, 2014.
[SBM+11] Philipp Schindler, Julio T. Barreiro, Thomas Monz, Volckmar Nebendahl, Daniel Nigg, Michael Chwalla, Markus Hennrich, and Rainer Blatt. Experimental repetitive quantum error correction. Science, 332(6033):1059–1061, 05 2011.
[Sch90] R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
[SL05] M. S. Sarandy and D. A. Lidar. Adiabatic approximation in open quantum systems. Phys. Rev. A, 71:012331, Jan 2005.
[SLS+] E. Stehle, K. Lynch, M. Shevertalov, C. Rorres, and Spiros Mancoridis. On the use of computational geometry to detect software faults at runtime.
[SQSL13] Siddhartha Santra, Greg Quiroz, Greg Ver Steeg, and Daniel Lidar. MAX 2-SAT with up to 108 qubits. arXiv:1307.3931, 2013.
[SS13] John A. Smolin and Graeme Smith. Classical signature of quantum annealing. arXiv:1305.4904, 2013.
[SSSV14] S. W. Shin, G. Smith, J. A. Smolin, and U. Vazirani. How "Quantum" is the D-Wave Machine? ArXiv e-prints, January 2014.
[SY13] Mohan Sarovar and Kevin C. Young. Error suppression and error correction in adiabatic quantum computation II: non-equilibrium dynamics. arXiv:1307.5892, 2013.
[TBJ06] Y. Le Traon, B. Baudry, and J.-M. Jezequel. Design by contract to improve software vigilance. Software Engineering, IEEE Transactions on, 32(8):571–586, 2006.
[VAM+14] Walter Vinci, Tameem Albash, Anurag Mishra, Paul A. Warburton, and Daniel A. Lidar. Distinguishing classical and quantum models for the D-Wave device. arXiv:1403.4228, 2014.
[WRB+13] Lei Wang, Troels F. Rønnow, Sergio Boixo, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer. Comment on: ‘Classical signature of quantum annealing’. 2013.
[YBKL13] Kevin C. Young, Robin Blume-Kohout, and Daniel A. Lidar. Adiabatic quantum optimization with the wrong Hamiltonian.
Physical Review A, 88(6):062314–, 12 2013.
[YKS08a] A. P. Young, S. Knysh, and V. N. Smelyanskiy. Size dependence of the minimum excitation gap in the quantum adiabatic algorithm. Phys. Rev. Lett., 101:170503, 2008.
[YKS08b] A. P. Young, S. Knysh, and V. N. Smelyanskiy. Size dependence of the minimum excitation gap in the quantum adiabatic algorithm. Phys. Rev. Lett., 101:170503, Oct 2008.
[YL04] L. Yu and H. Liu. Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res., 5:1205–1224, 2004.
[YSBK13] Kevin C. Young, Mohan Sarovar, and Robin Blume-Kohout. Error suppression and error correction in adiabatic quantum computation: Techniques and challenges. Physical Review X, 3(4):041013–, 11 2013.
[ZLS12] Jingfu Zhang, Raymond Laflamme, and Dieter Suter. Experimental implementation of encoded logical qubit operations in a perfect quantum error correcting code. Phys. Rev. Lett., 109:100503, Sep 2012.
[ZR99] Paolo Zanardi and Mario Rasetti. Holonomic quantum computation. Physics Letters A, 264(2–3):94–99, 12 1999.
[ZZY03] S. Zhang, C. Zhang, and Q. Yang. Data preparation for data mining. Applied Artificial Intelligence, 17:375–381, 2003.

Appendix A
The D-Wave Device: Lessons from Experience

A.1 Motivation

This appendix is meant to be a review of D-Wave processor programming and engineering considerations from a user’s perspective. It may be useful as a starting point for new researchers in the field, or a unified set of references for those already proficient in the use of the device. Reading it will offer insight into the principles and methods underlying the main body of the dissertation, but this guide should also be able to stand on its own as a resource. We address considerations relevant to the design of new applications for the D-Wave adiabatic quantum optimization processors and the interpretation of experimental results.
Practical suggestions for programming and embedding are offered in order to make the best use of the degrees of freedom available to enhance the quality of results. Citations to scientific work that introduces or exemplifies the techniques considered are included throughout.

A.2 Physics of the implementation

We will not attempt an extensive treatment of the basic system and device physics here, but rather refer the reader to several excellent sources in the literature. First, it is essential to understand the principles of closed-system adiabatic quantum computation (AQC) as established by Farhi et al. [FGGS00, FGG+01] and treated in Section 1.2. At zero temperature, with no environmental noise, the adiabatic theorem guarantees that a system obeying a time-dependent Hamiltonian of the form $H_{\mathrm{AQC}}(t) = A(t) H_I + B(t) H_F$, where $A(0) \gg B(0)$ and $A(T) \ll B(T)$, will reach the ground state of $H_F$ at time $T$ with high probability if $T$ is long enough. The necessary computation time $T$ is inversely proportional to the square of the minimum gap between the ground state and the first excited state, and directly proportional to the matrix element of the first derivative of the Hamiltonian connecting the ground state and the first excited state [A. 99]. One can exploit the degrees of freedom in this model by optimizing the Hamiltonian for a smooth adiabatic evolution path with a large gap [RKH+09], by adding a driver Hamiltonian term that is zero at the beginning and end of the computation but has beneficial properties with regard to the gap in between [FGG02], or by manipulating the initial Hamiltonian [DA11, DA12]. In the case of the D-Wave system, the initial Hamiltonian $H_I$ is fixed to $H_X = \sum_i h_i \sigma^x_i$, and the final Hamiltonian $H_F$ must follow an Ising spin glass form dictated by the hardware layout, $H_{\mathrm{Ising}} = \sum_i h_i \sigma^z_i + \sum_{i,j} J_{ij} \sigma^z_i \sigma^z_j$.
This limited form of AQC has often been referred to as adiabatic quantum optimization (AQO) because finding the ground state of $H_{\mathrm{Ising}}$ is fundamentally an optimization problem.

Another factor that must be taken into account when considering the D-Wave processors is that, as a physical realization of AQO, they constitute an open system subject to various types of environmental noise. AQO on an open system is sometimes referred to as quantum annealing (QA) in the literature. This has consequences for the performance of the computation. The adiabatic idea of an instantaneous spectrum of continuously changing eigenstates persists, but now the system may be excited out of or relax back into the ground state via noise processes over the course of the computation [ABLZ12]. The presence of these mechanisms can be helpful [ALT08, DJA+13], but may change the nature of the class of implementable problems [KHA14].

The details of the qubit and coupler implementation are typically of interest to the user only as they impact the environmental noise seen by the system and the ways in which the available qubits deviate from the abstract theoretical model. These characteristics will be related in the sections that follow. However, some basic facts and references are relevant here. The D-Wave quantum processors are composed of highly customized superconducting flux qubits, essentially loops of niobium interrupted by Josephson junctions, which store their state in the direction of the supercurrent. The basic system and the qubits are well described in published literature [M.W11, HJB+10]. Also on-chip is an extensive suite of specialized control hardware designed to make the fabricated devices as identical as possible and to implement coupling, programming, and annealing [HJL+10, BHJ+14]. The functioning of multi-qubit regions of the chip has been studied, particularly with regard to identifying entanglement within the physical system [LPS+14].
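To make the Ising form from this section concrete, the following is a minimal sketch, not part of the original appendix, that evaluates the classical Ising energy and finds its ground states by exhaustive search. It assumes each coupling is listed once as a pair `(i, j)` with `i < j`; the function names and the three-spin example values are hypothetical. Exhaustive search is only practical for a handful of spins, which is exactly why the annealer is of interest for larger instances.

```python
from itertools import product

def ising_energy(spins, h, J):
    """Energy of one configuration: sum_i h_i s_i + sum_(i,j) J_ij s_i s_j.

    spins: sequence of -1/+1 values; h: list of biases;
    J: dict mapping (i, j) pairs (each pair listed once) to couplings.
    """
    field = sum(h[i] * s for i, s in enumerate(spins))
    coupling = sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())
    return field + coupling

def brute_force_ground_states(h, J, tol=1e-12):
    """Enumerate all 2^n configurations and return (minimum energy, minimizers)."""
    best_energy, best_states = None, []
    for spins in product((-1, +1), repeat=len(h)):
        e = ising_energy(spins, h, J)
        if best_energy is None or e < best_energy - tol:
            best_energy, best_states = e, [spins]
        elif abs(e - best_energy) <= tol:
            best_states.append(spins)
    return best_energy, best_states

# Hypothetical example: 3-spin antiferromagnetic chain with a bias on spin 0.
h = [0.5, 0.0, 0.0]
J = {(0, 1): 1.0, (1, 2): 1.0}
energy, states = brute_force_ground_states(h, J)
print(energy, states)
```

In this example the positive bias on spin 0 breaks the degeneracy between the two alternating configurations of the chain.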
A.3 Hardware layout and embedding

A.3.1 The Ising model

One of the biggest challenges inherent in programming for the D-Wave device is that it is not a universal adiabatic quantum computer. Instead, it is an optimizer over a 2-local Ising spin glass defined on a particular physically motivated connectivity graph. Given these restrictions, plus the limited size of the machine, it is important to reduce any potential application to the essential elements which are well-suited to the use of QA. A good QA programmer should consider the quantum annealer as a powerful co-processor to the classical resources already available. Any pre- or post-processing that can be done efficiently on a classical computer should be handled there. In this way, the programmer makes the best use of the QA processor by using it only for the portions of the application for which it can offer a speedup or some other benefit such as enhanced accuracy.

For example, consider the verification and validation “triplex monitor” application developed in Chapter 2 of this dissertation. It takes as input the values generated by three identically constructed, redundant systems over a number of time steps to determine if one of the three systems is failing. The values themselves are never passed to the quantum processor; instead, the optimization is constructed using binary variables that represent equality between each pairing of the three systems within a given time step. It is trivial to use classical preprocessing to generate the binary variables, with which implementation of the more interesting calculation on the quantum annealer is far simpler.

Many problems have already been mapped to Ising optimizations [BCMR, Luc13]. It may be useful to develop applications centered around these problems as the quantum-processing portion of the work, or to consider these example mapping techniques for new problems.
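The classical preprocessing step described for the triplex monitor can be sketched as follows. This is an illustrative assumption about how the pairwise-equality bits might be generated, not the code used in Chapter 2; the function name, the `tolerance` parameter, and the sample readings are hypothetical.

```python
from itertools import combinations

def pairwise_agreement_bits(readings, tolerance=0.0):
    """For each time step, return three bits (A==B, A==C, B==C).

    readings: list of 3-tuples, one value per redundant system per time step.
    tolerance: hypothetical slack for treating two readings as equal.
    """
    bits = []
    for step in readings:
        if len(step) != 3:
            raise ValueError("each time step needs exactly three readings")
        bits.append(tuple(
            1 if abs(x - y) <= tolerance else 0
            for x, y in combinations(step, 2)  # order: (A,B), (A,C), (B,C)
        ))
    return bits

# Hypothetical readings from systems A, B, C over three time steps.
readings = [
    (1.00, 1.00, 1.00),  # all agree
    (2.00, 2.00, 2.50),  # C disagrees
    (3.00, 3.10, 3.00),  # B disagrees
]
bits = pairwise_agreement_bits(readings)
print(bits)
```

Only these bits, not the raw readings, would then enter the optimization that is passed to the annealer.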
A.3.2 The Chimera graph

Once the desired QA application has been reduced to the Ising model, it must be further mapped to fall within the implemented connectivity graph. The “Chimera”-structured hardware graph for the D-Wave Two Vesuvius processor installed at USC is shown in Figure A.1. If the problem is highly local, it may be directly embeddable within the Chimera graph. If possible, this is the superior option. It allows for the gathering of statistics directly on the energy spectrum of the problem at hand, which is advantageous because in some cases the existence and position of higher energy levels may be valuable information.

However, a direct embedding approach will often be impossible. The hardware graph is highly specialized, and it is unlikely that there are many problems which will fit it natively. In the absence of a direct embedding, the programmer must decide which approximations to make in order to create an implementable form. The primary resource available in this respect is the creation of groups of physical qubits coupled together to form a logical qubit. The logical qubit then has access to all of the couplings of its physical members, broadening the available graph structure considerably. Chimera was originally designed to allow the embedding of fully connected graphs of arbitrary size using regular embeddings of logical qubits [Cho08, Cho11]. In practice, the embedding of complete graphs is a poor strategy for two reasons: most problems do not require a fully connected graph, and the large logical groups necessary generate too many errors. The creation of logical qubits introduces additional energy levels to the implemented problem spectrum, allowing more opportunities for error and clouding the statistics of the original problem. For these reasons, it is in the interest of the programmer to keep any logical qubits used as small as possible in terms of physical qubits.
When logical qubits must be used, in particular larger groups, certain strategies can help keep them more perfectly aligned and improve the quality of solutions.

First, keep in mind that antiferromagnetic chains consistently outperform ferromagnetic chains on this hardware. One reason is that D-Wave’s couplers perform more reliably in the antiferromagnetic regime than in the ferromagnetic regime (see Figure A.2). An antiferromagnetic coupling experiences smaller variability between requested and implemented coupling value, thereby improving performance. The other mechanism at work is non-ideal interactions between D-Wave’s qubits and couplers, allowing approximately 5% bleedthrough of coupling values to become individual bias values on the coupled qubits. This means that a ferromagnetic chain will be biased toward the all $+1$ rather than the all $-1$ degenerate ground state, a break in symmetry that could be quite consequential if delicately balanced optimization problems are embedded using ferromagnetically coupled logical qubits. In contrast, for an antiferromagnetic chain the bias from bleedthrough applies equally to each degenerate ground state (favoring $-1$ values for all qubits when the ground state is an equal number of $+1$ and $-1$ values), so the symmetry-destroying argument does not apply.

Although it is natural to construct a logical qubit with ferromagnetic couplings between the physical qubits because they all represent the same value, the logical course of action on the D-Wave hardware is to use antiferromagnetic couplings in place of ferromagnetic couplings as much as possible. For instance, if two qubits in distant parts of the chip must be connected using a long chain, an antiferromagnetic chain should be used. If the chain is of even length, the two ends may have opposite values in the ground state of the chain, but this is easily solved by reversing the sign of the couplings on one end of the chain.
Similarly, if there is a coupling in the middle of the chain, or if the logical qubit is not a chain but a cluster of physical qubits, antiferromagnetic couplings may still be used, the ground state of the cluster found (this is an easy problem), and the signs of the problem couplings can be reversed for members of the cluster which should take the opposite value from a ferromagnetically-coupled cluster.

If the probability of success within logical qubits is still insufficient and additional physical resources are available on-chip, the logical qubits could be encoded according to the QAC scheme developed in Chapters 2 and 3 of this dissertation. A further option, in cases where a few qubits scattered widely over the chip must take the same value, could be not to connect them physically at all, but rather to run two different versions of the problem with each member of this “virtual logical qubit” first biased strongly to $-1$, then to $+1$. A QA approach may still have value for solving the rest of the problem in such a scenario.

A.4 Practical parameter setting

A.4.1 Thermalization time

When the D-Wave processors are programmed, heat is introduced into the system. Allowing sufficient time for the system to cool after programming is important to get accurate results, and generally the default value of the post-programming thermalization time is insufficient. A simple test has been quite effective in determining the necessary thermalization time and should be valid for hardware versions beyond the D-Wave Two Vesuvius chip discussed here. Set up a long antiferromagnetic chain through the processor (this should be simple to embed): $H_{\mathrm{Ising}} = \sum_{i=1}^{N-1} \sigma^z_i \sigma^z_{i+1}$. With the thermalization time set to the default, take a significant number (thousands) of samples of the outcome of the annealing process using the “raw” data collection setting to produce a time-ordered series.
Bin these into smaller sets of samples (50–100), and plot the probability of finding a ground state within each bin. When the probability converges to a constant value, sufficient time has passed and the system is thermalized; set the post-programming thermalization time for future experiments to the default plus the time used to collect the pre-convergence samples. For the current D-Wave Two Vesuvius chip, this time has been found to be approximately 10 milliseconds.

A.4.2 Optimal annealing time

The existence of noise must be taken into account when considering the computation time. In a closed system AQO, a longer computation time always enhances the ground state fidelity at the end, because it makes the process more perfectly adiabatic in a setting where the only source of error is non-adiabatic transitions due to jumping energy levels at an avoided crossing. For an open system, this effect must be balanced with the reality that a longer computation means more time the system spends exposed to its noisy environment. For this reason, we expect QA to exhibit an optimal computation (annealing) time, beyond which the effects of thermal noise overwhelm the advantages gained by slowing down the computation [RWJ+14]. In effect, the optimal annealing time is the tipping point between the regime where the dominant error mechanism is non-adiabatic transitions and the regime in which thermal errors are more prominent. The time itself is highly problem-dependent, and as of this writing, predicting the optimal annealing time for a given problem remains an open question.

A.4.3 Problem specification issues and gauge averaging

There is another error mechanism for the D-Wave processor that has not yet been addressed here: problem mis-specification. When the terms of the final Ising Hamiltonian are programmed into the device, the problem that is physically realized may not be exactly the same as the problem specified by the programmer.
If the two problems are different enough, the solution to the problem on the hardware may not be the same as the solution to the specified problem, and the device may end up correctly solving the wrong problem. This type of error arises from several sources on-chip, and is what D-Wave tries to encompass within its bits of precision figure. D-Wave advertises 4 bits of precision for its Vesuvius chip, meaning that $2^4$ evenly spaced distinct values can be used in a problem before there is a significant chance of the implemented values crossing (the programmer requested $h_1 > h_2$ but on-chip $h_1 < h_2$). In practical terms, this usually means that the chip is programmed in increments of $\frac{1}{7}$, enabling the inclusion of $-1$, $0$, and $+1$. Problems using fewer bits of precision are more reliable.

Much, though not all, of the problem mis-specification error arises from imperfections at the time of chip fabrication and the need to calibrate the qubits to make them as identical as possible. A tremendous amount of on-chip hardware exists solely to deal with this issue, to tune the physics of the qubits and couplers as close to the theoretical curves as feasible and synchronize their behavior. However, this tuning has effects on the interaction of the qubits and couplers with the digital to analog converters (DACs) used to store the problem values programmed in to define the Ising Hamiltonian. The discrete set of values the DACs can take defines the $h$ or $J$ values that can be set; however, the mapping from DAC to Ising parameter is unique to each qubit or coupler, determined by the calibration that was used on the individual device. This means that each value set by the programmer is approximated using the nearest available DAC value, which is different for different qubits, but not random at the time of programming.
Instead, the error was predetermined by fabrication errors and calibration corrections, and will be consistent over multiple attempts to program the same problem into the device.

Gauge averaging is the most effective way to deal with problem mis-specification errors. It involves programming the same problem into the chip multiple times, but with the values of half of the qubits randomly reversed in sign. The signs of the associated problem values are reversed as well, so that whatever error exists has the opposite effect. A single “gauge” of a problem begins by determining the flip vector $f \in \{-1, 1\}^n$, where $n$ is the number of qubits in the problem and each element $f_i$ is drawn uniformly at random from $\{-1, +1\}$. The gauge version of the problem is then

$$H_{\mathrm{Ising,\,gauge}} = \sum_i f_i h_i \sigma^z_i + \sum_{i,j} f_i f_j J_{ij} \sigma^z_i \sigma^z_j, \qquad \text{(A.1)}$$

which creates a version of the problem with a solution that can be mapped back to the original solution by multiplying by the flip vector (each qubit with $f_i = -1$ in the flip vector has its sign reversed in the new solution). When enough of these gauges are applied and the resulting samples of the solution space averaged, the effects of problem mis-specification are mitigated.

A.4.4 Sampling

The dominant mode of use of the D-Wave processors is as samplers. The sampling is over the spectrum of low-lying energy levels at the end of the computation, for which the frequencies of observation fall into a somewhat Boltzmann-like distribution for a sufficient number of annealing runs. This is an important consideration for applications; if information can be gained from the excited states as well as the ground state, the usefulness of the machine is enhanced. In general, the processor is programmed once, cooled, and then the problem is run many times (a typical starting number of samples is 1,000) to determine the ground state and the makeup of the adjacent spectrum.
It is important to note that the results of each individual annealing cycle are not independent from the results of those immediately before and after it in time. For this reason, most data should be collected in time-ordered “raw” mode so that time correlations may be noted and quantified into error measures (for an example of such error analysis, see [RWJ+14]).

A.4.5 Problem scaling

Although the operating temperature of the machine is fixed, and may not be increased as a means of accessing the higher excited spectrum, it is within the programmer’s control to reduce the scale of the entire problem (within the precision limit of the equipment). Shrinking every $h_i$ and $J_{ij}$ by a common factor lowers the problem’s energy scale relative to the fixed physical temperature, which acts as an effective temperature increase. This temperature control workaround enables better investigation of thermal noise effects, which is especially useful when developing methods of avoiding or mitigating noise. Another frequently overlooked fact regarding the scale at which problems are embedded is that $h_i$ bias values on individual qubits have twice the range of $J_{ij}$ coupling values, so that one can embed a term like $h_1 = 2J_{1,2}$ without forcing the scale of the $J$ values to be reduced in implementation.

A.5 Conclusion

Although it is designed as a limited quantum optimizer, the D-Wave device is still a versatile problem solver. Whether it allows a quantum speedup is still an open question, and the field of applications for the device is just beginning to be explored. Many excellent techniques for mapping problems to this class of processors, and for working around their less than ideal characteristics, have been developed. A programmer making use of them should be able to rapidly expand the body of knowledge regarding the capabilities of this type of hardware. Good luck!

Figure A.1: Graph of the qubits (green/gray circles represent operational/inaccessible qubits) and couplers (black lines) available on the USC/ISI D-Wave Two Vesuvius chip. This layout, composed of eight qubit bipartite unit cells connected in a grid, is known as the Chimera graph.
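The unit-cell description in the Figure A.1 caption can be turned into an explicit coupler list. The sketch below is an illustration under stated assumptions, not the device's actual wiring table: it assumes the commonly described Chimera convention in which one half of each eight-qubit cell couples to the matching qubits of the cell below and the other half to the matching qubits of the cell to the right, it assumes an 8 x 8 grid of cells, and it ignores the inaccessible qubits shown in gray in the figure. The function name and qubit indexing are hypothetical.

```python
def chimera_edges(rows, cols, shore=4):
    """Coupler list for a rows x cols grid of K_{shore,shore} unit cells.

    Qubit index = (row * cols + col) * 2 * shore + k, with k in [0, 2 * shore).
    Assumed convention: k < shore couples vertically, k >= shore horizontally.
    """
    cell_size = 2 * shore

    def qubit(r, c, k):
        return (r * cols + c) * cell_size + k

    edges = []
    for r in range(rows):
        for c in range(cols):
            # Intra-cell couplers: complete bipartite graph between the halves.
            for a in range(shore):
                for b in range(shore, cell_size):
                    edges.append((qubit(r, c, a), qubit(r, c, b)))
            # Inter-cell couplers to the cell below (first half of the cell).
            if r + 1 < rows:
                for a in range(shore):
                    edges.append((qubit(r, c, a), qubit(r + 1, c, a)))
            # Inter-cell couplers to the cell on the right (second half).
            if c + 1 < cols:
                for b in range(shore, cell_size):
                    edges.append((qubit(r, c, b), qubit(r, c + 1, b)))
    return edges

# Assumed ideal 8 x 8 grid of unit cells with every qubit operational.
edges = chimera_edges(8, 8)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(degree), "qubits,", len(edges), "couplers")
```

Under these assumptions no qubit has more than six couplers, which illustrates why most problems need logical qubits to reach the connectivity they require.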
[Figure A.2 plot, titled "Typical V7 coupler response curve": horizontal axis, coupler bias ($\Phi^x_{co}/\Phi_0$); vertical axis, $M_{\mathrm{eff}}$ (pH).]
Figure A.2: Coupling parameter as a function of control variable. The curve is much steeper in the ferromagnetic regime (negative coupling values), and the digital to analog converter takes values evenly spaced along the horizontal axis to set the coupling. This enables more precision in the choice of an antiferromagnetic coupling than a ferromagnetic coupling.
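The gauge transformation of Eq. (A.1) in Section A.4.3 can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments in this dissertation: the function names are hypothetical, couplings are assumed to be stored once per pair in a dictionary, and the four-qubit problem is made up. The final loop checks the property the text relies on, namely that every configuration of the gauged problem has the same energy as its flipped counterpart in the original problem.

```python
import random
from itertools import product

def ising_energy(spins, h, J):
    """sum_i h_i s_i + sum_(i,j) J_ij s_i s_j, each pair listed once in J."""
    return (sum(h[i] * s for i, s in enumerate(spins))
            + sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items()))

def random_gauge(n, rng):
    """Draw a flip vector f with each element uniform over {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(n)]

def apply_gauge(h, J, f):
    """Gauged problem: h_i -> f_i h_i and J_ij -> f_i f_j J_ij."""
    h_g = [f[i] * hi for i, hi in enumerate(h)]
    J_g = {(i, j): f[i] * f[j] * Jij for (i, j), Jij in J.items()}
    return h_g, J_g

def undo_gauge(sample, f):
    """Map a sample of the gauged problem back to the original variables."""
    return tuple(fi * si for fi, si in zip(f, sample))

# Hypothetical 4-qubit problem and a reproducible random gauge.
rng = random.Random(0)
h = [0.5, -0.25, 0.0, 1.0]
J = {(0, 1): -1.0, (1, 2): 0.5, (2, 3): 1.0}
f = random_gauge(len(h), rng)
h_g, J_g = apply_gauge(h, J, f)

# The spectrum is unchanged once samples are mapped back through f.
for s in product((-1, 1), repeat=len(h)):
    assert abs(ising_energy(s, h_g, J_g)
               - ising_energy(undo_gauge(s, f), h, J)) < 1e-12
```

In practice one would program several such gauges, map each returned sample back with `undo_gauge`, and pool the results before computing statistics.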
Asset Metadata
Creator
Pudenz, Kristen L.
(author)
Core Title
Applications and error correction for adiabatic quantum optimization
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
09/04/2014
Defense Date
06/16/2014
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
adiabatic,OAI-PMH Harvest,quantum computing,quantum error correction,quantum information
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Lidar, Daniel A. (
committee chair
), Brun, Todd A. (
committee member
), Spedalieri, Federico (
committee member
), Ver Steeg, Greg (
committee member
)
Creator Email
kristen.pudenz@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-468551
Unique identifier
UC11287026
Identifier
etd-PudenzKris-2879.pdf (filename),usctheses-c3-468551 (legacy record id)
Legacy Identifier
etd-PudenzKris-2879.pdf
Dmrecord
468551
Document Type
Dissertation
Rights
Pudenz, Kristen L.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA