TRAINABILITY, DYNAMICS, AND APPLICATIONS OF QUANTUM NEURAL NETWORKS

by

Bingzhi Zhang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PHYSICS)

August 2024

Copyright 2024 Bingzhi Zhang

Acknowledgements

My doctoral study was a long but interesting and precious journey. Over the years I received support and help from many people, which made this thesis and all of the research behind it possible. First, I would like to express my greatest gratitude to my advisor, Quntao Zhuang, for his guidance and support throughout my Ph.D., from the first day I joined the group. I appreciated his guidance on how to pose a meaningful and tractable question, and he has always been open to the ideas and suggestions I came up with. He has always been willing to discuss every detail of my work, and from this attitude I learned a great deal about how to do real scientific research. I have also been influenced by his flexible approach to thinking about and handling hard problems, which let us pave the way to a solution and find new physical insight. I was especially fortunate to be able to work with Quntao at both the University of Arizona and the University of Southern California. I cannot overvalue these unforgettable five years, in which I grew from a bachelor's graduate into a qualified Ph.D. candidate. I am very fortunate to have had the opportunity to work with Akira Sone, Linran Fan, Liang Jiang, Xiaohui Chen, and Zhi-Cheng Yang (in alphabetical order). Akira has great knowledge of quantum algorithms and quantum control, and the connection he pointed out between our results on the gradient transition and the dynamical Lie algebra helped sharpen the physical insight. He has not only been a great collaborator but has also provided many suggestions and much help in my study and life.
Linran is a great experimental physicist in photonics; through our discussions of the project he offered many suggestions and showed me how an experimentalist thinks about and evaluates a question. Liang has profound knowledge of quantum science and engineering and provided numerous insightful suggestions and ideas. He first pointed out the duality between kernel and error in the dynamical equation we modeled, which ultimately helped us form a unified theory. Xiaohui is a great mathematician with a deep understanding of both statistics and machine learning. I really appreciated the thorough and insightful discussions throughout the project, in which we formulated the quantum ensemble learning problem and proposed a model inheriting the successes of classical machine learning. Zhi-Cheng is a great condensed matter physicist and broadened my view of non-equilibrium physics such as quantum many-body scars. I also want to thank Zheshen Zhang and Jeffrey Shapiro for their contributions to experiments and theory, and David Simmons-Duffin, Alexey V. Gorshkov, and Maria Schuld for helpful discussions. I also want to thank all of my brilliant collaborators: Haowei Shi, Jing Wu, Jia-Jin Feng, Junyu Liu, Peng Xu, Pengcheng Liao, and Xiao-Chuan Wu. Haowei produced great designs for optical receivers utilizing entanglement, making a quantum advantage possible. Junyu provided many helpful suggestions in our discussions, drawing on his understanding of both quantum and classical machine learning. Jing assisted the project with his calculations on transduction. It was a pleasure to work with Jia-Jin, who has great ideas in condensed matter physics. I benefited a lot from the collaboration with Peng, both in code writing and in fruitful discussions on classical optimal transport. Pengcheng worked hard in our collaboration, and his numerous numerical simulation results paved the way to identifying interesting phenomena in classification problems.
Xiao-Chuan's strong background in condensed matter theory and statistical field theory helped us find the way to formulate our theory within a systematic Hamiltonian mechanics. I would like to thank my dissertation committee at USC, Daniel Lidar, Eli Levenson-Falk, Paolo Zanardi, and Todd Brun, for their willingness to attend my thesis defense. I also want to thank Shufang Su, Sumit Mazumdar, Stefan Meinel, and Zheshen Zhang, who served as committee members for my comprehensive exam when I was at UofA. I especially want to thank Zheshen for his help in my transition from UofA to USC. I also want to thank my academic advisors, James Boedicker (USC) and Brian LeRoy (UofA), for their help throughout my Ph.D. I cannot overvalue James's work in helping me move to the physics department at USC and in clarifying many things, including course credits and graduation requirements. I would like to thank my labmates, Anthony J. Brady and Xiaobin Zhao. I also want to thank Chaohan Cui, Changhao Li, Guoqing Wang, Hong Qiao, Matthias C. Caro, Shengqi Sang, Xuchen You, Yuanhang Zhang, Yuhan Liu, and Yuan Liu for fruitful discussions and advice. Thanks to my friends Bowei Zhou, Boyu Zhou, Feng Wu, Junyu Zheng, Xiaohang Chen, Jianbo Gong, Jingwei Liu, Yangyang Cai, and Zhixue Shu, especially for Junyu and Yu's help in my junior years. Personally, thanks to Zihao Niu for the happiness from his radio, and to Hideo Kojima, Hidetaka Miyazaki, Toshihiro Kondo, and Yoko Taro for all the happiness and thoughts I gained from their works. I also want to thank the NSF, DARPA, ONR, ARO, DOE, and Cisco for financial support. Finally, I want to express my great gratitude to my family, Xiangrong Liu (mother) and Zhimin Zhang (father). Without their support and encouragement, it would have been impossible for me to complete my Ph.D.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Quantum neural network
1.2 Optimization of QNN
1.3 Applications of QNN
1.4 Organization of thesis

Chapter 2: Quantum computational phase transition in combinatorial problems
2.1 Introduction
2.2 Results
2.2.1 Preliminary
2.2.1.1 Quantum Approximate Optimization algorithm
2.2.1.2 SAT problems
2.2.2 Gradient of QAOA training
2.2.2.1 Connection to controllability
2.2.2.2 Barren plateau and complexity
2.2.3 Accuracy of QAOA
2.2.4 Comparison with quantum adiabatic algorithm
2.3 Discussion
2.4 Methods
2.4.1 Many-body formulation of the problem Hamiltonian
2.4.2 Dynamical Lie algebra: definition and bounds
2.4.3 Classical approximate algorithms

Chapter 3: Dynamical phase transition in quantum neural networks with large depth
3.1 Introduction
3.2 Training dynamics of quantum neural networks
3.3 Dynamical phase transition
3.4 Generalized Lotka-Volterra model
3.5 Statistical physics interpretation
3.6 Unitary ensemble theory for QNN dynamics
3.7 Experimental results
3.8 Discussion
3.9 Methods
3.9.1 QNN ansatz and details of the tasks
3.9.2 Details of restricted Haar ensemble
3.9.3 Haar ensemble results

Chapter 4: Applications of quantum neural network in supervised and unsupervised learning
4.1 Fast decay of classification error in variational quantum circuits
4.1.1 Introduction
4.1.2 Circuit architecture and main results
4.1.3 Ensemble of states under discrimination
4.1.3.1 States generated from local quantum circuits
4.1.3.2 Ground states of many-body systems
4.1.4 Performance of the brickwall VQC in state discrimination
4.1.4.1 Fast error decay
4.1.4.2 MLE's superiority over single measurement schemes
4.1.4.3 Linear growth of complexity
4.1.4.4 Comparison between state generation and discrimination: performance and trainability
4.1.5 Performance benchmarks of different ansatz
4.1.5.1 Comparison between different architectures
4.1.5.2 Discriminative power versus scrambling power
4.1.5.3 Performance of simplified gate sets
4.1.6 Discussion
4.2 Generative quantum machine learning via denoising diffusion probabilistic models
4.2.1 Introduction
4.2.2 General formulation of QuDDPM
4.2.3 Training strategy
4.2.4 Loss function
4.2.5 Gate complexity and convergence
4.2.6 Applications
4.2.6.1 Learning correlated noise
4.2.6.2 Learning many-body phases
4.2.6.3 Learning nontrivial topology
4.2.7 Discussion
4.2.8 Methods
4.2.8.1 Details of parameters
4.2.8.2 Wasserstein distance
4.2.8.3 Related works

Chapter 5: Trainability and application of hybrid QNN
5.1 Energy-dependent barren plateau in bosonic variational quantum circuits
5.1.1 Introduction
5.1.2 Results
5.1.2.1 Energy-regularized circuit ensemble
5.1.2.2 Barren plateau in state preparation
5.1.2.3 Strategies to circumvent training issues
5.1.3 Discussion
5.1.4 Methods
5.1.4.1 Gaussian states
5.1.4.2 Universality of ECD gate sets: proof of Lemma 6
5.2 Hybrid entanglement distribution between remote microwave quantum computers empowered by machine learning
5.2.1 Introduction
5.2.2 System setup
5.2.3 Entangling microwave modes via CV swap
5.2.4 Conversion and distillation protocols
5.2.4.1 Direct conversion
5.2.4.2 Variational hybrid conversion and distillation
5.2.4.3 Performance comparison
5.2.5 Discussions

Chapter 6: Conclusions

Bibliography

Appendix A: Supplemental Material for Chapter 2
A.1 Distribution of Hamiltonian coefficients
A.2 Gate-based implementation of QAOA
A.3 Details of the classical approximate algorithms
A.3.1 On approximation ratio
A.4 Initialization strategy of QAOA

Appendix B: Supplemental Material for Chapter 3
B.1 Derivation of Eq. (3.8) of Chapter 3
B.2 Details of autocorrelators
B.3 Observable trace properties
B.3.1 One-body observable
B.3.2 Two-body observable: XXZ model
B.4 Method in ensemble average calculation
B.5 Frame potential with restricted Haar ensemble
B.5.1 Frame potential applied to QNN
B.5.2 Details of formula
B.6 Results with Haar random ensemble
B.6.1 Average QNTK under Haar random ensemble
B.6.2 Average relative dQNTK under Haar random ensemble
B.6.3 Average dynamical index with Haar random ensemble
B.6.4 Fluctuations of error and QNTK under Haar random ensemble
B.6.4.1 Relative fluctuation of total error under Haar random ensemble
B.6.4.2 Relative fluctuation of QNTK under Haar random ensemble
B.7 Results with restricted Haar ensemble
B.7.1 Average QNTK under restricted Haar ensemble
B.7.2 Average relative dQNTK under restricted Haar ensemble
B.7.3 Average dynamical index under restricted Haar ensemble

Appendix C: Supplemental Material for Section 4.1
C.1 Preliminary of state discrimination
C.1.1 General Helstrom limit
C.1.2 MLE decision for general state discrimination
C.2 Evaluation of ⟨PH(ψ0, ψ1)⟩ over H(D0)
C.2.1 n ≫ 1 limit of finite D0
C.2.2 D0 ≫ 1 limit
C.3 Local random gates construction

Appendix D: Supplemental Material for Section 4.2
D.1 Review of classical DDPM
D.2 Reproducing kernel Hilbert spaces and maximum mean discrepancy
D.3 Computation of Wasserstein distance
D.4 SWAP test
D.5 Forward and backward circuits
D.6 Additional details on distance metrics evaluation
D.7 Details of simulation
D.8 Benchmarks: QuDT and QuGAN

Appendix E: Supplemental Material for Chapter 5
E.1 Supplemental Material for Section 5.1
E.1.1 Representation of states: single-mode case
E.1.2 Methods for gradient evaluation
E.1.2.1 Preliminary
E.1.2.2 Ensemble average of four-fold product of weights
E.1.3 Variance of gradient in state preparation of single-mode CV state
E.1.3.1 Single-mode Gaussian states
E.1.3.2 Fock number states
E.1.4 Variance of gradient in preparation of multimode CV states
E.1.4.1 Product states
E.1.4.2 Multimode Gaussian states
E.2 Supplemental Material for Section 5.2
E.2.1 Entanglement measures
E.2.2 Solving the direct swap
E.2.3 Qubit distillation protocols

List of Tables

4.1 The number of parameters and CNOT/CZ gates of the brickwall, sVQC, and real sVQC ansatz to leading order O(nD) with depth D in a system of n qubits.

4.2 List of hyperparameters of QuDDPM and its performance in different generative learning tasks.
To test the performance after training, we randomly sample Nte random noise states and apply the optimized backward PQC to generate the sampled data. The data set sizes are Ntr = Nte = N; n is the number of data qubits and nA the number of ancilla qubits; L is the PQC depth; T is the number of diffusion steps. For cluster state generation, we evaluate the average fidelity with the center state of each cluster, i.e., |0⟩ for the single-qubit case and |0, 0⟩ for the two-qubit case in Sec. 4.2.

4.3 List of hyperparameters of QuDDPM, QuDT, and QuGAN for generating the clustered states in Fig. 4.14 and Fig. 8 of Sec. 4.2. Data set sizes Ntr = Nte = N; n is the number of data qubits and nA the number of ancilla qubits. For performance, the mean fidelity with the center state of the cluster |0, 0⟩ is F0 = E_{|ψ⟩∈S̃} |⟨0, 0|ψ⟩|^2, and for the true data it is F0,data = 0.977 ± 0.014 in Sec. 4.2.

D.1 Summary of prior-work results on quantum generative learning, including the quantum generative adversarial network (QuGAN) and the quantum circuit Born machine (QuCBM).

E.1 The number of θℓ in Ts,a and Tr,a, with the corresponding form.

E.2 A satisfiability test of Eq. (E.57) for all possible combinations of sℓ, s′ℓ, rℓ, r′ℓ with 1 ≤ ℓ ≤ L − 1, up to a global reversal of signs.

List of Figures

2.1 SAT-UNSAT phase transition and trainability of 3-SAT (left column) and 2-SAT (right column). (a)(b) Probability of SAT for different system sizes n. (c)(d) The mean of 1/SD[∂1C(γ⃗, β⃗)] in p-layer QAOA with n = 16 variables; the inverse is taken for easier comparison with Eq. (2.3). (e)(f) The ratio of the average gradient variance ⟨SD(∂1C(γ⃗, β⃗))⟩ for n = 6 variables over that for n = 16 variables in p-layer QAOA; a larger ratio indicates a barren plateau. (g)(h) The dimension of the dynamical Lie algebra dim(g) for the QAOA generators in an n = 6 qubit system. (i)(j) Average of the 4-point OTOC for p-layer QAOA with n = 10 variables. The green horizontal dashed line marks the Haar-unitary value −2^n/(4^n − 1). Vertical dashed lines indicate the critical SAT-UNSAT transition point. All results are averaged over 100 instances.

2.2 SAT-UNSAT phase transition and trainability of 1-3-SAT+ (left column) and 1-2-SAT+ (right column). (a)(b) Probability of SAT for different system sizes n. (c)(d) The mean of 1/SD[∂1C(γ⃗, β⃗)] in p-layer QAOA with n = 18 variables; the inverse is taken for easier comparison with Eq. (2.3). (e)(f) The ratio of the average gradient variance ⟨SD(∂1C(γ⃗, β⃗))⟩ for n = 6 variables over that for n = 18 variables in p-layer QAOA; a larger ratio indicates a barren plateau. (g)(h) The dimension of the dynamical Lie algebra dim(g) for the QAOA generators in an n = 6 qubit system. The inset log-log plots (g2) and (h2) show dim(g) versus n with a fully symmetric HC; green and orange curves represent the lower-bound estimate dim(g) = n^2 and the upper bound dim_UB. (i)(j) Average of the 4-point OTOC for p-layer QAOA with n = 10 variables. The green horizontal dashed line marks the Haar-unitary value −2^n/(4^n − 1). Vertical dashed lines indicate the critical SAT-UNSAT transition point. All results are averaged over 100 instances.

2.3 Barren plateau. SD of the gradient, SD(∂1C(γ, β)), versus the QAOA depth p for 1-3-SAT+ (top) and 1-2-SAT+ (bottom) problems with different numbers of variables n. From left to right we plot three ratios m/n = 0.2, 0.8, 2. Due to the finite size n ≤ 18, as shown in Fig. 2.2, the transition happens at around m/n = 0.8.

2.4 Accuracy of QAOA. (a)(b) Approximation ratio r of SAT clauses and (c)(d) success probability in determining SAT/UNSAT for 3-SAT (left) and 2-SAT (right) versus clause-to-variable ratio m/n with n = 10 variables. Green dashed lines represent the approximation-algorithm lower bounds r ≥ 0.95 for Max-3-SAT [77] and r ≥ 21/22 for Max-2-SAT [55]. The horizontal light-green dashed lines in (c)(d) represent the success probability of random guessing, 7/8 for 3-SAT and 3/4 for 2-SAT respectively. Vertical black dashed lines in all plots mark the critical point of the SAT-UNSAT transition.

2.5 Accuracy of QAOA. (a)(b) Approximation ratio r of SAT clauses and (c)(d) success probability in determining SAT/UNSAT for 1-3-SAT+ (left, p = 4 to p = 24) and 1-2-SAT+ (right, p = 4 to p = 16) versus clause-to-variable ratio m/n with n = 10 variables. Green dots represent the classical approximate results obtained through a reduction to MWIS. The horizontal light-green dashed lines in (c)(d) represent the success probability of random guessing, 3/8 for 1-3-SAT+ and 1/2 for 1-2-SAT+ respectively. Vertical black dashed lines in all plots mark the critical point of the SAT-UNSAT transition.

2.6 QAA gap size and performance for k-SAT problems. (a)(b) The median of 1/∆E^2 of QAA with n = 10 variables (blue circles); green and purple circles represent SAT and UNSAT instances respectively. (c)(d) The probability that the state produced by the QAA evolution lies in the ground space, P = ∑_{i=1}^{D} |⟨ψi|ϕQAA⟩|^2.

2.7 QAA gap size and performance for 1-k-SAT+. (a)(b) The median of 1/∆E^2 of QAA with n = 10 variables (blue circles); green and purple circles represent SAT and UNSAT instances respectively. (c)(d) The probability that the state produced by the QAA evolution lies in the ground space, P = ∑_{i=1}^{D} |⟨ψi|ϕQAA⟩|^2.

3.1 Illustration of the setup and main results of this work. We study the total-error optimization dynamics of quantum neural networks with loss function L(θ) = (⟨Ô⟩ − O0)^2/2 and identify a dynamical phase transition. We derive a first-principles generalized Lotka-Volterra model to characterize it, and also provide a quantum statistical theory and a random-unitary-ensemble interpretation, presented in Secs. 3.4, 3.5, and 3.6 respectively.

3.2 Dynamics of the QNN in the example of the XXZ model. The top and bottom panels show the dynamics of the total error ϵ(t) and the QNTK K(t) for the three cases O0 ⋛ Omin. Blue curves represent numerical ensemble averages. Red curves in (a1) to (c1) represent theoretical predictions for the total-error dynamics in Eqs. (3.18), (3.19), (3.20); red curves in (a2) to (c2) represent theoretical predictions for the QNTK dynamics in Eqs. (3.37), (3.19), (3.20). Grey dashed lines show the dynamics of individual random samples. The inset in (c1) shows the exponential decay of the residual error ε. Here the random Pauli ansatz (RPA) consists of L = 64 layers on n = 2 qubits, and the XXZ model parameter is J = 2.

3.3 Classical dynamics interpretation of the total-error and QNTK dynamics. (a) Trajectories of (2λϵ, K) in the QNN dynamics for the cases O0 ⋛ Omin, plotted in solid blue, red, and green. Dashed curves show the trajectory from Eq. (3.11). Arrows denote the flow of time in the QNN optimization. A logarithmic scale is used to focus on the late-time comparison. (b) The dynamics of the corresponding λ = µ/K; the inset shows the dynamics of ζ = ϵµ/K^2. The observable is the XXZ model with J = 2, and the QNN is an n = 2 qubit RPA with L = 64 layers. The legend in (a) is shared with (b) and its inset.

3.4 Dynamical phase transition of the QNN in the example of the XXZ model. (a) The phase diagram of the QNN optimization dynamics, characterized by the spectral gap of the Hessian matrix as t → ∞ (black). The critical gapless points correspond to O0 = Omin, Omax (red triangles). The orange line represents the QNTK lim_{t→∞} K. Inset (a1) shows the largest 10 eigenvalues of the Hessian spectrum for the three cases O0 ⋚ Omin marked by triangles in (a). (b) Scaling of the spectral gap GM ∼ |O0 − Omin|^{ν1} with ν1 ≃ 1 from fitting (dashed lines). (c) Decay of the autocorrelators Aϵ(τ) for different O0 ⋚ Omin; inset (c1) shows the scaling of the correlation length ξ ∼ |O0 − Omin|^{−ν2} with ν2 ≃ 1 from fitting (dashed lines). Here the RPA consists of L = 64 layers on n = 2 qubits, and the XXZ model parameter is J = 2.

3.5 Dynamics of the total error ϵ(t) on the IBM quantum device Kolkata. Solid and dashed curves represent results from the real experiment and from noiseless simulation respectively. An n = 2 qubit, 4-layer hardware-efficient ansatz is optimized with respect to the XXZ model observable with J = 4.

3.6 Ensemble-average results under the restricted Haar ensemble (top) and the Haar ensemble (bottom). In the top panel, we plot (a) K with L = 64 fixed, (b) ζ with O0 = 1, and (c) λ with O0 = 1 at late times in state preparation. Blue dots in the top panel represent numerical results from late-time optimization of an n = 2 qubit RPA. Red solid lines represent exact ensemble averages with the restricted Haar ensemble in Eqs. (B.230), (B.288), (B.254) of Sec. B.7. Magenta dashed lines represent asymptotic ensemble averages with the restricted Haar ensemble in Eqs. (3.34), (3.35), (3.36). The observable in all cases is |Φ⟩⟨Φ|, with |Φ⟩ a fixed Haar random state. In the inset of (b), we fix L = 64. In the bottom panel, we plot (d) the fluctuation SD[K0]/K0, (e) ζ0, and (f) λ0 under random initialization. Blue dots represent numerical results from random initializations of an n = 6 qubit RPA. Red solid lines represent exact ensemble averages with the Haar random ensemble in Eqs. (B.216), (B.155), (B.95) of Sec. B.6. Magenta dashed lines represent asymptotic ensemble averages in Eqs. (3.44), (3.38), (3.39). The observable in all cases is the XXZ model with J = 2, with O0 = Omin = −22 for (d)-(f). The orange solid line in (d) represents results from [25].

4.1 Schematic illustration of the MLE-VQC system. Since any single-qubit rotation can be absorbed into the VQC, we can fix each single-qubit measurement to be Pauli Z without loss of generality. An MLE strategy is applied to the measurement results to make the decision.

4.2 (a) The open-boundary local random unitary circuit, "brickwall". Each pair of connected boxes represents a local Haar-random 2-qubit unitary gate. Here we show an example of a depth D0 = 6 circuit on an n = 6 qubit input. (b) Phase diagram for the TFIM. The black curve represents the analytical expression for the energy gap, ∆E = 2(|g| − 1), and the dashed line indicates the critical point. (c) Ensemble-averaged bipartite von Neumann entanglement entropy (see Eq. (4.4)) of states prepared by the circuit in (a). The black curve is the Page curve of Haar random states defined in Eq. (4.5). (d) Bipartite entanglement entropy for n = 6 spin TFIM ground states with g = 1, 10. The inset shows the maximal bipartite entanglement entropy for different n.

4.3 Error probability of binary state discrimination (between two quantum states) with trained MLE-VQC systems. (a) Discrimination between Haar random states or complex TI states S(2^n) in a system of n = 6 qubits. Horizontal dashed lines represent the ensemble-averaged Helstrom limit ⟨PH⟩, and the light-colored area represents the ensemble fluctuation. (b) Discrimination between ground states of the TFIM with g = 1 and g = 10 in systems of n = 6, 8, 10 qubits, with the y-axis on a logarithmic scale. Horizontal dashed lines show the Helstrom limit in each case. We use the open-boundary brickwall VQC ansatz in all cases.

4.4 MLE versus the single-qubit measurement approach in a brickwall-ansatz VQC for discriminating between Haar random states and TI states sampled from S(2^n), in a system of n = 6 qubits. We show the ensemble-averaged cost function ⟨Cdis⟩ in (a) and the parameter-averaged gradient variance ⟨Var(gi)⟩_i for an ansatz of depth D = 2 in (b). In (c) we show the half-bipartite von Neumann entanglement entropy ⟨S(n/2)⟩ of VQC output states with MLE (circles) and single-qubit measurement (diamonds); the input states are Haar random states, and the black dashed line indicates the Page entanglement (see Eq. (4.5)). In (a), the line going below the plot region decreases to zero at machine precision; we choose the plot range to make the trends clear.

4.5 (a) Discrimination error probability using the brickwall ansatz between a pair of random states sampled from H(D0) for n = 6 qubits. Black and grey dashed horizontal lines show the Helstrom limit ⟨PH⟩ and the critical value ⟨PE⟩ = 2⟨PH⟩. The relatively large error bars for states from H(D0) generated by a shallow-depth VQC are due to the lack of self-averaging. (b) Average maximal bipartite entanglement entropy ⟨S(n/2)⟩ of VQC output states before measurement for H(D0) discrimination. (c) The critical depth Dc required to achieve ⟨PE⟩ = 2⟨PH⟩ versus input complexity D0. (d) Maximal bipartite entanglement entropy ⟨S(n/2)⟩ of the input states to be distinguished. In (c) and (d), odd and even D0 are plotted separately.

4.6 Performance and gradient of the open-boundary brickwall VQC for two state ensembles, Haar random states and TI states S(2^n). (a) Ensemble-averaged cost functions for discrimination and generation; blue and red curves show the ensembles of Haar random states and TI states respectively in a system of n = 6 qubits. (b) Parameter-averaged gradient variance ⟨Var(gi)⟩_i for the D = 2 ansatz. In (a), the lines going below the plot region decrease to zero at machine precision; we choose the plot range to make the trends clear.

4.7 Topological architectures of VQCs. Two connected boxes represent a universal 2-qubit gate. For the (a) brickwall, (e) prism, and (f) polygon ansatzes, we show depths D = 2, 3, 3 respectively.

4.8 Cost functions for discrimination (left) and generation (right) among Haar random states (top) or complex TI states S(2^n) (bottom) in a system of n = 6 qubits, comparing different VQC architectures. For reference, the Helstrom limit is ⟨PH⟩ ∼ 10^{−2.4} for (a) and ⟨PH⟩ ∼ 10^{−2} for (c), as shown in Fig. 4.3. The relative differences in the cost functions between the top and bottom panels are below 3%. The lines going below the plot region decrease to zero at machine precision; we choose the plot range to keep the trends clear. Note that each of the non-extensive architectures (TTN, MERA, QCNN) has a single fixed depth and is therefore represented by a single dot.

4.9 Ensemble-averaged operator size growth with VQC depth for a Z operator initially localized at the n/2-th qubit in an n = 6 system.
70 4.10 Cost function ⟨Cdis⟩ (a) and average variance of gradient ⟨Var (gi)⟩ i (b) of different brickwall ansatz in discriminating between random states sampled from S(2n ) with number of qubits n = 6. We take TI and periodic-boundary brickwall ansatz. In (b) all ansatzs are set to be D = 2. In (a), the lines going below the plot region decrease to zero at the machine precision; we choose the plot range to make the trends clear. . . . . . . . . 71 4.11 Architectures and performance for nisq VQCs. (a) Layout of an n = 6-qubit sVQC ansatz and real sVQC ansatz, plotted in (a1) and (a2) separately. The sVQC ansatz consists of CZ gates and generic single qubit rotations, and the real sVQC ansatz consists of CZ gates and RY rotations. The circuits surrounded by the red dashed box represent D⋆ = 2 layers and at the end of the circuit each qubit is applied with a rotation. (b1)-(b2) Cost function ⟨Cdis⟩ and average variance of gradient ⟨Var (gi)⟩ i for discriminating between Haar random states. (c1)-(c2) Cost function and average variance of gradient for discriminating between TFIM ground states with g = 1, 10. We benchmark in a system of n = 6 qubits. In (b2) and (c2) we take D⋆ = 6 for all ansatz. Note for the brickwall ansatz, it corresponds to D = 2. In (b1) and (c1), the lines going below the plot region decrease to zero at the machine precision; we choose the plot range to keep the trends clear. . . . . . . . . . . . . 72 4.12 Schematic of QuDDPM. The forward noisy process is implemented by quantum scrambling circuit (QSC) in (a), while in the backward denoising process is achieved via measurement enabled by ancilla and PQC in (d). Subplots (b1-b5) and (c1-c4) presents the Bloch sphere dynamics in generation of states clustering around |0⟩. . . . . . . . . . . . . . . . . . . . . 75 4.13 The training of QuDDPM at each step t = k. 
Pairwise distance between states in generated ensemble ψ˜ (k) i ∈ S˜ k and true diffusion ensemble ψ (k) j ∈ Sk is measured and utilized in the evaluation of the loss function L. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.14 The decay of MMD distance D between generated ensemble S˜ t using different models and target ensemble of states E0 clustered around |0, 0⟩ versus training steps. The converged value is D ≃ 0.002 for QuDDPM, showing a two-order-of-magnitude advantage over QuDT and QuGAN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 xiv 4.15 The generalization error of QuDDPM in generating cluster states versus (a) diffusion steps T and (b) training dataset size N. Dots are numerical results and orange dashed line is linear fitting results with both exponents equal to 1 within the numerical precision. . . . 81 4.16 Generation of states with probabilistic correlated noise on a specific state in (a) and (b) states with ferromagnetic phase. In (a), average fidelity F10 between states at step t and |10⟩ for diffusion (red), training (blue) and testing (green) are plotted. In (b), we show the distribution of magnetization for generated data from training (blue) and testing (green) data set, and compared to true data (red) and full noise (orange). Four qubits are considered in (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.17 Bloch visualization of the forward (a1-a3) and backward (b1-b3,c1-c3) process. (d)(e) deviation of generated states from unit circle in X − Z plane. The deviation ⟨Y ⟩ 2 for forward diffusion (red), backward training (blue), backward test (green) are plotted. The shaded area show the sample standard deviation. . . . . . . . . . . . . . . . . . . . . . . . 83 5.1 Summary of VQC trainability in DV and CV systems. The Hilbert space is finite-dimension in DV systems while for CV ones it is infinite-dimension. 
The universal DV VQC is built from local 2-design unitary gates (lime green), and universal CV VQCs consist of single qubit rotations (cyan) and echoed conditional displacement (ECD) gates (pink) [241]. In DV VQCs, the variance of the gradient decays exponentially with the number of qubits N in shallow [18] and deep [16] DV VQCs optimizing a global cost function. In this paper, we show the variance decays exponentially in number of modes M but polynomially with circuit energy E in shallow and deep region of CV VQCs. . . . . . . . . . . . . . . . . . . 91 5.2 Variance of gradient Var[∂θk C] at k = L/2 in preparation of (a) displaced squeezed vacuum (DSV) state with γ = 2, ζ = sinh−1 (2) and (b) Fock state with Et = 8. Orange and red dots with errorbars show numerical results of variance in shallow and deep circuits. Orange solid curve represents the analytical variance in Theorem 8; the dashed and solid magenta curves show the lower and upper bounds in Ineqs. (5.6), (5.7). Black dotted and dashed lines indicate the scaling of 1/E and 1/E2 . Insets in (a)(b) we plot the logarithm in base ten of the upper bound in Ineq. (5.7) versus circuit depth and energy. Green triangle (main) and line (inset) show the corresponding boundary of variance at Ec(L) and ℓc(E). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.3 Variance of gradient Var[∂θk C] at k = L/2 in preparation of random CV states |ψ⟩m = P n bn |n⟩m with L = 4 (a) and L = 50 (b) circuits. Curves with same color show the variance of different sample target states. Black dotted and dashed lines in (a) and (b) represent the scaling of 1/E and 1/E2 . In our calculation, we have chosen cutoff nc ∼ 2Et and ϵ = 0.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.4 Correlators C Gauss 1 and C Gauss 2 (z) with z = {1/2, · · · , 1/2} in Eqs. (5.17), (5.18) versus (a) ensemble energy E and (b) modes M. 
The target state |ψ⟩m is generated by global random passive Gaussian unitary following a single-mode squeezer with strength r = 8. . 101 xv 5.5 Variance of gradient Var[∂θk C] at k = ML/2 in preparation of a TMSV state |ζ⟩TMSV with ζ = sinh−1 (2). Orange and red dots with errorbars show numerical results of variance in shallow and deep circuits. Orange solid curve represents the (3/4)2LC TMSV 1 /6 for reference; the dashed and solid magenta curves show the lower and upper bounds in Ineqs. (5.6), (5.7). Black dotted and dashed lines indicate the scaling of 1/E2 and 1/E4 . Inset shows the logarithm in base ten upper bound versus circuit depth and energy. Green triangle (main) and line (inset) show the corresponding boundary of variance at Ec(L) and ℓc(E). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.6 Variance of gradient Var[∂θk C] at k = ML/2 in preparation of random two-mode CV states |ψ⟩m = P n1,n2 bn1,n2 |n1⟩m1 |n2⟩m2 with L = 4 (a) and L = 50 (b) circuits. Curves with same color show the variance of different sample target states. Black dotted and dashed lines in (a) and (b) represent the scaling of 1/E2 and 1/E4 . In the calculation, we have chosen ϵ = 0.1 and nc ∼ 2Et . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.7 Training for Fock state |ψ⟩m = |20⟩m with a L = 50 CV VQC initilized with different ensemble energy E. We show (a) average infidelity of output state, (b) average output state energy and (c) average circuit energy PL j=1 |βj | 2 versus training steps. . . . . . . . . 106 5.8 (a) Interconnect system between two microwave quantum computers. Two cavities enable the generation of microwave-optical entanglement. The optical modes are detected for entanglement-swap, generating microwave-microwave entanglement after displacement Dˆ. Finally, the transmon qubits interact with the microwave modes to generate Bell pairs. 
(b) Schematic of hybrid local variational quantum circuits (hybrid LVQCs) to distill entanglement from noisy entangled microwave modes m1, m2 to transmon qubits q1, q2. The hybrid LVQC is shown in detail in the dashed box with D echoed conditional displacement (ECD) blocks (ECD gates and single qubit rotations) followed by displacement and rotation in the end. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.9 Rate (in ebit per round) comparison (a) ζm = ζo = 1. (b) ζm = 0.95, ζo = 0.9 and nin = 0.2. The rate of CV entanglement swap is within green shaded and the rate of time-bin entanglement swap is within purple region. . . . . . . . . . . . . . . . . . . . . . 117 5.10 Infidelity of transmon qubits versus success probability on (a) one-copy and (b) two-copy entangling microwave modes with C = 0.1, nin = 0.2, ζm = 0.992, ζo = 0.99. The PPT bound (black) in (a) goes below 10−4 and saturates; red and pink dashed curves in (a)(b) correspond to PPT bound of one-copy and two-copy two-qubit state by direct swap. There are D = 10 ECD blocks in hybrid LVQCs and L = 6 layers in DV LVQCs. . . . . . . . . . 120 5.11 Entanglement rate per copy on (a) one-copy and (b) two-copy noisy entangled microwave modes versus infidelity 1 − F, at identical parameters to Fig. 5.10. Dot-dashed and dashed curves correspondingly represent RCI lower bounds and EoF upper bounds. Shaded areas and line segments show the range of rate. The shallow blue curves and areas in (b) are same as (a) for comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 xvi 5.12 Infidelity of transmon qubits (orange) and distillation success probability (blue) versus depolarization noise on one-copy entangling microwave modes at identical parameters as in Fig. 5.10. Orange and blue horizontal lines indicate the infidelity and success probability without noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
123 B.1 Decay of autocorrelators and corresponding correlation length with O0 away from critical points as O0 < Omin (top) and O0 > Omin (bottom). The first two columns plot the autocorrelators. In (c), (f), the overlapping red and green dots represent correlation length fitted from Eq. (B.18) and dashed lines with same color show the fitting results. Black dashed lines represent its scaling as 1/|O0 − Omin|. The observable is the Hamiltonian of XXZ model with J = 2, and circuit ansatz is n = 2 qubit RPA with L = 64 layers. . . . . . 160 B.2 Decay of autocorrelators for ϵ(t), K(t), µ(t) at critical O0 = −6 (solid curves). Dashed lines with corresponding color show the fitting results with ∆[ϵ] = 0.494, ∆[K] = 0.499 and ∆[µ] = 0.506. Black dashed line represent the scaling 1/τ . . . . . . . . . . . . . . . . 161 B.3 (a) Frame potential F (k) for the restricted Haar ensemble with dimension 2 n = 4. Red dashed line is the exact theory prediction in Eq. (B.59) and magenta dashed line is its lower bound (k + 1)!. (b) The evolution of 2-order frame potential F (2) for ensemble of RPA circuit unitary with different target O0. The observable is OXXZ with J = 2, with Omin = −6. Black dashed line is exact value of F (k) in Eq. B.51. . . . . . . . . . . . . . . . 167 C.1 Ensemble-averaged Helstrom limit ⟨PH⟩ for n = 6 qubits states sampled from H(D0) and S(D0). The blue dashed line represents ⟨PH⟩ ∼ 1/2 n+2 and the red dashed line is the average over red dots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 C.2 A decomposition of a general two-qubit gate. The R gate represents an arbitrary single qubit rotation with 3 independent parameters as R(θ1, θ2, θ3) = RZ(θ1)RY (θ2)RZ(θ3). . 215 D.1 Circuit implementation of swap test. We show an example of swap test between two 3-qubit state |ψ⟩, |ψ˜⟩. A Z-basis measurement is performed at the end. . . . . . . . . . . . 223 D.2 Quantum circuit architectures. 
In (a) we show the forward diffusion circuit for one diffusion step on a system of n = 4 qubits. In (b) we show one-layer architecture of a L-layer backward denoising PQC on a system of n = 4 data and nA = 2 ancilla qubits. RX, RY and RZ are Pauli X, Y and Z rotations. RZZ is the two-qubit ZZ rotation. . . . . . . 226 D.3 MMD and Wasserstein loss between ensemble S0 and ensemble through diffusion process at step t, St in generation of cluster state (left) and circular state ensemble (right). For cluster state ensemble, MMD (a) and Wasserstein loss (c) behaves similarly. For circular state ensemble, MMD loss (c) vanishes while Wasserstein loss characterize the diffusion of distribution. The data set size for cluster states is |S| = 100, and |S| = 500 for circular states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 xvii D.4 Schematic of QuDT (a) and QuGAN (b) in state ensemble generation. Since the forward of QuDT only has a single step from data to noise, it is not necessary to implement the diffusion for QuDT. In the backward process, there is also a single PQC U˜DT with depth LT. In the training, the samples generated via applying U˜DT on random states are directly compared to the target ensemble S0, shown in (a). The QuGAN is similar to QuDT, except that a discriminator circuit is added to evaluate the cost function. . . . . . . . . . . . . . . 229 E.1 Numerical result of correlator C Fock 3 (z, z˜) for Fock state in Eq. (E.85). Here we choose z = ˜z = 1/3. The dashed-dot line is 1/E3 for reference. . . . . . . . . . . . . . . . . . . . 260 E.2 Scheme of M-mode L-layer CV VQC. Cyan boxes with θℓ , ϕℓ ranging from 1 ≤ ℓ ≤ ML represents the qubit rotation UR(θℓ , ϕℓ); Pink boxes with β (j) ℓ denotes the ECD gate U (j) ECD(β (j) ℓ ) applying on the qubit and jth mode. . . . . . . . . . . . . . . . . . . . . . . . . 262 E.3 The probability of PM j=1 νj ≤ 2M in Eq. 
(E.113) versus circuit depth L and number of modes M.
E.4 (a) Schematic of the direct swap approach. (b) Entanglement of formation Ef between qubits versus evolution time t. The vertical dashed line indicates the time of maximum entanglement at t = π/2.
E.5 Schematic of (a) the DEJMPS protocol and (b) an L-layer DV LVQC.

Abstract

Quantum neural networks (QNNs) are a promising paradigm for quantum computing and quantum information processing on quantum devices, from the near-term towards the fault-tolerant era. It is fundamental and important to understand the physics underlying their optimization process and performance, and to design new architectures that fully release their potential. In this thesis, we investigate the phase transitions in QNNs associated with their training difficulty and dynamics, and their applications in both supervised and unsupervised learning. We identify a computational phase transition of the quantum approximate optimization algorithm (QAOA) in solving combinatorial problems such as 3-SAT, which is connected to the controllability and complexity of QAOA circuits. We also show that the late-time dynamics of universal deep QNNs can be described by generalized Lotka-Volterra equations, where target values induce convergence dynamics with different scalings. In applications to supervised learning, we find an exponential decay of the quantum state discrimination error with extensive QNN circuit depth, in contrast to non-extensive architectures with limited depth, such as quantum convolutional neural networks, which turn out to be suboptimal. For unsupervised generative learning, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data via interpolation between data and noise.
We provide bounds on the learning error and demonstrate QuDDPM's capability in learning correlated quantum noise models, quantum many-body phases and the topological structure of quantum data. In the end, we extend our scope to the trainability analysis of universal QNNs on continuous-variable systems, and apply it to the transduction of entanglement from continuous- to discrete-variable systems.

Chapter 1
Introduction

Quantum theory is one of the most important and beautiful physical theories established in the last century, and it significantly influenced the way people understand the world, especially in the microscopic realm. The first quantum revolution, completed in the last century, brought solid-state theory to applied physics and engineering, and led to the development of transistors and lasers. The second quantum revolution targets controlling quantum systems at the individual level, originating from Richard Feynman's proposal of simulating physical systems with a quantum computer [1], BB84 quantum key distribution [2], and Peter Shor's algorithm to break existing cryptography protocols such as RSA [3]. Through decades of development, quantum theories have been applied to three main areas: quantum communication, quantum computing and quantum sensing (metrology). Quantum communication utilizes quantum information theory to transmit either classical or quantum information without suffering from leakage to eavesdroppers [2, 4]. The main goal of quantum computing is to build a fault-tolerant quantum computer, which can be applied to solve computational problems with potential quantum advantage and to simulate complex quantum systems in many-body physics [5, 6]. Quantum sensing enables accurate estimation of unknown parameters within quantum systems via quantum resources including entanglement [7, 8]. All of these developments contribute to the distribution, transmission, processing and distilling of information in quantum systems.
Among these areas of study, a key question to ask is: what optimal quantum state or quantum operation should we utilize to achieve the best performance and thus a possible quantum advantage? In general, there does not exist a unified answer; however, inspired by classical machine learning (CML), a versatile and powerful ansatz known as the quantum neural network (QNN) has been proposed and widely utilized in all the studies above. Below, we introduce the QNN and the motivations for the main results presented in this thesis.

1.1 Quantum neural network

For problems with potential quantum advantage, it is challenging to design a quantum algorithm that either implements the corresponding quantum operator or achieves the target quantum state as the output state of a quantum device, except for known ones such as the quantum Fourier transform [9], Shor's algorithm [3], Grover's algorithm [10] and the Harrow-Hassidim-Lloyd algorithm [11]. Meanwhile, a QNN is implemented on parametric quantum circuits, where one has exact control over every operation on an arbitrary part of the whole quantum system and can achieve various quantum operators under the same circuit architecture; it has therefore become a promising paradigm for quantum computing [12, 13]. A QNN, also called a variational quantum algorithm (VQA), is built from variational quantum gates [14, 15, 13]. The scheme of a QNN starts from either one or multiple input quantum states, and a parametric quantum circuit consisting of multiple quantum gates with tunable parameters is applied to the input following a designed architecture. At the end of the QNN circuit, one performs quantum measurements on the outputs to estimate the expectation values of observable operators, which, under potential classical postprocessing, result in an estimate of the target cost function. The gradient of the cost function with respect to each variational parameter can, in principle, be estimated in the same way.
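As a concrete illustration of the measurement-based cost and gradient estimation described above, the sketch below (ours, not from the thesis) evaluates the cost C(θ) = ⟨ψ(θ)|Z|ψ(θ)⟩ of a toy one-parameter circuit |ψ(θ)⟩ = RY(θ)|0⟩ and obtains its gradient from two shifted cost evaluations via the parameter-shift rule:

```python
import numpy as np

# Pauli-Z observable and a single-qubit RY(theta) rotation gate.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def cost(theta):
    """C(theta) = <psi(theta)| Z |psi(theta)> with |psi(theta)> = RY(theta)|0>."""
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

def parameter_shift_grad(theta):
    """dC/dtheta from two shifted cost evaluations (parameter-shift rule)."""
    return 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))

# For this circuit C(theta) = cos(theta), so the exact gradient is -sin(theta).
theta = 0.3
assert np.isclose(parameter_shift_grad(theta), -np.sin(theta))
```

On hardware, each `cost` call would be an empirical average over measurement shots; the classical optimizer only ever sees these estimated expectation values.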
A classical computer then comes in to train the parameters of the quantum gates in the QNN to optimize the cost function via gradient-based methods. Through enough training iterations, one can approximately achieve the targeted quantum operation and obtain the desired computation results from the output state of the QNN.

1.2 Optimization of QNN

However, there are also many challenges in the implementation of QNNs. One of the major challenges is trainability. For a typical deep QNN with randomly initialized variational parameters, trainability suffers from the "barren plateau" phenomenon [16]: the gradient of the cost function with respect to any variational parameter is exponentially concentrated around its mean. In other words, the landscape of the cost function is typically flat, and it is challenging for gradient-based methods to find a desirable path for optimization [17]. Subsequent studies unveiled the relation between the emergence of barren plateaus and the locality of the cost function [18], the depth of the circuit [18], the design of the circuit architecture [19, 20], entanglement [21], noise [22], etc. The barren plateau phenomenon is also understood from the perspective of the controllability of the QNN over the Hilbert space of the system, connecting to quantum control theory [23]. It was then realized that the barren plateau phenomenon only describes the typical landscape of QNN cost functions in the initial stage of training. As optimization proceeds, the unitary operator corresponding to the QNN's circuit deviates from the random unitary ensemble formed initially. Therefore, a theory describing the optimization dynamics until convergence is needed to provide a comprehensive understanding of QNN optimization. We would like to point out two existing methods for understanding the optimization dynamics.
The first theory proposed the quantum neural tangent kernel (QNTK) and relies on a perturbative treatment of the initial stage of QNN training [24, 25]. The other theory describes the convergence of the loss function for the variational quantum eigensolver algorithm via the approach of Riemannian gradient flow on the Hilbert space of the quantum system [26].

1.3 Applications of QNN

Depending on the number of input states and the information they carry, we can regard QNNs as either special VQAs or quantum machine learning (QML) models. For a special VQA, the input state is usually a trivial and easily prepared state carrying no information of interest, and the training aims to find the optimal QNN such that the output state encodes the information of interest. For instance, in the variational quantum eigensolver, one tries to find the ground state of a given quantum system, which plays a key role in condensed matter physics, quantum chemistry, etc. [27]. Another specific example is the quantum approximate optimization algorithm (QAOA), which aims to find approximate solutions to combinatorial problems, including MaxCut and SAT, that are known to be NP-hard [28]. QML models can be thought of as the application of QNNs to generalized areas of study [29]. Following the criteria in CML, the learning tasks of QML can also be classified into supervised learning and unsupervised learning [30]. Generally speaking, supervised learning searches for the optimal QNN such that the prediction on input data is close to its label. A thoroughly studied task in supervised learning is classification: finding the optimal strategy to minimize the error in classifying data from different classes, utilizing knowledge learned from a finite number of samples. There are multiple QNN candidates designed for classification, for instance, the quantum convolutional neural network (QCNN) [31] and the quantum support vector machine [32].
Quantum state classification forms an important part of multiple quantum information processing tasks. Learning different quantum phases in a many-body system is directly based on the capability of QNNs in classifying the corresponding ground states [31]. In quantum illumination, a receiver discriminates received states to determine the existence of an unknown object at low signal-to-noise ratio [33, 34]. For unsupervised learning, a noticeable contrast to supervised learning is the lack of supervision labels; the goal is to learn the distribution lying behind the samples and to generate new data following the same distribution. A number of models have been designed for generative learning in CML on computer vision and natural language processing: generative adversarial networks (GAN) [35], denoising diffusion probabilistic models (DDPM) [36, 37] and large language models such as GPT [38]. Note that the idea of an adversarial game has also been introduced to QML models, leading to the quantum generative adversarial network (QuGAN) [39], which has been applied to the learning of a single pure or mixed quantum state. Due to the hardness of training with adversarial games, there are many open questions on the model design and ultimate performance of quantum generative learning.

1.4 Organization of thesis

Through the journey of my doctoral study, I have made contributions to the understanding of the trainability and optimization dynamics of QNNs, mainly focused on the arising phase transitions [40, 41], and to the supervised and unsupervised applications of QNNs [42, 43]. The thesis presents the main results organized as follows. In Chapter 2, I summarize my work on the computational phase transition of QAOA in solving combinatorial problems. In Chapter 3, I summarize my work on the dynamical phase transition of QNNs with large depth.
In Chapter 4, I summarize my works on the application of QNNs in both supervised and unsupervised learning, focusing on the analysis of discrimination power and proposing a new model of quantum generative learning with advantages over its competitors. In Chapter 5, I extend my research scope to QNNs on continuous-variable systems and summarize my work on the trainability of universal hybrid qubit-qumode QNNs [44] and the application to entanglement transduction across platforms [45]. More details of Chapters 2-5 can be found in Appendices A, B, C, D, E.

Chapter 2
Quantum computational phase transition in combinatorial problems

2.1 Introduction

The Quantum Approximate Optimization Algorithm (QAOA) [28], like all quantum algorithms, aims to utilize quantum hardware to efficiently solve problems that are hard on classical computers. It is one of the candidates to achieve quantum supremacy in the noisy intermediate-scale quantum (NISQ) era [46]. So far, quantum supremacy has only been realized for random circuit sampling tasks [47, 48]. For complex but practical problems in the class of nondeterministic-polynomial (NP) time, no quantum advantage has been found, despite trials of QAOA on the Google Sycamore quantum processor [49]. To search for quantum supremacy, it is crucial to first understand the difference between what is hard or easy for QAOA and for classical algorithms, which can be best explored via the combinatorial problem of Boolean satisfiability (SAT). In a k-SAT instance, one asks whether multiple clauses, each involving k Boolean variables, can be satisfied simultaneously. Depending on the value of k, the worst-case hardness is drastically different: while 3-SAT is NP-complete, 2-SAT can be efficiently solved in polynomial time (class P).
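To make the P side of this contrast concrete, the classical Aspvall-Plass-Tarjan reduction decides a 2-SAT instance in linear time: each clause (a ∨ b) contributes implications ¬a ⇒ b and ¬b ⇒ a, and the instance is satisfiable iff no variable shares a strongly connected component with its negation. A minimal sketch (illustrative, not part of this chapter), using Kosaraju's algorithm:

```python
def solve_2sat(n, clauses):
    """Decide a 2-SAT instance over variables 1..n; a literal is +v or -v.

    Satisfiable iff no variable lies in the same strongly connected
    component of the implication graph as its negation.
    """
    def idx(lit):  # map a literal to a node index in [0, 2n)
        return 2 * (abs(lit) - 1) + (1 if lit > 0 else 0)

    graph = [[] for _ in range(2 * n)]
    rgraph = [[] for _ in range(2 * n)]
    for a, b in clauses:  # clause (a OR b): add (-a -> b) and (-b -> a)
        for u, v in ((idx(-a), idx(b)), (idx(-b), idx(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: iterative DFS, record nodes by finish time.
    order, seen = [], [False] * (2 * n)
    for s in range(2 * n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                w = graph[u][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)

    # Pass 2: label SCCs on the reversed graph in reverse finish order.
    comp = [-1] * (2 * n)
    for label, s in enumerate(reversed(order)):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rgraph[u]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
    return all(comp[2 * v] != comp[2 * v + 1] for v in range(n))
```

No analogous polynomial-time algorithm is known for 3-SAT; the implication-graph trick breaks down because a 3-literal clause does not reduce to simple implications.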
In addition, the classical empirical hardness of random 3-SAT instances is known to exhibit a computational phase transition [50, 51, 52, 53, 54] versus the problem density, characterized by the clause-to-variable ratio. When the density is small (large), almost all instances are satisfiable (unsatisfiable) and easy to solve, while for densities approaching the critical point of the SAT-UNSAT phase transition, the 3-SAT problem instances become the hardest to solve. For quantum algorithms such as QAOA, the above phenomenon is largely unexplored. To begin with, as QAOA always implements SAT problems in their NP-hard optimization versions (Max-SAT) [55], it is unclear whether the decision version's NP (k = 3) versus P (k = 2) contrast has any influence on QAOA's performance. It is also unclear how the classical empirical hardness of 3-SAT connects to QAOA's performance on its optimization versions. Indeed, Ref. [56] does not find a big difference between QAOA's performance on Max-2-SAT and Max-3-SAT, and only finds QAOA's performance to worsen as the density increases, a phenomenon they call reachability deficits. In this chapter, we reveal a computational phase transition in the trainability of QAOA in solving the positive 1-in-k SAT problem. In terms of trainability characterized by the gradient, the typical amplitude of the gradient in training on SAT problems reaches its minimum at a critical problem density. In general, this quantum critical problem density deviates from the SAT-UNSAT phase transition [50, 51, 52, 53, 54], where the instances are hardest classically. We link this gradient transition to the controllability of the quantum systems evolving under the QAOA circuit [57, 58, 59, 23] and to the complexity of the QAOA circuit [60, 61, 62, 63]. In terms of accuracy on the optimization versions of SAT, we find QAOA's approximation ratio to be robust, decaying slowly with the problem density.
Moreover, despite the performance decay due to reachability deficits, it is precisely in the large problem density region where a quantum advantage can be identified when comparing with classical approximate algorithms. In addition, the accuracy in solving Max-2-SAT is higher than that in solving Max-3-SAT, consistent with the P versus NP contrast in the decision versions of the problems. Interestingly, for the decision version of the SAT problems, QAOA shows the worst performance at the SAT-UNSAT transition, revealing a remnant of the classical empirical hardness. Such a remnant is also confirmed in quantum adiabatic algorithms (QAA) [64, 65, 66].

2.2 Results

2.2.1 Preliminary

2.2.1.1 Quantum Approximate Optimization Algorithm

To solve an optimization problem, QAOA encodes the cost function into the energy of a problem Hamiltonian $H_C$, defined over spin-1/2 particles (qubits), and then seeks an approximation of the ground state that encodes the solution to the optimization problem. An n-qubit QAOA circuit alternately implements, in each layer, dynamics governed by the problem Hamiltonian $H_C$ and a mixing Hamiltonian $H_B = \sum_{i=1}^{n} \hat\sigma^x_i$, where $\hat\sigma^x_i$ is the Pauli-X operator representing the transverse field. The output state of a p-layer QAOA is therefore $|\psi(\vec\gamma, \vec\beta)\rangle = \prod_{\ell=1}^{p} e^{-i\beta_\ell H_B} e^{-i\gamma_\ell H_C} |\psi(0,0)\rangle$, where $\vec\gamma = (\gamma_1, \ldots, \gamma_p)$ and $\vec\beta = (\beta_1, \ldots, \beta_p)$ are variational parameters. The initial state is set to be a superposition of all possible spin configurations, $|\psi(0,0)\rangle = |+\rangle^{\otimes n}$ with $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. To solve the problem, variational training is performed over the parameters $\vec\gamma, \vec\beta$ to minimize the cost function $C(\vec\gamma, \vec\beta) = \langle\psi(\vec\gamma, \vec\beta)| H_C |\psi(\vec\gamma, \vec\beta)\rangle$. The variational training terminates when the cost function stops decreasing significantly, and ideally leads to the optimal parameters $(\vec\gamma^*, \vec\beta^*) = \mathrm{argmin}_{\vec\gamma, \vec\beta}\, C(\vec\gamma, \vec\beta)$.
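The state preparation and cost evaluation described above can be sketched numerically. The following is a minimal illustration, not the thesis's actual code: it assumes the problem Hamiltonian is given as the diagonal of $H_C$ in the computational basis, and the function names (`qaoa_state`, `cost`) are our own.

```python
import numpy as np

def qaoa_state(diag_HC, gammas, betas):
    """p-layer QAOA state |psi(gamma, beta)> for a diagonal problem Hamiltonian."""
    n = int(np.log2(len(diag_HC)))
    psi = np.full(2**n, 1 / np.sqrt(2**n), dtype=complex)  # |+>^{\otimes n}
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * diag_HC) * psi              # e^{-i gamma H_C}
        U1 = np.cos(b) * np.eye(2) - 1j * np.sin(b) * sx   # e^{-i beta sigma^x}
        UB = U1
        for _ in range(n - 1):
            UB = np.kron(UB, U1)                           # e^{-i beta H_B} = U1^{\otimes n}
        psi = UB @ psi
    return psi

def cost(diag_HC, psi):
    """C(gamma, beta) = <psi| H_C |psi> for a diagonal H_C."""
    return float(np.real(np.vdot(psi, diag_HC * psi)))

# Example: H_C = sigma^z_1 sigma^z_2 on 2 qubits, diagonal (+1, -1, -1, +1).
diag = np.array([1.0, -1.0, -1.0, 1.0])
psi = qaoa_state(diag, gammas=[0.3], betas=[0.4])
print(round(cost(diag, psi), 6))
```

At zero angles the state remains $|+\rangle^{\otimes n}$, on which the cost vanishes for this $H_C$.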
2.2.1.2 SAT problems

We focus on two types of SAT problems: the k-SAT problem (k ≥ 2) and the positive 1-in-k SAT problem (1-k-SAT+, k ≥ 2), also known as the exact-cover-k problem. Given n Boolean variables $V = \{v_i\}_{i=1}^{n}$, a random instance of a SAT problem can be constructed by choosing m clauses $C = \{c_a\}_{a=1}^{m}$, each containing k different variables $\{v_{a_j}\}_{j=1}^{k}$ chosen uniformly at random from V. The k elements in each clause can be either positive or negative literals with equal probability in the k-SAT problem, while only positive literals appear in 1-k-SAT+ problems. The conjunctive normal form (CNF) of the SAT instance can be expressed as $F(V) = \bigwedge_{a=1}^{m} c_a(\{v_{a_j}\}_{j=1}^{k})$, where $\wedge$ denotes AND and forces the CNF to be true only when all clauses are satisfied. In a k-SAT problem, each clause is true when at least one element in the clause is true; in positive 1-in-k SAT, a clause $c_a(\{v_{a_j}\}_{j=1}^{k})$ is satisfied if and only if exactly one variable among $\{v_{a_j}\}_{j=1}^{k}$ is true. The (decision version of the) SAT problem asks whether F(V) can be satisfied by some assignment of the variables V, while the optimization version, Max-SAT, aims to find an assignment of V that minimizes the number of clause violations. As the clause-to-variable ratio m/n increases, it becomes harder to satisfy a random SAT instance, and there exists a phase transition of the SAT probability across a critical ratio: m/n = 1 for 2-SAT [67], m/n = 4.26 for 3-SAT [53], m/n ∼ 0.55 for 1-2-SAT+, and m/n ∼ 0.62 for 1-3-SAT+ [54], as shown in Fig. 2.1(a)(b) and Fig. 2.2(a)(b). We study the cases k = 2 and k = 3 for comparison: while 2-SAT and 1-2-SAT+ are in class P and efficiently solvable, 3-SAT and 1-3-SAT+ are NP-complete, and it takes an exponential amount of time to solve them, e.g., by the well-known algorithm X [68]. Despite the contrast in the decision versions, Max-k-SAT and Max-1-k-SAT+ are always NP-hard, even for k = 2 [69].
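The instance construction and the 1-in-k satisfaction rule just described can be made concrete with a short sketch. This is an illustrative brute-force checker (our own naming, feasible only for small n), not part of the thesis code:

```python
import itertools
import random

random.seed(7)

def random_1_in_k_sat_plus(n, m, k):
    """m clauses, each with k distinct variables drawn uniformly from n variables."""
    return [random.sample(range(n), k) for _ in range(m)]

def clause_satisfied(clause, assignment):
    """1-in-k SAT+ rule: exactly one variable in the clause is True."""
    return sum(assignment[v] for v in clause) == 1

def is_sat(instance, n):
    """Brute-force decision over all 2^n assignments."""
    return any(all(clause_satisfied(c, a) for c in instance)
               for a in itertools.product([0, 1], repeat=n))

inst = random_1_in_k_sat_plus(n=8, m=4, k=3)
print(is_sat(inst, 8))
```

For example, the three 1-2-SAT+ clauses {(0,1), (0,2), (1,2)} over three variables are jointly unsatisfiable, since any assignment violates at least one "exactly one true" constraint.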
In addition to the worst-case hardness, empirical studies with classical algorithms on different variants of 3-SAT [50, 51, 52, 53, 54] show that when m/n is small (large), almost all instances are satisfiable (unsatisfiable) and easy to solve; while for m/n approaching the critical point of the SAT-UNSAT transition, the SAT instances become the hardest to solve. To solve SAT problems with QAOA, we map each Boolean variable $v_i$ to the spin states of a qubit, with the spin-down state $|1\rangle$ (Pauli-Z operator $\hat\sigma^z = -1$) for true and the spin-up state $|0\rangle$ ($\hat\sigma^z = 1$) for false, and obtain the spin Hamiltonian for 2-SAT and 3-SAT as

$H_{C,k} = \frac{1}{2^k} \sum_{a=1}^{m} \prod_{\ell=1}^{k} \left(1 + A_{a_\ell, a} \hat\sigma^z_{a_\ell}\right)$,   (2.1)

Figure 2.1: SAT-UNSAT phase transition and trainability of 3-SAT (left column) and 2-SAT. (a)(b) Probability of SAT for different system sizes n. (c)(d) The mean of $1/\mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta))$ for p-layer QAOA with n = 16 variables. The inverse is plotted for easier comparison according to Eq. (2.3). (e)(f) The ratio of the average gradient standard deviation $\langle \mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta)) \rangle$ for n = 6 variables over n = 16 variables for p-layer QAOA. A larger ratio indicates a barren plateau. (g)(h) The dimension of the dynamical Lie algebra $\dim(\mathfrak{g})$ for the QAOA generators in an n = 6 qubit system. (i)(j) Average of the 4-point OTOC for p-layer QAOA with n = 10 variables. The green horizontal dashed line represents the Haar-unitary value $-2^n/(4^n - 1)$. Vertical dashed lines indicate the critical SAT-UNSAT transition point. All results are evaluated over 100 instances.

Figure 2.2: SAT-UNSAT phase transition and trainability of 1-3-SAT+ (left column) and 1-2-SAT+. (a)(b) Probability of SAT for different system sizes n.
(c)(d) The mean of $1/\mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta))$ for p-layer QAOA with n = 18 variables. The inverse is plotted for easier comparison according to Eq. (2.3). (e)(f) The ratio of the average gradient standard deviation $\langle \mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta)) \rangle$ for n = 6 variables over n = 18 variables for p-layer QAOA. A larger ratio indicates a barren plateau. (g)(h) The dimension of the dynamical Lie algebra $\dim(\mathfrak{g})$ for the QAOA generators in an n = 6 qubit system. The inset log-log plots (g2) and (h2) show $\dim(\mathfrak{g})$ versus n with a fully symmetric $H_C$. Green and orange curves represent the lower bound estimate $\dim(\mathfrak{g}) = n^2$ and the upper bound $\dim_{\mathrm{UB}}$. (i)(j) Average of the 4-point OTOC for p-layer QAOA with n = 10 variables. The green horizontal dashed line represents the Haar-unitary value $-2^n/(4^n - 1)$. Vertical dashed lines indicate the critical SAT-UNSAT transition point. All results are evaluated over 100 instances.

Here $A_{a_\ell, a}$ stands for the literal sign of the $\ell$th element in the $a$th clause, with $+1$ ($-1$) for a positive (negative) literal and 0 when the variable is absent from the clause. Similarly, for 1-3-SAT+ and 1-2-SAT+ we have [64, 65, 66, 70] (see Methods)

$H_{C,3+} = \frac{1}{4} \sum_{a=1}^{m} \left(\hat\sigma^z_{a_1} + \hat\sigma^z_{a_2} + \hat\sigma^z_{a_3} - 1\right)^2$,   (2.2a)

$H_{C,2+} = \frac{1}{4} \sum_{a=1}^{m} \left(\hat\sigma^z_{a_1} + \hat\sigma^z_{a_2}\right)^2$.   (2.2b)

The gate-based implementation of the QAOA for our problem Hamiltonians can be found in Sec. A.2. With the above encoding, an instance is satisfiable only if the ground-state energy is zero. As QAOA minimizes the cost function, it can be considered an approximate algorithm for solving Max-1-k-SAT+. By default, via a threshold decision, the solution of the optimization also implies a solution to the decision version. Our overall goal in this chapter is to understand what is hard and what is easy for QAOA, both in terms of trainability and accuracy.
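The claim that the ground-state energy vanishes exactly when the instance is satisfiable can be checked directly from Eq. (2.2a). The sketch below (our own naming, small n only) builds the diagonal of $H_{C,3+}$ under the stated $|1\rangle \leftrightarrow$ true, $\hat\sigma^z = -1$ encoding:

```python
import numpy as np

def diag_HC_3plus(instance, n):
    """Diagonal of H_{C,3+} from Eq. (2.2a); 'instance' is a list of 3-variable clauses."""
    diag = np.zeros(2**n)
    for idx in range(2**n):
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]  # bit 1 <-> variable true
        z = [1 - 2 * b for b in bits]                        # sigma^z eigenvalues
        for (a1, a2, a3) in instance:
            diag[idx] += 0.25 * (z[a1] + z[a2] + z[a3] - 1) ** 2
    return diag

# One clause on 3 qubits: energy 0 iff exactly one variable is true.
d = diag_HC_3plus([(0, 1, 2)], 3)
print(d.min())
```

Here the three assignments with exactly one true variable have energy 0, the all-false assignment pays energy 1, and the all-true assignment pays energy 4, so the clause term indeed counts violations quadratically.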
2.2.2 Gradient of QAOA training

As a variational circuit, QAOA's cost-function gradients over the variables $\vec\gamma, \vec\beta$ indicate the shape of the cost-function landscape: larger gradient amplitudes indicate sharper changes and therefore an easier training problem, while small gradient amplitudes lead to barren plateaus [16, 18, 23, 71, 22] that make training difficult. As gradients average to zero on random states [16, 18], we evaluate the standard deviation (SD) of the gradients to characterize their typical amplitudes. To represent the typical case of training, we evaluate the gradient on random choices of the circuit parameters via a numerical finite difference. Without loss of generality, we consider the gradient over the first variable $\gamma_1$ and denote it as $\partial_1 C$ [16, 18]. To enable easier visualization in Fig. 2.1(c)(d) and Fig. 2.2(c)(d), we plot the inverse of the gradient SD, $1/\mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta))$, so that large values indicate hardness of convergence. We consider different numbers of layers p in QAOA to obtain a comprehensive picture. For all the problems under study, the inverse gradient SD has a clear peak at a critical clause-to-variable ratio m/n, as shown in Fig. 2.1(c)(d) and Fig. 2.2(c)(d). However, this peak is in general different from the classical SAT-UNSAT transition indicated by the dashed line. For the special case of 1-3-SAT+, Fig. 2.2(c) shows that the peak of the inverse gradient coincides with the SAT-UNSAT transition. A large inverse gradient SD indicates a small gradient in the typical case, and therefore a more severe barren plateau that makes training hard at the phase transition. When p is small, the peak disappears; however, at the same time, QAOA fails to provide an accurate solution, making trainability irrelevant. We notice that the cases of k = 3 have the inverse gradient peaked at a much smaller clause-to-variable density, a result of the more complex clauses.
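The finite-difference estimate of the typical gradient amplitude can be sketched on a toy problem. This is an illustrative stand-in (a single-layer, two-qubit QAOA with $H_C = \hat\sigma^z_1 \hat\sigma^z_2$, our own construction), not the thesis's gradient pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

diag = np.array([1.0, -1.0, -1.0, 1.0])        # diagonal of sigma^z_1 sigma^z_2
sx = np.array([[0, 1], [1, 0]], dtype=complex)

def C(gamma, beta):
    """Cost <psi(gamma, beta)| H_C |psi(gamma, beta)> for one QAOA layer."""
    psi = np.full(4, 0.5, dtype=complex)                   # |+>^2
    psi = np.exp(-1j * gamma * diag) * psi                 # e^{-i gamma H_C}
    U1 = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * sx
    psi = np.kron(U1, U1) @ psi                            # e^{-i beta H_B}
    return float(np.real(np.vdot(psi, diag * psi)))

# SD of d C / d gamma_1 over random parameter draws, via central differences.
eps, grads = 1e-5, []
for _ in range(200):
    g = rng.uniform(0, 2 * np.pi)
    b = rng.uniform(0, np.pi)
    grads.append((C(g + eps, b) - C(g - eps, b)) / (2 * eps))
print(round(float(np.std(grads)), 3))
```

The sample mean of the gradient is close to zero, as expected from averaging over full periods of the angles, so the SD is the meaningful summary of the typical amplitude.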
Overall, the results reveal a transition of the trainability measured by the gradient that is different from the classical SAT-UNSAT transition, showing that the empirical hardness for quantum algorithms can differ from that for classical algorithms.

2.2.2.1 Connection to controllability

To understand the different behaviors of the gradient, we utilize the connection between the gradient and controllability, measured by the dimension of the dynamical Lie algebra (DLA) of the QAOA generators, as recently identified in Ref. [23]. As explained in Ref. [57], the DLA can be used to test the controllability of a quantum system governed by unitary dynamics. Consider an n-qubit system described by a Hilbert space $\mathcal{H}$, and an optimal quantum control model described by a unitary $U = \prod_{k=1}^{K} e^{-i u_k H_k}$, where $G \equiv \{H_1, \cdots, H_K\}$ is a set of generators and $\{u_1, \cdots, u_K\} \subseteq \mathbb{R}$ is a set of coefficients usually represented by the control fields. The DLA $\mathfrak{g} \equiv \langle iH_1, \cdots, iH_K \rangle_{\mathrm{Lie}} \subseteq \mathfrak{su}(2^n)$ is then constructed from the repeated and nested commutators of the elements of G. The corresponding dynamical Lie group is obtained by exponentiating the DLA, $e^{\mathfrak{g}} \equiv \{e^{V_1} e^{V_2} \cdots e^{V_L},\; V_1, \cdots, V_L \in \mathfrak{g}\}$. Generally, for a finite-time evolution governed by the Schrödinger equation, the system is fully controllable when the set of unitaries reachable during this evolution covers all unitaries. This is precisely formulated by the so-called Lie algebra rank condition, which states that the system is fully controllable if and only if $\dim(\mathfrak{g}) = 4^n - 1$.* For quantum systems where the whole Hilbert space is not fully controllable, when the DLA can be written as a direct sum, $\mathfrak{g} = \bigoplus_j \mathfrak{g}_j$, so that the Hilbert space decomposes into a direct sum of subspaces $\mathcal{H} = \bigoplus_j \mathcal{H}_j$, then $\dim(\mathfrak{g}_j)$ determines the subspace controllability of $\mathcal{H}_j$ [23, 58, 59].
With the problem Hamiltonian $H_C$ and the mixing Hamiltonian $H_B$ as generators, we can generate the DLA $\mathfrak{g}$ and estimate the standard deviation of the gradient from the DLA dimension $\dim(\mathfrak{g})$ as

$1/\mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta)) \in \Omega\!\left(\left[\mathrm{poly}(\dim(\mathfrak{g}))\right]^{1/2}\right)$,   (2.3)

where "poly" denotes a polynomial function. Here, for two functions f(x) and g(x), $f(x) \in \Omega(g(x))$ means that f(x) is bounded below by g(x) asymptotically. Therefore, we evaluate the DLA dimension numerically to compare with our gradient results. As the numerical evaluation is costly, we are limited to the smaller size n = 6. Despite the small size, as we see in Fig. 2.1(g)(h) and Fig. 2.2(g)(h), the DLA dimension $\dim(\mathfrak{g})$ has essentially the same behavior versus the clause-to-variable ratio m/n as the inverse gradient; this manifests a clear connection between the gradient transition and the DLA dimension transition. QAOA provides clear physical insight into the concept of trainability of variational quantum algorithms on NISQ devices. From Fig. 2.1(c)-(h) and Fig. 2.2(c)-(h), trainability and controllability exhibit a trade-off.

*Here we suppose, without loss of generality, that G does not include the identity, because the identity only contributes a negligible global phase in the QAOA scenario.

This can be explained as follows. When the system is more (less) controllable, there are more (fewer) control protocols available to transform the initial state to the desired final state. Geometrically, these protocols can be described as accessible paths, parameterized by $(\vec\gamma, \vec\beta)$, from the initial state to the desired state. In this picture, our task is to find the optimal path among all possible paths. Therefore, from the trainability perspective, it becomes harder (easier) to train $(\vec\gamma, \vec\beta)$ when the system is more (less) controllable. Despite being harder to train, the plurality of paths also provides more hope of good performance.
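The numerical evaluation of $\dim(\mathfrak{g})$ amounts to closing the span of the generators under commutators. A hedged sketch of one way to do this (our own implementation, tractable only for a few qubits; the thesis's actual numerics may differ) follows, illustrated with $H_B = \sum_i \hat\sigma^x_i$ and a fully connected Ising $H_C = \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j$:

```python
import numpy as np
from itertools import combinations

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])

def op(single, site, n):
    """Single-site operator embedded in an n-qubit register."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, single if q == site else I2)
    return out

def dla_dim(gens, tol=1e-8):
    """Dimension of the Lie algebra generated by {iG} under nested commutators."""
    basis = []  # orthonormal vectorized basis of the current span
    def add(M):
        v = M.flatten().astype(complex)
        for b in basis:
            v = v - np.vdot(b, v) * b        # Gram-Schmidt against current span
        nrm = np.linalg.norm(v)
        if nrm > tol:
            basis.append(v / nrm)
            return True
        return False
    new = []
    for G in gens:
        M = 1j * np.asarray(G, dtype=complex)
        if add(M):
            new.append(M / np.linalg.norm(M))
    while new:                                # close the span under commutators
        M = new.pop()
        for b in list(basis):
            B = b.reshape(M.shape)
            Cm = M @ B - B @ M
            if add(Cm):
                new.append(Cm / np.linalg.norm(Cm))
    return len(basis)

n = 3
HB = sum(op(X, i, n) for i in range(n))
HC = sum(op(Z, i, n) @ op(Z, j, n) for i, j in combinations(range(n), 2))
d3 = dla_dim([HB, HC])
print(d3)
```

As a sanity check, the single-qubit generators X and Z close into the full $\mathfrak{su}(2)$, of dimension 3; the n-qubit result is bounded by $4^n - 1$.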
In addition, one can also connect the DLA to controllability via the quantum Fisher information matrix (QFIM) [71]. As the rank of the QFIM characterizes the number of independent ways the control parameters can vary the generated quantum state, it is intuitive that the DLA dimension upper-bounds the rank of the QFIM, which connects the training difficulty and the controllability of a quantum model. Additional insight can be obtained by evaluating the DLA dimension in the m ≫ n limit for the 1-k-SAT+ problems, where the Hamiltonian is symmetric between all qubits (see the insets of Fig. 2.2(g) and (h)). In this case, we are able to prove an upper bound (see Methods), $\dim(\mathfrak{g}) \le \dim_{\mathrm{UB}} \equiv \frac{1}{6} n (n^2 + 6n + 11)$. We also expect $\dim(\mathfrak{g})$ to be above $n^2$, which is the dimension for the much simpler nearest-neighbor Ising model [23]. While the upper bound is in general loose, it indicates that the gradient in the m ≫ n limit is only polynomially small; in contrast, for hard instances we would expect an exponentially small gradient. This contrast supports the decay of the dimension and the increase of the gradient when m/n is large.

2.2.2.2 Barren plateau and complexity

To further understand the barren plateau phenomena, we study the speed of decay of the typical gradient with the number of qubits. To begin with, we pick two values of the system size, n = 6 and n = 16 for k-SAT (n = 18 for 1-k-SAT+ instead), and evaluate the ratio of the average gradient standard deviation $\langle \mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta)) \rangle$

Figure 2.3: Barren plateau. SD of the gradient $\mathrm{SD}(\partial_1 C(\vec\gamma, \vec\beta))$ versus the QAOA depth p for 1-3-SAT+ (top) and 1-2-SAT+ (bottom) problems with different numbers of variables n. From left to right we plot three different ratios m/n = 0.2, 0.8, 2. Due to the finite size n ≤ 18, as shown in Fig.
2.2, the transition happens at around m/n = 0.8.

for different numbers of layers p. In Fig. 2.1(e)(f) and Fig. 2.2(e)(f), we see that right after the peak of the inverse gradient, when the circuit depth p is sufficiently large, the decay ratio saturates to a value independent of the clause-to-variable ratio, indicating a barren plateau. Below the peak of the inverse gradient, the decay ratio of the gradient increases gradually with the clause-to-variable ratio. To confirm an exponential decay of the gradient, in Fig. 2.3 we focus on 1-k-SAT+ and plot the SD of the gradient versus the depth p for different numbers of qubits n, while keeping the clause-to-variable ratio m/n constant in each panel. For 1-3-SAT+, when m/n is small, in panel (a), the SD of the gradient saturates and does not decrease versus p or n, showing no barren plateau. At the critical value in panel (b), we see an exponential decrease of the SD versus the number of qubits n at large p, confirming a barren plateau. Above threshold, as shown in panel (c), a barren plateau can still be confirmed, however with larger gradients than in the critical case of panel (b). On the contrary, for 1-2-SAT+, as we see in panels (d) and (e), around the SAT-UNSAT transition we do not see the appearance of a barren plateau. At large m/n, in panel (f), the gradient finally starts to show an exponential decay, indicating a barren plateau. The appearance of barren plateaus is often connected to the complexity of the typical quantum circuit involved [18]. The complexity of an ensemble of unitaries can in general be characterized by its closeness to a unitary t-design, which reproduces the Haar-random expectation values of 2t-point correlators. In this regard, when the quantum circuit forms a 2-design [60, 61, 72], it has been shown that the variance of the gradient vanishes exponentially with the system size, which leads to a barren plateau of the cost function [16, 18].
Therefore, we consider the unitary ensemble $\mathcal{U}_p$ formed by the p-layer QAOA,

$U_{\mathrm{QAOA}} = \prod_{\ell=1}^{p} e^{-i\beta_\ell H_B} e^{-i\gamma_\ell H_C}$,   (2.4)

with each angle $\gamma_\ell \in [0, 2\pi)$ and each angle $\beta_\ell \in [0, \pi)$ independent and uniformly random. To measure the closeness to a 2-design, we evaluate the ensemble-averaged infinite-temperature 4-point out-of-time-order correlator (OTOC)

$C_{\mathrm{OTO}}(W_1, W_2; \mathcal{E}) = \frac{1}{d} \left\langle \mathrm{Tr}\left\{ W_1 U^\dagger W_2 U W_1 U^\dagger W_2 U \right\} \right\rangle_{\mathcal{E}}$,   (2.5)

where the dimension is $d = 2^n$ for an n-qubit system and the average is over the unitary $U \in \mathcal{E}$ [61]. For an ensemble $\mathcal{E}$ forming a 2-design, $C_{\mathrm{OTO}}(W_1, W_2; \mathcal{E}) = -d/(d^2 - 1)$, saturating the Haar result [61, 72]; while for trivial ensembles, $C_{\mathrm{OTO}}(W_1, W_2; \mathcal{E})$ is of order one. Therefore, the decay of the OTOC indicates the ensemble approaching a 2-design. Without loss of generality, we consider the OTOC between single-qubit operators, $C_{\mathrm{OTO}}(\hat\sigma^y_1, \hat\sigma^y_{n/2}; \mathcal{U}_p)$. In Fig. 2.1(i)(j) and Fig. 2.2(i)(j), we find that the OTOC of the QAOA ensemble decays towards the Haar value when the clause-to-variable ratio m/n increases to the critical value of the minimum gradient, indicating a transition to a 2-design. We also see a difference between the cases of k = 3 and k = 2: the decay of the OTOC for k = 2 is much slower than for k = 3.

2.2.3 Accuracy of QAOA

In this section, we explore the accuracy of QAOA in solving k-SAT and 1-k-SAT+. To speed up the training, we develop a heuristic pre-optimization initialization strategy (see Sec. A.4). To obtain the best accuracy, we perform 10 repetitions of QAOA for each instance and take the optimal solution among the results.
To benchmark the accuracy of QAOA against classical algorithms, in the case of Max-k-SAT we consider the lower bound of state-of-the-art approximation algorithms; in the case of the decision versions of k-SAT, we consider the success probability of a random guess; in the case of 1-k-SAT+, as fewer results are known about approximation ratios, we reduce the problem to the maximum weighted independent set (MWIS) problem [73, 74] and utilize the greedy approximate MWIS algorithms proposed in [75, 76] (see Methods). The standard accuracy characterization of approximate algorithms for optimization problems is the approximation ratio [55, 75, 76]. For our Max-SAT problems, we define the approximation ratio r ≤ 1 of a solution as the ratio between the number of clauses satisfied by the solution and the maximum number of clauses satisfiable by any solution. As the output state $|\psi(\vec\gamma, \vec\beta)\rangle$ of QAOA can be in a superposition of multiple solutions, we evaluate the expected approximation ratio by projecting the output state onto the computational basis. For Max-k-SAT, a random guess will satisfy on average $m(1 - 1/2^k)$ clauses; for Max-1-k-SAT+, a random assignment will satisfy on average $m k / 2^k$ clauses. For instances with most clauses satisfiable, the above corresponds to approximation ratios of $r_{\mathrm{rand}} \sim 1 - 1/2^k$ and $r_{\mathrm{rand}} \sim k/2^k$, respectively. An exact optimal solution saturates r = 1, and non-trivial approximate algorithms should have $r \in [r_{\mathrm{rand}}, 1]$. In Fig. 2.4(a)(b) and Fig. 2.5(a)(b), we see that as p increases, QAOA is able to obtain larger approximation ratios. As the clause-to-variable ratio m/n increases, the approximation ratio decays as expected. However, the decay is rather slow and manifests a robustness of QAOA. In the Max-3-SAT case, we see that at small p the approximation ratio is already better than the lower bound of r ∼ 0.95 in Ref.
[77]; similarly, the lower bound of r ≥ 21/22 for Max-2-SAT [55] is also overcome at small depth p.

Figure 2.4: Accuracy of QAOA. (a)(b) Approximation ratio r of SAT clauses, (c)(d) success probability in determining SAT/UNSAT, for 3-SAT (left) and 2-SAT (right) versus the clause-to-variable ratio m/n with n = 10 variables. Green dashed lines represent the approximation-algorithm lower bounds r ≥ 0.95 for Max-3-SAT [77] and r ≥ 21/22 for Max-2-SAT [55]. The horizontal light green dashed lines in (c)(d) represent the success probabilities of a random guess, 7/8 and 3/4 for 3-SAT and 2-SAT respectively. Vertical black dashed lines in all plots represent the critical point of the SAT-UNSAT transition.

In the case of Max-1-k-SAT+, we consider the approximation ratio of the classical MWIS approximate algorithm for comparison (see Methods). For Max-1-3-SAT+, we identify a clear quantum advantage at around p ∼ 16. For Max-1-2-SAT+, advantages appear even for a shallow depth of p = 8. We emphasize that the quantum advantage happens only when the clause-to-variable ratio is large, despite the reachability deficits [56]. Indeed, we expect quantum algorithms to be advantageous especially for hard problems, where both classical and quantum algorithms face challenges. Although all the Max-SAT problems considered are NP-hard, we do see some interesting contrasts in performance. In the Max-2-SAT and Max-3-SAT cases, the approximation-ratio performance is similar when p is large, consistent with previous results in Ref. [56]; however, in terms of the absolute number of additional violated clauses, Max-2-SAT performs slightly better than Max-3-SAT.
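The random-assignment baselines quoted above can be verified by direct enumeration; the short check below (our own helper, not thesis code) counts, over all $2^k$ assignments of one clause's variables, how many satisfy the k-SAT rule (at least one true, taking all-positive literals without loss of generality) versus the 1-in-k rule (exactly one true):

```python
import itertools

def random_guess_rate(k, one_in_k):
    """Probability a uniform assignment satisfies a single k-variable clause."""
    count = 0
    for bits in itertools.product([0, 1], repeat=k):
        s = sum(bits)
        count += (s == 1) if one_in_k else (s >= 1)
    return count / 2**k

print(random_guess_rate(3, False), random_guess_rate(3, True))  # 0.875 0.375
```

This reproduces the $1 - 1/2^k = 7/8$ baseline for 3-SAT and the $k/2^k = 3/8$ baseline for 1-3-SAT+.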
In the Max-1-k-SAT+ cases, the accuracy of QAOA is substantially higher for k = 2 than for k = 3 with the same number of layers p.

Figure 2.5: Accuracy of QAOA. (a)(b) Approximation ratio r of SAT clauses, (c)(d) success probability in determining SAT/UNSAT, for 1-3-SAT+ (left, from p = 4 to p = 24) and 1-2-SAT+ (right, from p = 4 to p = 16) versus the clause-to-variable ratio m/n with n = 10 variables. Green dots represent the classical approximate results through a reduction to MWIS. The horizontal light green dashed lines in (c)(d) represent the success probabilities of a random guess, 3/8 and 1/2 for 1-3-SAT+ and 1-2-SAT+ respectively. Vertical black dashed lines in all plots represent the critical point of the SAT-UNSAT transition.

We speculate that this contrast in performance may be caused by the different connectivity and complexity of the problem Hamiltonians. To connect to the empirical hardness transition in classical algorithms, we can also reinterpret each optimization result as a decision of SAT/UNSAT. This can be done via a threshold decision on the minimized number of UNSAT clauses, e.g., determining an instance as SAT when the expected number of UNSAT clauses is smaller than $E_{\mathrm{th}} = 0.5$ and UNSAT otherwise. To characterize the overall performance, we evaluate the success probability of deciding SAT/UNSAT when solving random instances at a fixed clause-to-variable ratio m/n. The results are shown in Fig. 2.4(c)(d) and Fig. 2.5(c)(d). The success probability increases with the QAOA depth p, as we expect. For the k = 3 cases in Fig. 2.4(c) and Fig.
2.5(c), there is a valley of low success probability at around the critical point of m/n

Figure 2.6: QAA gap size and performance for k-SAT problems. (a)(b) The median of $1/\Delta E^2$ for QAA with n = 10 variables, shown by blue circles. The green and purple circles represent the SAT and UNSAT instances, respectively. (c)(d) The probability that the state at the end of the QAA evolution lies in the ground space, $P = \sum_{i=1}^{D} |\langle \psi_i | \phi_{\mathrm{QAA}} \rangle|^2$.

shown in Fig. 2.2(a), recovering the same hardness transition identified in empirical studies of classical algorithms [50, 51, 52, 53, 54]. For the k = 2 cases in Fig. 2.4(d) and Fig. 2.5(d), despite a similar valley of low success probability at small p, the success probability is almost unity for a circuit depth of p = 16. Similarly, the classical benchmark can be reinterpreted, and a similar transition versus m/n can be seen. Such a valley at the classical SAT-UNSAT transition indicates a remnant of the classical empirical hardness and is different from the trainability transition identified in Fig. 2.1 and Fig. 2.2. Combining the above, we see that overall QAOA possesses a notion of what is hard and easy similar to that of classical algorithms, while showing an advantage over the classical algorithms considered in the large-problem-density instances.

Figure 2.7: QAA gap size and performance of 1-k-SAT+. (a)(b) The median of $1/\Delta E^2$ for QAA with n = 10 variables, shown by blue circles. The green and purple circles represent the SAT and UNSAT instances, respectively. (c)(d) The probability that the state at the end of the QAA evolution lies in the ground space, $P = \sum_{i=1}^{D} |\langle \psi_i | \phi_{\mathrm{QAA}} \rangle|^2$.
2.2.4 Comparison with the quantum adiabatic algorithm

The identified trainability transition in general deviates from the classical computational phase transition, while the performance in solving the SAT problems shows a trend consistent with the classical computational phase transition. Such a disparity motivates us to explore the empirical hardness of SAT instances in other Hamiltonian-based quantum algorithms, such as QAA, a popular alternative and also the predecessor of QAOA. To obtain the solution, QAA prepares the ground state of the problem Hamiltonian in Eqs. (2.1) and (2.2) via an adiabatic evolution of the Hamiltonian

$H(s) = s H_C + (1 - s) H_B', \quad s \in [0, 1]$,   (2.6)

from an ancillary Hamiltonian $H_B'$ at s = 0 to the problem Hamiltonian $H_C$ at s = 1. Here the ancillary Hamiltonian is $H_B' = \sum_{i=1}^{n} |h_i| \hat\sigma^x_i$, where $|h_i|$ is the number of times the variable $v_i$ appears in the clauses (see Methods). The initial Hamiltonian $H(0) = H_B'$ has an easy-to-prepare ground state $|\psi(0)\rangle \propto (|0\rangle - |1\rangle)^{\otimes n}$, a superposition of all possible spin configurations. As one tunes the parameter s slowly towards $H(1) = H_C$, the adiabatic theorem guarantees that the state of the system stays in the ground state; therefore, the final state $|\phi_{\mathrm{QAA}}\rangle$ is the ground state of the problem Hamiltonian, which provides the solution to the optimization problem. From the adiabatic theorem, we can estimate the computation time of QAA needed for the success probability to be close to unity as $T_a \sim 1/(\Delta E)^2$, where $\Delta E$ is the minimum gap of the Hamiltonians $\{H(s), s \in [0, 1]\}$ [64, 66]. In Fig. 2.6(a)(b) and Fig. 2.7(a)(b), we evaluate the inverse gap squared, $1/(\Delta E)^2$, as a measure of the instance hardness at different clause-to-variable ratios using QuTiP [78]. Note that there exist other, more rigorous estimates of the necessary adiabatic evolution time that combine higher-order terms [79, 80]. However, the inverse gap squared is a sufficient approximate estimate for our purpose.
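The minimum gap along the adiabatic path can be sketched for a tiny instance. This is a hedged illustration (our own construction, not the thesis's QuTiP code): a single 1-in-2 SAT+ clause on two qubits, with the gap taken from the ground energy to the lowest distinct level above it, since the final ground space is degenerate here:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
Zs = np.kron(Z, I2) + np.kron(I2, Z)
HC = 0.25 * Zs @ Zs                      # Eq. (2.2b) with m = 1 clause
HBp = np.kron(X, I2) + np.kron(I2, X)    # |h_i| = 1 for both variables

def gap(E, tol=1e-9):
    """Energy gap from E[0] to the lowest distinct level above it."""
    above = E[E - E[0] > tol]
    return float(above[0] - E[0])

# Scan H(s) = s*H_C + (1-s)*H_B' over the adiabatic path, Eq. (2.6).
gaps = [gap(np.linalg.eigvalsh(s * HC + (1 - s) * HBp))
        for s in np.linspace(0.0, 1.0, 201)]
print(round(min(gaps), 4))
```

At s = 0 the gap is 2 (spectrum of $H_B'$ is {-2, 0, 0, 2}) and at s = 1 it is 1 (the diagonal of $H_C$ is (1, 0, 0, 1)); the scan locates where along the path the gap is smallest.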
In subplots (a), we identify a computational phase transition for 3-SAT and 1-3-SAT+, where the gap is minimized at about the critical SAT-UNSAT transition, up to some small deviation due to finite size, similar to the decision version of SAT on QAOA in subplots (c) of Fig. 2.4 and Fig. 2.5. For 2-SAT and 1-2-SAT+, we see the minimum gap occur above the critical point, qualitatively agreeing with the case of QAOA. To further confirm the transition, we evaluate the success probability $P = \sum_{i=1}^{D} |\langle \psi_i | \phi_{\mathrm{QAA}} \rangle|^2$ from the overlap between the final evolved state $|\phi_{\mathrm{QAA}}\rangle$ and the D-degenerate ground states $|\psi_i\rangle$ of the problem Hamiltonian. In Fig. 2.6(c)(d) and Fig. 2.7(c)(d), at a finite time $T_a$, we see the success probability of QAA decrease before the critical point, while roughly maintaining a constant value above the critical point. Such robustness to the problem density coincides with the slow decay of QAOA's approximation ratio with the problem density. In addition, we also find the success probability for 2-SAT and 1-2-SAT+ (subplots (d)) to be much higher than that for 3-SAT and 1-3-SAT+ (subplots (c)).

2.3 Discussion

In this chapter, we thoroughly explored the empirical hardness of Hamiltonian-based quantum algorithms in solving SAT problems. In the case of QAOA, we find a trainability phase transition, where the gradient is minimized at a certain critical problem density. Such a phase transition is connected to the controllability and complexity of QAOA circuits. Although the trainability transition in general deviates from the classical SAT-UNSAT transition, in terms of performance, Hamiltonian-based algorithms do show a remnant of the classical SAT-UNSAT transition. Although our results are empirical, we expect analytical results to be challenging, as the classical correspondence of this transition is also empirical due to the complexity of the SAT problems.
We also identify quantum advantages of QAOA over several classical greedy approximate algorithms for a relatively small-scale quantum system, potentially realizable in the near term. Although we have focused on two cases of SAT for convenience, we expect the computational phase transition in QAOA to apply to all combinatorial optimization problems. In particular, as 3-SAT is NP-complete, the clause-to-variable ratio represents a universal characterization of a 'problem density' [56], and the computational phase transition applies to all NP-complete problems in this regard.

2.4 Methods

2.4.1 Many-body formulation of the problem Hamiltonian

Here we introduce the many-body formulation of the problem spin Hamiltonians in Eqs. (2.1) and (2.2). For convenience, we first introduce an n × m matrix $A_{ij}$, where $A_{ij} = 1$ ($-1$) if the variable $v_i$ is included in the clause $c_j$ as a positive (negative) literal and $A_{ij} = 0$ otherwise. With this matrix in hand, we can express all Hamiltonians in a standard many-body form [74] as

$H_{C,3} = \frac{1}{8} \sum_{i<j<\ell} K_{ij\ell}\, \hat\sigma^z_i \hat\sigma^z_j \hat\sigma^z_\ell + \frac{1}{8} \sum_{i<j} J_{ij}\, \hat\sigma^z_i \hat\sigma^z_j - \frac{1}{8} \sum_{i} h_i \hat\sigma^z_i + \frac{m}{8}$,   (2.7a)

$H_{C,2} = \frac{1}{4} \sum_{i<j} J_{ij}\, \hat\sigma^z_i \hat\sigma^z_j - \frac{1}{4} \sum_{i} h_i \hat\sigma^z_i + \frac{m}{4}$,   (2.7b)

for k-SAT problems, and

$H_{C,3+} = \frac{1}{2} \sum_{i=1}^{n} h_i \hat\sigma^z_i + \frac{1}{2} \sum_{i<j} J_{ij}\, \hat\sigma^z_i \hat\sigma^z_j + m$,   (2.8a)

$H_{C,2+} = \frac{1}{2} \sum_{i<j} J_{ij}\, \hat\sigma^z_i \hat\sigma^z_j + \frac{m}{2}$,   (2.8b)

for 1-k-SAT+ problems. The notation is

$h_i = -\sum_{j=1}^{m} A_{ij}$,   (2.9a)

$J_{ij} = \sum_{a=1}^{m} A_{ia} A_{ja}$,   (2.9b)

$K_{ij\ell} = \sum_{a=1}^{m} A_{ia} A_{ja} A_{\ell a}$.   (2.9c)

Note that the number of clauses containing $v_i$ equals $|h_i|$.

2.4.2 Dynamical Lie algebra: definition and bounds

Below we give bounds for $\dim(\mathfrak{g})$ in the m ≫ n limit for both 1-2-SAT+ and 1-3-SAT+, where all coefficients $J_{ij}$ and $h_i$ approach uniform values (see Sec. A.1). For 1-2-SAT+, we have $H_{C,2+} \propto \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j$ up to a constant. Then, the set of initial generators for the corresponding DLA $\mathfrak{g}_{H_{C,2+}, H_B}$ is $G_{2+} \equiv \{\sum_{i=1}^{n} \hat\sigma^x_i,\ \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j\}$.
For the fully coupled Ising model with transverse fields along the x and y axes, the set of initial generators becomes $G_{x,y} \equiv G_{2+} \cup \{\sum_{i=1}^{n} \hat\sigma^y_i\}$. From Ref. [81], the dimension of the corresponding DLA $\mathfrak{g}_{x,y}$ is

$\dim(\mathfrak{g}_{x,y}) = \binom{n+3}{3} - 1 = \frac{1}{6} n (n^2 + 6n + 11)$,   (2.10)

where $\binom{a}{b} \equiv a!/[(a-b)!\, b!]$ is the binomial coefficient. Since the DLAs are generated by the repeated and nested commutators of the generator sets, we must have $\dim(\mathfrak{g}_{H_{C,2+}, H_B}) \le \dim(\mathfrak{g}_{x,y})$ due to $G_{2+} \subset G_{x,y}$, which leads to

$\dim(\mathfrak{g}_{H_{C,2+}, H_B}) \le \frac{1}{6} n (n^2 + 6n + 11)$.   (2.11)

We also know that the nearest-neighbor Ising model has DLA dimension $n^2$ [23]. Therefore, we expect the scaling to be between $\Omega(n^2)$ and $O(n^3)$. For 1-3-SAT+, we have $H_{C,3+} \propto \frac{2}{n-1} \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j - \sum_{i=1}^{n} \hat\sigma^z_i$. Then, the initial set of generators is $G_{3} = \{\sum_{i=1}^{n} \hat\sigma^x_i,\ \frac{2}{n-1} \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j - \sum_{i=1}^{n} \hat\sigma^z_i\}$. Let $\mathfrak{g}_{H_{C,3+}, H_B}$ be the corresponding DLA. Because we can write

$e^{i H_{C,3+}} = e^{i \frac{2}{n-1} \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j}\, e^{-i \sum_{i=1}^{n} \hat\sigma^z_i}$,   (2.12)

if we start from the initial set of generators $G_3' = \{\sum_{i=1}^{n} \hat\sigma^x_i,\ \sum_{i=1}^{n} \hat\sigma^z_i,\ \sum_{i<j} \hat\sigma^z_i \hat\sigma^z_j\}$, the corresponding DLA strictly contains $\mathfrak{g}_{H_{C,3+}, H_B}$. Now, due to the commutator $[\sum_{i=1}^{n} \hat\sigma^x_i, \sum_{i=1}^{n} \hat\sigma^y_i] \propto \sum_{i=1}^{n} \hat\sigma^z_i$, the corresponding DLA of $G_3'$ is exactly $\mathfrak{g}_{x,y}$. Therefore, we have $\dim(\mathfrak{g}_{H_{C,3+}, H_B}) \le \dim(\mathfrak{g}_{x,y})$, which leads to

$\dim(\mathfrak{g}_{H_{C,3+}, H_B}) \le \frac{1}{6} n (n^2 + 6n + 11)$.   (2.13)

The lower bound estimate $n^2$ for the DLA dimension, from the nearest-neighbor Ising model, still holds.

2.4.3 Classical approximate algorithms

To solve Max-1-k-SAT+, we transform it to the MWIS problem. Given a 1-k-SAT+ instance with n variables and m clauses, one can construct a weighted graph with n vertices $\{q_i\}_{i=1}^{n}$, each corresponding to a variable $v_i$ and carrying the weight $w(q_i) = |h_i| \ge 0$. For every two distinct vertices $q_i, q_j$, an edge $(q_i, q_j)$ exists if $J_{ij} > 0$, i.e., when the corresponding variables $v_i, v_j$ appear together in at least one clause.
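The graph construction just described, plus one simple greedy MWIS heuristic of the kind cited in [75, 76] (the w/(deg+1) selection rule), can be sketched as follows. The function names and the specific greedy rule are our own choices, not necessarily those used in the thesis:

```python
def mwis_graph(clauses, n):
    """Vertex weights w(q_i) = |h_i| and edges (q_i, q_j) where J_ij > 0."""
    weight = [0] * n
    edges = set()
    for c in clauses:
        for v in c:
            weight[v] += 1                       # |h_i| = #clauses containing v_i
        for i in range(len(c)):
            for j in range(i + 1, len(c)):
                edges.add((min(c[i], c[j]), max(c[i], c[j])))
    return weight, edges

def greedy_mwis(weight, edges, n):
    """Greedy MWIS: repeatedly pick the vertex maximizing w/(deg+1)."""
    nbrs = {v: set() for v in range(n)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    alive, chosen = set(range(n)), []
    while alive:
        v = max(alive, key=lambda u: weight[u] / (len(nbrs[u] & alive) + 1))
        chosen.append(v)
        alive -= nbrs[v] | {v}                   # remove v and its neighbors
    return chosen, sum(weight[v] for v in chosen)

# Two 1-in-2 clauses sharing variable 0: setting v_0 true satisfies both.
w, e = mwis_graph([(0, 1), (0, 2)], 3)
print(greedy_mwis(w, e, 3))  # ([0], 2)
```

In this toy instance the greedy rule picks vertex 0 (weight 2, degree 2), and the total weight 2 equals m, certifying that the instance is SAT under the reduction described next.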
One can verify that the SAT/UNSAT version of the 1-$k$-SAT+ problem reduces to asking whether the weight of the maximum independent set equals $m$. The reason is simple: an independent set of this graph corresponds to an assignment in which no clause has more than one true variable. To guarantee a solution to the 1-$k$-SAT+ instance, we still need to make sure that every clause has one true variable. As the total weight of the independent set equals the number of clauses satisfied by the assignment, if the total weight equals $m$, then all clauses are satisfied. By the same token, Max-1-$k$-SAT+ reduces to solving the MWIS problem. As a classical benchmark, we can utilize various greedy algorithms for MWIS [75, 76] and choose the best performance among them (see Sec. A.3).

Chapter 3
Dynamical phase transition in quantum neural networks with large depth

3.1 Introduction

As a paradigm of near-term quantum computing, variational quantum algorithms [14, 28, 15, 16, 82, 13] have been widely applied to chemistry [14, 27], optimization [28, 83], quantum simulation [84, 85], condensed matter physics [31], communication [45, 86], sensing [87, 88] and machine learning [89, 12, 29, 90, 91, 92, 93, 94]. Adopting layers of gates and stochastic gradient descent, they are regarded as 'quantum neural networks' (QNNs), generalizing the classical neural networks that are crucial to machine learning. Concepts and methods related to variational quantum algorithms are also beneficial for quantum error correction and quantum control [95, 96], bridging near-term applications with the fault-tolerant era. Despite the progress in applications, theoretical understanding of the training dynamics of QNNs is limited, hindering the optimal design of quantum architectures and the theoretical study of quantum advantage in such applications. Previous works adopt tools from quantum information scrambling for empirical study of QNN training [97, 98].
Recently, the quantum neural tangent kernel (QNTK) theory has provided a potential framework for an analytical understanding of variational quantum algorithms, at least within certain limits [24, 25, 99, 100, 101], revealing deep connections to their classical machine learning counterparts [102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]. However, QNTK theory relies on the assumption of sufficiently random quantum circuit set-ups, known as unitary k-designs [61, 113, 114, 115], which holds only at random initialization, preventing the theory from describing the more important late-time training dynamics. Similar limitations also exist for other theoretical works [26, 116, 117, 16, 18]. In this chapter, we go beyond QNTK theory and identify a dynamical phase transition in the training of QNNs with large depth, when the target cost function value $O_0$ crosses the minimum achievable value (ground state energy $O_{\min}$).

Figure 3.1: Illustration of setup and main results of this work. We study the total error optimization dynamics of quantum neural networks with loss function $\mathcal{L}(\theta) = (\langle\hat O\rangle - O_0)^2/2$, and identify a dynamical phase transition: $O_0 > O_{\min}$ gives the frozen-kernel phase, $O_0 < O_{\min}$ the frozen-error phase, and $O_0 = O_{\min} \equiv \min_{\rm states}\langle\hat O\rangle$ the critical point. We derive a first-principle generalized Lotka-Volterra model to characterize it, and also provide a quantum statistical theory (Hessian spectrum gap, second-order phase transition) and a random unitary ensemble interpretation (early-time Haar ensemble versus late-time restricted Haar ensemble, probed via the frame potential), presented separately in Secs. 3.4, 3.5 and 3.6.
We show that the phase transition is governed by the generalized Lotka-Volterra (LV) equations describing a competitive duality between the quantum neural tangent kernel and the total error. As depicted in Fig. 3.1, in the frozen-kernel phase, where $O_0 > O_{\min}$ lies in the bulk of the spectrum, the kernel approaches a constant while the error decays exponentially with training steps; at the critical point, when $O_0 = O_{\min}$ exactly, both the kernel and the error decay polynomially; in the frozen-error phase, when $O_0 < O_{\min}$ is unachievable, the error approaches a constant while the kernel decays exponentially. Via mapping the Hessian of the training dynamics to a Hamiltonian in imaginary time, we reveal the nature of the phase transition as second order with correlation exponent $\nu = 1$, where a closing gap and scale invariance with dimension $\Delta = 1/2$ are observed at the critical point. We also provide a non-perturbative analytical theory to explain the phase transition via a restricted Haar ensemble at late time, when the QNN output state approaches the steady state.

Figure 3.2: Dynamics of the QNN in the example of the XXZ model. The top and bottom panels show the dynamics of the total error $\epsilon(t)$ and the QNTK $K(t)$ for the three cases $O_0 \gtreqless O_{\min}$. Blue curves represent the numerical ensemble-average result. Red curves in (a1)-(c1) represent the theoretical predictions for the dynamics of the total error in Eqs. (3.18), (3.19), (3.20); red curves in (a2)-(c2) represent the theoretical predictions for the QNTK dynamics in Eqs. (3.37), (3.19), (3.20). Grey dashed lines show the dynamics for each random sample. The inset in (c1) shows the exponential decay of the residual error $\varepsilon$. Here the random Pauli ansatz (RPA) consists of $L = 64$ layers on $n = 2$ qubits, and the parameter in the XXZ model is $J = 2$.
The theoretical findings are verified on IBM quantum devices.

3.2 Training dynamics of quantum neural networks

A QNN is composed of $L$ layers of parameterized quantum circuits, realizing a unitary transform $\hat U(\theta)$ on $n$ qubits, where $\theta = (\theta_1, \ldots, \theta_L)$ are the variational parameters. Inputting a trivial state $|0\rangle^{\otimes n}$, the final output state of the neural network is $|\psi(\theta)\rangle = \hat U(\theta)|0\rangle^{\otimes n}$, from which one can measure a Hermitian observable $\hat O$, leading to the expectation value $\langle\hat O\rangle = \langle\psi(\theta)|\hat O|\psi(\theta)\rangle$. To optimize the expectation of the observable $\hat O$ towards the target value $O_0$, a general choice of loss function is the quadratic form
$$\mathcal{L}(\theta) = \frac{1}{2}\left(\langle\hat O\rangle - O_0\right)^2 \equiv \frac{1}{2}\epsilon^2, \quad (3.1)$$
where the total error $\epsilon = \langle\hat O\rangle - O_0 \equiv \varepsilon + R$ consists of a constant remaining term $R = \lim_{t\to\infty}\epsilon$ and a vanishing residual error $\varepsilon$. Here $t$ is the discrete number of time steps in the training. Suppose the observable $\hat O$ takes values in the range $[O_{\min}, O_{\max}]$. Due to the symmetry between maximum and minimum in optimization problems, we assume $O_0 < O_{\max}$ without loss of generality. In each optimization step, every variational parameter is updated by gradient descent,
$$\delta\theta_\ell(t) \equiv \theta_\ell(t+1) - \theta_\ell(t) = -\eta\frac{\partial\mathcal{L}}{\partial\theta_\ell} = -\eta\epsilon\frac{\partial\epsilon}{\partial\theta_\ell}, \quad (3.2)$$
where $\eta$ is the fixed learning rate. When $\eta \ll 1$, the total error is updated as
$$\delta\epsilon(t) \simeq \sum_\ell \frac{\partial\epsilon}{\partial\theta_\ell}\delta\theta_\ell + \frac{1}{2}\sum_{\ell_1,\ell_2}\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\delta\theta_{\ell_1}\delta\theta_{\ell_2} \quad (3.3)$$
$$= -\eta\epsilon K + \frac{1}{2}\eta^2\epsilon^2\mu, \quad (3.4)$$
where the QNTK $K$ and dQNTK $\mu$ are defined as [25]
$$K \equiv \sum_\ell\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2, \quad (3.5)$$
$$\mu \equiv \sum_{\ell_1,\ell_2}\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}. \quad (3.6)$$
In the dynamics of $\epsilon$, as $\eta \ll 1$, we focus on the first order of $\eta$ in Eq. (3.4),
$$\delta\epsilon(t) = -\eta\epsilon K + O(\eta^2). \quad (3.7)$$
To characterize the dynamics of $\epsilon$, it is necessary and sufficient to understand the dynamics of the QNTK $K(t)$. Towards this end, we derive a first-order difference equation for the QNTK (see details in Sec. B.1),
$$\delta K \equiv K(t+1) - K(t) = -2\eta\epsilon\mu + O(\eta^2). \quad (3.8)$$
Combining Eq. (3.7) and Eq. (3.8), we aim to develop the dynamical model in training QNNs.
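The first-order relation of Eq. (3.7) can be checked numerically on any differentiable model. Below is a sketch with a two-parameter classical toy error function standing in for $\epsilon(\theta)$ (no quantum circuit is simulated; the function and values are illustrative):

```python
import numpy as np

# Toy check of Eqs. (3.2), (3.5), (3.7): a two-parameter classical error
# function stands in for eps(theta) = <O> - O_0.
O0 = 0.2
eps  = lambda th: np.cos(th[0]) * np.cos(th[1]) - O0
grad = lambda th: np.array([-np.sin(th[0]) * np.cos(th[1]),
                            -np.cos(th[0]) * np.sin(th[1])])

eta = 1e-4                                 # small learning rate
theta = np.array([0.3, 1.1])
K = np.sum(grad(theta) ** 2)               # QNTK, Eq. (3.5)
theta_new = theta - eta * eps(theta) * grad(theta)   # one step of Eq. (3.2)
delta_eps = eps(theta_new) - eps(theta)
# Eq. (3.7): delta_eps = -eta * eps * K, up to O(eta^2) corrections.
assert abs(delta_eps + eta * eps(theta) * K) < 1e-6
```

For small $\eta$ the discrepancy between the measured $\delta\epsilon$ and the prediction $-\eta\epsilon K$ is of order $\eta^2$, consistent with Eq. (3.4).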
3.3 Dynamical phase transition

Our major finding is that when the circuit depth is large, the QNN dynamics exhibits a phase transition at $O_{\min}$ (and similarly at $O_{\max}$), as depicted in Fig. 3.2 (see Methods for details of the circuit and observable).

Frozen-kernel phase: When $O_0 > O_{\min}$, the total error decays exponentially and the energy converges towards $O_0$, as shown in Fig. 3.2(a1). This is triggered by the frozen QNTK shown in Fig. 3.2(a2). Each individual random sample (gray) has a slightly different value of the frozen QNTK due to initialization, while all exhibit the exponential convergence. Our theory prediction (red) agrees with the actual average (blue) for both the ensemble-averaged QNTK $K$ and the error, while deviations due to early-time dynamics can be seen (see Methods for details).

Critical point: When targeting right at the ground-state energy, $O_0 = O_{\min}$, both the total error and the QNTK decay as $1/t$, independent of the system dimension $d$. As shown in Fig. 3.2(b2), the QNTK ensemble average (blue) agrees very well with the theory prediction shown in red. Due to an initial-time discrepancy in the QNTK that is beyond our late-time theory, the actual error dynamics has a constant deviation from the theory prediction (red), but still shows the $1/t$ late-time scaling, as shown in Fig. 3.2(b1).

Frozen-error phase: When targeting below the ground-state energy, $O_0 < O_{\min}$, the total error converges exponentially to a constant $R = O_{\min} - O_0 > 0$, as shown in Fig. 3.2(c1). The inset shows the exponential convergence via the residual error $\varepsilon = \epsilon - R$. In this case, the QNTK also decays exponentially with the training steps, as shown in Fig. 3.2(c2). Deviations between the theory and numerical results can be seen due to early-time dynamics beyond our theory.

3.4 Generalized Lotka-Volterra model

With large depth $L \gg 1$, the QNN at late time can be considered as a typical unitary in an ensemble. As we will detail in Sec.
3.6, from the ensemble modeling one can prove that the relative dQNTK, the ratio of the dQNTK and the QNTK,
$$\lambda = \mu/K, \quad (3.9)$$
converges towards a constant dependent on $L$ and $d$. Under the assumption that $\lambda$ is a constant, and taking the continuous limit, Eqs. (3.7) and (3.8) lead to the coupled set of equations
$$\partial_t\epsilon = -\eta\epsilon K, \qquad \partial_t K = -2\eta\lambda\epsilon K \quad (3.10)$$
to leading order in $\eta$. This is the generalized Lotka-Volterra equation developed for modeling the nonlinear population dynamics of species competing for a common resource [118]. The two 'species' represented by $K$ and $\epsilon$ are in direct competition, as the interaction terms are negative. As Eqs. (3.10) have zero intrinsic birth/death rate, there is no stable attractor where both species $K$ and $\epsilon$ are positive, as sketched in Fig. 3.1, where $2\lambda\epsilon$ and $K$ are the x and y axes. From Eqs. (3.10), we can identify the conserved quantity at late time,
$$C = K - 2\lambda\epsilon = \text{const}. \quad (3.11)$$
Each trajectory of $(2\lambda\epsilon, K)$ governed by Eqs. (3.10) is thus a straight line quantified by the conserved quantity $C$, and the fixed points are $(2\lambda\epsilon, K) = (-C, 0)$ or $(2\lambda\epsilon, K) = (0, C)$ for different values of $C$. We verify the trajectory from the conserved quantity of Eq. (3.11) in Fig. 3.3(a), where good agreement between the QNN dynamics and the generalized LV dynamics can be identified. The conservation law indicates that a Hamiltonian description of the LV dynamics is possible. Indeed, we can introduce the canonical coordinates
$$P = \log K, \qquad Q = \log(2\lambda\epsilon) \quad (3.12)$$
and the associated Hamiltonian
$$H(Q, P) = \eta(e^Q - e^P) \equiv \eta(2\lambda\epsilon - K), \quad (3.13)$$
from which the LV equations (3.10) can be equivalently rewritten as the standard Hamiltonian equations generalizing Ref. [119],
$$\frac{dQ}{dt} = \frac{\partial H}{\partial P} = \{Q, H\}, \qquad \frac{dP}{dt} = -\frac{\partial H}{\partial Q} = \{P, H\}, \quad (3.14)$$
where $\{A, B\} = \frac{\partial A}{\partial Q}\frac{\partial B}{\partial P} - \frac{\partial A}{\partial P}\frac{\partial B}{\partial Q}$ denotes the Poisson bracket. From the position-momentum duality in the Hamiltonian formulation, we identify an error-kernel duality between $e^Q \sim \epsilon$ and its gradient $e^P = |\partial\epsilon/\partial\theta|^2$. We can obtain the analytical solution of Eqs.
(3.10) directly. When $C \neq 0$, we have
$$\lambda\epsilon(t) = \frac{C}{-2 + B_1 e^{\eta Ct}}, \qquad K(t) = \frac{C}{1 - 2B_1^{-1}e^{-\eta Ct}}, \quad (3.15)$$
where $B_1$ is a constant fitting parameter, as at early time we do not expect Eqs. (3.10) to hold. In particular, when $C > 0$, at late time the QNTK $K(t) = C$ is frozen and $\lambda\epsilon(t) \propto e^{-\eta Ct}$ decays exponentially, approaching the fixed point $(2\lambda\epsilon, K) = (0, C)$; when $C < 0$, on the other hand, at late time $2\lambda\epsilon = -C$ is frozen and the QNTK $K(t) \propto e^{\eta Ct}$ decays exponentially, approaching the fixed point $(2\lambda\epsilon, K) = (-C, 0)$. In both cases, the convergence to the fixed point is exponential. When $C = 0$, Eqs. (3.10) lead to a polynomial decay of both quantities,
$$K(t) = 2\lambda\epsilon(t) = \frac{2}{2\eta t + B_2^{-1}}, \quad (3.16)$$
where $B_2$ is again a fitting parameter, as at early time we do not expect Eqs. (3.10) to hold. At late time, both $\epsilon$ and $K$ decay as $1/t$, approaching the fixed point $(0, 0)$. Overall, we see that the two phases (and the critical point) of the QNN dynamics have a one-to-one correspondence to the two families of fixed points (and their common fixed point) of the generalized LV equation. The conserved quantity is $C = K - 2\lambda\epsilon = (K^2 - 2\epsilon\mu)/K$. Since $K > 0$ at any finite time, the sign of the constant is determined by the dynamical index
$$\zeta = \epsilon\mu/K^2. \quad (3.17)$$
If $\zeta \gtreqless 1/2$, we have $C \lesseqgtr 0$, determining the phases. We summarize the above analyses in the following theorem, taking the $t \gg 1$ late-time limit of the solutions.

Theorem 1 Assuming the relative dQNTK $\lambda = \mu/K$ is a constant at late time, the QNN dynamics is governed by the generalized Lotka-Volterra equation (3.10) and possesses two different phases, depending on the value of the conserved quantity $C = K - 2\lambda\epsilon = (1 - 2\zeta)K$, or equivalently the dynamical index $\zeta = \epsilon\mu/K^2$.

Figure 3.3: Classical dynamics interpretation of total error and QNTK dynamics.
(a) Trajectories of $(2\lambda\epsilon, K)$ in the dynamics of the QNN with different $O_0 \gtreqless O_{\min}$, plotted in solid blue, red and green. Dashed curves show the trajectory from Eq. (3.11). The arrows denote the flow of time in the QNN optimization. A logarithmic scale is taken to focus on the late-time comparison. (b) The dynamics of the corresponding $\lambda = \mu/K$. The inset shows the dynamics of $\zeta = \epsilon\mu/K^2$. The observable is the XXZ model with $J = 2$, and the QNN is an $n = 2$ qubit RPA with $L = 64$ layers. The legend in (a) is shared with (b) and its inset.

1. When $\zeta < 1/2$ and thus $C > 0$, we have the 'frozen-kernel phase' (c.f. [25]), where the QNTK $K(t) = C$ is frozen and
$$\epsilon(t) \propto e^{-\eta Ct}. \quad (3.18)$$
2. When $\zeta = 1/2$ and thus $C = 0$, we have the 'critical point', where both the QNTK and the total error decay polynomially,
$$K(t) = 2\lambda\epsilon(t) \simeq 1/(\eta t). \quad (3.19)$$
3. When $\zeta > 1/2$ and thus $C < 0$, we have the 'frozen-error phase', where the total error $\epsilon(t) = R$ is frozen and both the kernel and the residual error decay exponentially,
$$K(t) = 2\lambda\varepsilon(t) \propto e^{-2\eta\lambda Rt}. \quad (3.20)$$

Figure 3.4: Dynamical phase transition of the QNN in the example of the XXZ model. (a) The phase diagram of the QNN optimization dynamics, characterized by the spectrum gap of the Hessian matrix at $t \to \infty$ (black). The critical gapless points correspond to $O_0 = O_{\min}, O_{\max}$ (red triangles). The orange line represents the QNTK $\lim_{t\to\infty} K$. Inset (a1) shows the largest 10 eigenvalues of the Hessian spectrum for the three cases $O_0 \gtreqless O_{\min}$ marked by triangles in (a). In (b) we present the scaling of the spectrum gap $G_M \sim |O_0 - O_{\min}|^{\nu_1}$ with $\nu_1 \simeq 1$ from fitting (dashed lines).
In (c), we plot the decay of the autocorrelators $A_\epsilon(\tau)$ for different $O_0 \gtreqless O_{\min}$, and show the scaling of the correlation length $\xi \sim |O_0 - O_{\min}|^{-\nu_2}$ with $\nu_2 \simeq 1$ (dashed lines) by fitting in inset (c1). Here the RPA consists of $L = 64$ layers on $n = 2$ qubits, and the parameter in the XXZ model is $J = 2$.

One can connect the phases to $O_0 \gtreqless O_{\min}$ intuitively. When $O_0 < O_{\min}$, it is clear that $R > 0$, and we expect the dynamical index $\zeta > 1/2$ and $C < 0$, so this is the 'frozen-error phase'. When $O_0 > O_{\min}$, we know the total error will eventually decay to zero, and we can therefore identify this phase with the 'frozen-kernel phase', where the dynamical index $\zeta < 1/2$ and $C > 0$. The case $O_0 = O_{\min}$ is then the critical point. In the inset of Fig. 3.3(b), we indeed see the dynamical index $\zeta \to 0, 1/2, +\infty$ for $O_0 \gtreqless O_{\min}$. In our later theory analyses, we will make this connection between $O_0 \gtreqless O_{\min}$, the dynamical index $\zeta \gtreqless 1/2$ and the phase transition rigorous.

3.5 Statistical physics interpretation

Besides the LV dynamics, we can also connect the phase transition to the gap closing of the Hessian, as we detail below. The gradient-descent dynamics in Eq. (3.2) leads to a time evolution of the quantum state $|\psi(\theta)\rangle$. In the late-time limit, we can expand the shifts $\delta\theta_\ell$ around the extremum $\theta^*$ to second order as
$$\delta\theta \simeq -\eta M|_{\theta=\theta^*}(\theta - \theta^*), \quad (3.21)$$
where the first-order term vanishes due to convergence and the Hessian matrix $M$ is
$$M_{\ell_1\ell_2} = \frac{\partial^2\mathcal{L}}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}} = \frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}} + \epsilon\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}. \quad (3.22)$$
Eq. (3.21) can be interpreted as an imaginary-time evolution if we consider an unnormalized 'differential state', a superposition of two output states of the QNN,
$$|\Psi(\theta)\rangle = |\psi(\theta)\rangle - |\psi(\theta^*)\rangle = N(\theta - \theta^*), \quad (3.23)$$
where $N = \partial|\psi(\theta)\rangle/\partial\theta$ at $\theta = \theta^*$ is a $d \times L$ dimensional matrix, whose pseudoinverse we formally denote by $N^{-1}$. From Eq. (3.21), the effective Hamiltonian $H_\infty = NMN^{-1}$ can be obtained from
$$\delta|\Psi(\theta)\rangle = N\cdot\delta\theta = -\eta NMN^{-1}|\Psi(\theta)\rangle,$$
(3.24) which is the Schrödinger equation with imaginary time $i\eta$. In this interpretation, we have a quantum phase transition driven by $O_0$. One can define the correlation length $\xi$, its associated critical exponent $\nu$, and scaling dimensions for various physical quantities. Note that this quantum mechanical Hamiltonian is valid only in the late-time limit $t \to \infty$, and it should be distinguished from the Hamiltonian of Eq. (3.13) governing the LV dynamics of $K$ and $\epsilon$.

Let us begin by exploring the behavior of the gap of $H_\infty$ (equivalent to the Hessian gap) and the correlation length (defined below) around the critical point. The numerical observation of a diverging correlation length is indeed strong evidence for a second-order phase transition. We consider the Hessian eigenvalues in the late-time limit $t \to \infty$ at large circuit depth in Fig. 3.4(a). In the frozen-kernel phase, $O_{\min} < O_0 < O_{\max}$, the Hessian in Eq. (3.22) reduces to a rank-one matrix with only one nonzero eigenvalue as $\epsilon \to 0$ (see blue in (a1)), which coincides with the kernel, as verified by the orange and black curves in (a). In the frozen-error phase with $O_0 < O_{\min}$ (or $O_0 > O_{\max}$), due to the non-vanishing $\epsilon$, the Hessian has multiple nonzero eigenvalues (see green in (a1)). In both cases, far away from the critical point the Hessian has a constant gap, leading to the exponential decay of the correlation function discussed later. At the critical point, the kernel vanishes, so the gap closes, and the correlation function decays polynomially in time. We notice a linear closing of the gap around the critical point (red triangle), and verify the scaling of the closing gap in Fig. 3.4(b) by fitting the gap $G_M$ to
$$G_M \sim |O_0 - O_{\min}|^{\nu_1}, \quad (3.25)$$
resulting in $\nu_1 = 1.000, 0.987$ for $O_0 \lessgtr O_{\min}$. Regarding the QNN as a $(0+1)$-dimensional system in statistical physics, we then study the correlators to gain more insights into the phase transition.
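The gap scaling and the exponential/frozen behaviors entering these fits follow directly from the LV equations. A minimal Euler-integration sketch of Eqs. (3.10), with purely illustrative parameters, confirming the conserved quantity of Eq. (3.11) and the frozen-kernel fixed point:

```python
import numpy as np

# Euler integration of the generalized LV equations (3.10); parameters
# are illustrative. C = K - 2*lam*eps of Eq. (3.11) is conserved, and
# for C > 0 the flow reaches the frozen-kernel fixed point (0, C).
eta, lam, dt = 0.01, 1.0, 0.1
eps_t, K_t = 0.5, 2.0                    # C = 2 - 2*1*0.5 = 1 > 0
C0 = K_t - 2 * lam * eps_t
for _ in range(20000):
    d_eps = -eta * eps_t * K_t           # d eps / dt
    d_K = -2 * eta * lam * eps_t * K_t   # d K / dt
    eps_t += dt * d_eps
    K_t += dt * d_K
assert abs((K_t - 2 * lam * eps_t) - C0) < 1e-9   # C is conserved
assert abs(K_t - C0) < 1e-3 and eps_t < 1e-3      # frozen kernel, eps -> 0
```

Note that the forward-Euler update happens to conserve $C$ exactly here, since the two increments are proportional step by step; starting instead from $C < 0$ would drive the flow to the frozen-error fixed point $(-C, 0)$.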
Both $K$ and $\epsilon$ have the same correlator behavior, so we focus on $\epsilon$ here (see Sec. B.2) and define the autocorrelator $A_\epsilon(\tau) \equiv \mathbb{E}[\varepsilon(t)\varepsilon(t+\tau)]$, where the average is over the ensemble of trajectories and we consider the $t \gg 1$ region. Here $\varepsilon$ is adopted as it captures the residual error relevant for the study of fluctuations. Away from the critical point, from the mean-field dynamics at late time according to Eq. (3.15), we expect the autocorrelator
$$A_\epsilon(\tau) \sim \exp[-|\tau|/\xi] \quad (3.26)$$
to decay exponentially with a finite correlation length $\xi$, which is verified in Fig. 3.4(c). We further reveal the scaling of the correlation length,
$$\xi \sim 1/|C| \sim |O_0 - O_{\min}|^{-\nu_2}, \quad (3.27)$$
where $\nu_2$ is found to be $1.001, 0.987$ for $O_0 \lessgtr O_{\min}$, as shown in Fig. 3.4(c1). The numerical values of $\nu_1$ and $\nu_2$ are identical up to numerical precision, as expected. At the critical point $O_0 = O_{\min}$, any physical quantity $F$ is expected to exhibit the power-law correlation $A_F(\tau) \sim 1/|\tau|^{2\Delta[F]}$ for $|\tau| \gg t$, defining the scaling dimension $\Delta[F]$. Based on the definitions in Eqs. (3.5), (3.6) and (3.9), one can establish the scaling relations
$$\Delta[\epsilon] = 2\Delta[K] - \Delta[\mu], \qquad \Delta[\lambda] = \Delta[\mu] - \Delta[K]. \quad (3.28)$$
As shown in Fig. 3.4(c), our numerical results suggest $\Delta[\epsilon] = 1/2$. This is consistent with the solution in Eq. (3.16), assuming the correlation is factorizable. In Sec. B.2, we also find $\Delta[K] = \Delta[\mu] = 1/2$, while $\Delta[\lambda] = 0$ as $\lambda$ is a constant. The scaling relations in Eqs. (3.28) are thus indeed fulfilled. In summary, we have presented compelling evidence supporting the interpretation of the dynamical phase transition as a second-order phase transition in a quantum mechanical system. However, reaching the truly infinite-time limit poses challenges both numerically and experimentally. In our estimation of the correlation function in Sec. B.2, we rely on taking the subtle limit $\tau \gg t \gg 1$ and on the fluctuations being small.
Understanding the complete set of universal data for the critical point of the time-independent effective Hamiltonian $H_\infty$ remains a nontrivial theoretical problem, which we leave for future study.

3.6 Unitary ensemble theory for QNN dynamics

In this section, we provide analytical results to resolve two missing pieces of the LV model: the assumption that the relative dQNTK $\lambda$ in Eq. (3.9) is a constant at late time, and the connection between the dynamical index $\zeta \gtreqless 1/2$ in Eq. (3.17) and the cases $O_0 \gtreqless O_{\min}$. Our analyses rely on large depth $L \gg 1$ and large Hilbert space dimension $d \gg 1$, which allow us to consider ensemble-averaged values representing the typical case, $\overline{\zeta} = \overline{\epsilon\mu}/\overline{K}^2$ and $\overline{\lambda} = \overline{\mu}/\overline{K}$. Note that when considering the sign of $C$, we take the ratio between the averaged quantities. As the QNN is initialized randomly, the implemented unitary $\hat U(\theta)$ can initially be regarded as a typical sample from the Haar random distribution [25, 16, 18], regardless of the circuit ansatz. While this is a good approximation at the initial time, at late time the QNN unitary $\hat U(\theta)$ is constrained, in the sense that it maps the initial trivial state (e.g. a product of $|0\rangle$'s) towards a single quantum state, regardless of whether that state is the unique optimum or not. Therefore, the late-time dynamics is always restricted due to convergence, which we model by the restricted Haar ensemble with a block-diagonal form,
$$E_{\rm RH} = \left\{U \,\Big|\, U = \begin{pmatrix} 1 & 0 \\ 0 & V \end{pmatrix}\right\}, \quad (3.29)$$
where $V$ is a unitary of dimension $d-1$ following a sub-system Haar random distribution (only a 4-design is necessary). Here we have chosen the basis such that the first column and row represent the mapping from the initial state to the final converged state. At late time, the QNN converges to a restricted Haar ensemble determined by the converged state. When the converged state is unique, the frame potential [61] of the ensemble can be evaluated by considering different training trajectories, which confirms the ansatz in Eq. (3.29), as shown in Fig.
3.1 and Supplemental Material B.5. The ensemble average for a general traceless operator is challenging to obtain analytically. To gain insight into QNN training, we consider the much simpler problem of state preparation, where $\hat O = |\Phi\rangle\langle\Phi|$ is a projector. In this case, we are interested in target values $O_0$ near the maximum cost function value $O_{\max} = 1$. Under the restricted Haar ensemble, we have

Lemma 2 When the circuit satisfies the restricted Haar random (restricted 4-design) ensemble and $L \gg 1$, $d \gg 1$, in state preparation tasks the relative dQNTK $\overline{\lambda}_\infty$ goes to an $L$- and $d$-dependent constant. When $O_0 < O_{\max}$, the dynamical index $\overline{\zeta}_\infty \simeq 0$; when $O_0 = O_{\max}$, the dynamical index $\overline{\zeta}_\infty \simeq 1/2$; when $O_0 > O_{\max}$, the dynamical index $\overline{\zeta}_\infty$ diverges to $+\infty$.

This lemma derives from Theorem 3 in the Methods. While our results are general, in the numerical study that verifies the analytical results we adopt the random Pauli ansatz (RPA) [25] as an example (see Methods). The above is confirmed in Fig. 3.3(b), where $\lambda$ approaches a constant at late time, while $\zeta(t)$ approaches either $0$ or $1/2$, or diverges. Due to the symmetry between maximum and minimum in optimization, the restricted Haar ensemble therefore fully explains the phases of Theorem 1 quantitatively, and the assumption that $\lambda$ approaches a constant qualitatively. From the asymptotic analyses of the restricted Haar ensemble in Sec. B.7, we also have $\lambda, C \propto L/d$; thus the exponential decay in the LV dynamics has exponent $\propto \eta Lt/d$. Indeed, in a computation, $\eta Lt$ describes the resource: when the circuit depth $L$ is larger, one needs to compute and update more parameters, while taking fewer steps $t$ to converge. As we show in Methods, the Haar ensemble fails to capture either the $\zeta$ dynamics or the phase transition. Only in the frozen-kernel phase, where the kernel does not change much during the dynamics, do the Haar predictions roughly agree with the actual kernel, as shown in Fig. 3.2(a2) (see Methods).
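Sampling from the restricted Haar ensemble of Eq. (3.29) only requires embedding a Haar-random $(d-1)$-dimensional unitary in a block. A sketch using the standard QR-based Haar sampling recipe (helper names are ours):

```python
import numpy as np

# Sample from the restricted Haar ensemble of Eq. (3.29): a Haar-random
# (d-1)-dimensional unitary V embedded as U = diag(1, V).
def haar_unitary(dim, rng):
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix the QR phase ambiguity

rng = np.random.default_rng(7)
d = 8
U = np.eye(d, dtype=complex)
U[1:, 1:] = haar_unitary(d - 1, rng)
e0 = np.zeros(d); e0[0] = 1.0
assert np.allclose(U @ U.conj().T, np.eye(d))   # U is unitary
assert np.allclose(U @ e0, e0)                  # the converged state is fixed
```

The second assertion makes the defining constraint of the ensemble explicit: every sample maps the basis state representing the converged output state to itself, while acting Haar-randomly on the orthogonal complement.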
Figure 3.5: Dynamics of the total error $\epsilon(t)$ on the IBM quantum device Kolkata. Solid and dashed curves represent results of the real experiment and the noiseless simulation. An $n = 2$ qubit, 4-layer hardware-efficient ansatz is used to optimize the XXZ-model observable with $J = 4$.

3.7 Experimental results

In this section, we consider the experiment-friendly hardware-efficient ansatz (HEA) to verify our results on real IBM quantum devices. Each layer of the HEA consists of single-qubit rotations along the Y and Z directions, followed by CNOT gates on nearest neighbors in a brickwall style [27]. Our experiments adopt the hardware IBM Kolkata, an IBM Falcon r5.11 device with 27 qubits, via IBM Qiskit [120]. The device has median $T_1 \sim 98.97\,\mu{\rm s}$, median $T_2 \sim 58.21\,\mu{\rm s}$, median CNOT error $\sim 9.546 \times 10^{-3}$, median SX error $\sim 2.374 \times 10^{-4}$, and median readout error $\sim 1.110 \times 10^{-2}$. We randomly assign the initial variational angles within the range $[0, 2\pi)$, and maintain consistency across all experiments. To suppress the impact of error, we average the results over 12 independent experiments conducted under the same setup for three distinct choices, $O_0 = -10, -12, -14$. In Fig. 3.5, the experimental data (solid) on IBM Kolkata agree well with the ideal simulations (dashed) and indicate the frozen-error phase with constant error (green), the critical point with polynomially decaying error (red), and the frozen-kernel phase with exponentially decaying error (blue).

3.8 Discussion

Our results go beyond the early-time Haar random ensemble widely adopted in QNN studies [16, 18, 25] and reveal rich physics in the phase transition controlled by the target loss function. The target-driven phase transition in the dynamics of QNNs points to a source of transition that does not involve symmetry breaking.
Within the $(0+1)$-dimensional spacetime structure of QNNs, there may exist other unexplored sources that can induce phase transitions, especially when the QNN has limited depth and controllability. Another intriguing question pertains to the differences between classical and quantum machine learning within this formalism. In our examples, the target $O_0$ can be interpreted as a single piece of supervised data in a supervised machine learning task. Therefore, the phase transition we have discovered could be seen as a simplified version of a theory of data. Classical machine learning also extensively explores phase transitions, whether in relation to learning-rate dynamics [121, 122] or the depth of classical neural networks [123, 112]. It is an open question whether results similar to ours can be established for classical machine learning, especially in the context of the large-width regime of classical neural networks [124].

3.9 Methods

3.9.1 QNN ansatz and details of the tasks

The random Pauli ansatz (RPA) circuit is constructed as
$$\hat U(\theta) = \prod_{\ell=1}^L \hat W_\ell \hat V_\ell(\theta_\ell), \quad (3.30)$$
where $\theta = (\theta_1, \ldots, \theta_L)$ are the variational parameters. Here $\{\hat W_\ell\}_{\ell=1}^L \in U_{\rm Haar}(d)$ is a set of fixed Haar random unitaries of dimension $d = 2^n$, and $\hat V_\ell$ is an $n$-qubit rotation gate defined as
$$\hat V_\ell(\theta_\ell) = e^{-i\theta_\ell\hat X_\ell/2}, \quad (3.31)$$
where $\hat X_\ell \in \{\hat\sigma^x, \hat\sigma^y, \hat\sigma^z\}^{\otimes n}$ is a random $n$-qubit Pauli operator nontrivially supported on every qubit. Once a circuit is constructed, $\{\hat X_\ell, \hat W_\ell\}_{\ell=1}^L$ are fixed throughout the optimization. Note that our results also hold for other typical universal QNN ansatze, for instance the hardware-efficient ansatz. In this chapter, some of our main results are derived for a general observable $\hat O$. To simplify our expressions, we often consider $\hat O$ to be traceless, for instance a spin Hamiltonian, which is not essential to our conclusions.
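A minimal numerical sketch of the RPA of Eqs. (3.30)-(3.31), together with a check that a traceless spin observable of the XXZ type in Eq. (3.33) is indeed Hermitian and traceless. The helper names, small system sizes, and the assumption of periodic boundary conditions (site indices mod $n$) are ours, for illustration:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(mats):
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out

def haar_unitary(dim, rng):
    """QR-based Haar sampling with the phase ambiguity fixed."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def rpa_unitary(thetas, n, rng):
    """U(theta) = prod_l W_l V_l(theta_l), Eq. (3.30). Since X_l^2 = I,
    exp(-i t X_l / 2) = cos(t/2) I - i sin(t/2) X_l, Eq. (3.31)."""
    d = 2 ** n
    U = np.eye(d, dtype=complex)
    for th in thetas:
        X = kron_all([(sx, sy, sz)[rng.integers(3)] for _ in range(n)])
        V = np.cos(th / 2) * np.eye(d) - 1j * np.sin(th / 2) * X
        U = haar_unitary(d, rng) @ V @ U
    return U

def xxz_observable(n, J):
    """XXZ-type observable of Eq. (3.33), periodic boundary assumed."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    def place(ops):  # tensor product with single-qubit ops at given sites
        mats = [np.eye(2)] * n
        for site, op in ops:
            mats[site] = op
        return kron_all(mats)
    for i in range(n):
        j = (i + 1) % n
        H -= place([(i, sx), (j, sx)]) + place([(i, sy), (j, sy)])
        H -= J * place([(i, sz), (j, sz)]) + place([(i, sz)])
    return H

rng = np.random.default_rng(3)
U = rpa_unitary(rng.uniform(0, 2 * np.pi, size=4), n=2, rng=rng)
O = xxz_observable(3, J=2.0)
assert np.allclose(U @ U.conj().T, np.eye(4))   # the ansatz is unitary
assert np.allclose(O, O.conj().T)               # the observable is Hermitian
assert abs(np.trace(O)) < 1e-12                 # and traceless
```

The closed-form rotation $\cos(\theta/2)\,I - i\sin(\theta/2)\,\hat X_\ell$ avoids a matrix exponential precisely because every $n$-qubit Pauli string squares to the identity.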
A general traceless operator can be expressed as a random mixture of Pauli strings (excluding the identity),
$$\hat O = \sum_{i=1}^N c_i\hat P_i, \quad (3.32)$$
with real coefficients $c_i \in \mathbb{R}$ and nontrivial Paulis $\hat P_i \in \{\hat I, \hat\sigma^x, \hat\sigma^y, \hat\sigma^z\}^{\otimes n}\setminus\{\hat I^{\otimes n}\}$. To obtain explicit expressions, we also consider the XXZ model, described by
$$\hat O_{\rm XXZ} = -\sum_{i=1}^n\left(\hat\sigma^x_i\hat\sigma^x_{i+1} + \hat\sigma^y_i\hat\sigma^y_{i+1} + J\hat\sigma^z_i\hat\sigma^z_{i+1} + \hat\sigma^z_i\right). \quad (3.33)$$
To help understand the non-frozen QNTK phenomena, we also consider a state-preparation case with the observable $\hat O = |\Phi\rangle\langle\Phi|$, where $|\Phi\rangle$ is the target state.

3.9.2 Details of the restricted Haar ensemble

Here we evaluate the average QNTK, relative dQNTK and dynamical index for the restricted Haar ensemble proposed in Eq. (3.29). We focus on the state-preparation task to enable analytical calculation. As we aim to

Figure 3.6: Ensemble-average results under the restricted Haar ensemble (top) and the Haar ensemble (bottom). In the top panel, we plot (a) $\overline{K}$ with $L = 64$ fixed, (b) $\overline{\zeta}$ with $O_0 = 1$, and (c) $\overline{\lambda}$ with $O_0 = 1$ at late time in state preparation. Blue dots in the top panel represent numerical results from late-time optimization of an $n = 2$ qubit RPA. Red solid lines represent the exact ensemble average with the restricted Haar ensemble in Eqs. (B.230), (B.288), (B.254) of Sec. B.7. Magenta dashed lines represent the asymptotic ensemble average with the restricted Haar ensemble in Eqs. (3.34), (3.35), (3.36). The observable in all cases is $|\Phi\rangle\langle\Phi|$, with $|\Phi\rangle$ a fixed Haar random state. In the inset of (b), we fix $L = 64$. In the bottom panel, we plot (d) the fluctuation ${\rm SD}[K_0]/\overline{K}_0$, (e) $\overline{\zeta}_0$, and (f) $\overline{\lambda}_0$ under random initialization. Blue dots represent numerical results from random initializations of an $n = 6$ qubit RPA. Red solid lines represent the exact ensemble average in Eqs. (B.216), (B.155), (B.95) of Sec. B.6. Magenta dashed lines represent the asymptotic ensemble average in Eq.
(3.44), (3.38), (3.39). The observable in all cases is the XXZ model with $J = 2$, and $O_0 = O_{\min} = -22$ for (d)-(f). The orange solid line in (d) represents the result from [25].

capture the late-time dynamics with the state-preparation task, we will be interested in the dynamics when the output state $|\psi_0\rangle$ has fidelity $\langle\hat O\rangle = |\langle\psi_0|\Phi\rangle|^2 = O_0 - R - \kappa$, with $\kappa \sim o(1)$ indicating the late time at which the observable is already close to its reachable target. Here the constant remaining term is $R = O_0 - 1$ when $O_0 > 1$ and zero otherwise. Note that unity is the maximum reachable target value in state preparation. Under this ensemble, we have the following result (see details in Sec. B.7).

Theorem 3 For the state projector observable $\hat O = |\Phi\rangle\langle\Phi|$, when the circuit satisfies the restricted Haar ensemble, the ensemble averages of the QNTK, relative dQNTK and dynamical index are
$$\overline{K}_\infty \simeq \frac{L}{2d}(O_0 - R)(1 - O_0 + R) + O(\kappa), \quad (3.34)$$
$$\overline{\zeta}_\infty \simeq \frac{R + \kappa}{R + \kappa + 1 - O_0}\left(1 - \frac{1}{2(O_0 - R)}\right) + \frac{d}{L}, \quad (3.35)$$
$$\overline{\lambda}_\infty \simeq \frac{L}{4d}(1 - 2O_0 + 2R) - \frac{O_0 - R}{2} + O(\kappa), \quad (3.36)$$
in the $L \gg 1$, $d \gg 1$ limit, where the target cost function value $O_0 \ge 0$, the remaining constant $R = \max\{O_0 - 1, 0\}$ and $\kappa \sim o(1)$.

Our results are verified numerically for the state-preparation task in Fig. 3.6, where we plot the above asymptotic equations as magenta dashed lines and the full formulas of the appendices as solid red lines. Subplot (a) plots $\overline{K}_\infty$ versus $O_0$. At late time, if the target $O_0 \ge 1$, then from $O_0 = 1 + R$ we directly have $\overline{K}_\infty = 0$; if $O_0 < 1$, we have $R = 0$ and $\overline{K}_\infty \propto O_0(1 - O_0)$ is a constant. Subplot (b) shows the agreement of $\overline{\zeta}_\infty$ versus $L$ when we fix $O_0 = 1$. As predicted by Eq. (3.35), since $R = 0$ in this case, $\lim_{\kappa\to 0}\overline{\zeta}_\infty \simeq 1/2$ when $L \gg 1$. Indeed, we see convergence towards $1/2$ as the depth increases. We also verify the $\overline{\zeta}$ versus $O_0$ relation in the inset, where $\overline{\zeta}_\infty = 0$ for $O_0 < 1$, $1/2$ for $O_0 = 1$, and divergent for $O_0 > 1$. Note that for a circuit with medium depth $L \sim {\rm poly}(n)$, $\overline{\zeta}_\infty = 1/2 + d/L$ would slightly deviate from $1/2$ for $O_0 = 1$ (Fig. 3.6(b)).
This will lead to a 'frozen-error phase' according to Theorem 1. This indicates a 'finite-size' effect affecting the phase transition, which we defer to future work. Subplot (c) shows the agreement of $\lambda_\infty$ versus $L$, where the linear relation is verified. As predicted by Eq. (3.36), this is the case regardless of the value of $O_0$.

3.9.3 Haar ensemble results

We also evaluate the Haar ensemble expectation values for reference, which capture the early-time QNN dynamics. Under the Haar random assumption, we find the following lemma.

Lemma 4 For a traceless operator $\hat{O}$, when the initial circuit satisfies Haar randomness (4-design) and the circuit satisfies $L \gg 1$ and $d \gg 1$, the ensemble averages of the QNTK, relative dQNTK, and dynamical index have leading order
\[
K_0 = \frac{L d\, \mathrm{tr}\,\hat{O}^2}{2(d-1)(d+1)^2} \simeq \frac{L}{2d^2}\, \mathrm{tr}\,\hat{O}^2, \tag{3.37}
\]
\[
\zeta_0 \simeq -\frac{1}{L}\left(1 + \frac{\mathrm{tr}\,\hat{O}^4}{(\mathrm{tr}\,\hat{O}^2)^2}\right) + \frac{1}{2}\left(\frac{\mathrm{tr}\,\hat{O}^4}{(\mathrm{tr}\,\hat{O}^2)^2} - \frac{d O_0\, \mathrm{tr}\,\hat{O}^3}{(\mathrm{tr}\,\hat{O}^2)^2} - \frac{3}{d}\right), \tag{3.38}
\]
\[
\lambda_0 \simeq \frac{L\, \mathrm{tr}\,\hat{O}^3}{4d\, \mathrm{tr}\,\hat{O}^2}. \tag{3.39}
\]
Note that for observables with non-zero trace, evaluation is also possible; we present those lengthy formulas and the proofs in Sec. B.6. Meanwhile, it is important to notice the dimension dependence of the trace terms. Specifically, for the XXZ model we considered, when $d \gg 1$, the above Lemma 4 leads to
\[
K_{0,\rm XXZ} \simeq (1 + J^2)\frac{Ln}{d}, \tag{3.40}
\]
\[
\zeta_{0,\rm XXZ} \simeq -\frac{1}{L}\left(1 + \frac{3}{d}\right) - O_0 \frac{3J(1 - J^2)}{4(1 + J^2)^2 n}, \tag{3.41}
\]
\[
\lambda_{0,\rm XXZ} \simeq \frac{3J(1 - J^2)L}{4(1 + J^2)d}. \tag{3.42}
\]
We verified the Haar prediction for $\zeta_0$ and $\lambda_0$ with randomly initialized circuits in Fig. 3.6(e), (f). Note that when $L$ is large enough, $\zeta_{0,\rm XXZ}$ scales linearly with $O_0$. In the Haar case, we can also obtain the fluctuation properties.

Theorem 5 In the asymptotic limit of a wide and deep QNN, $d, L \gg 1$, the ensemble average of the QNTK standard deviation (4-design) is
\[
\mathrm{SD}[K_0] \simeq \left[ \frac{3L}{4d^6}\left( d^2 (\mathrm{tr}\,\hat{O}^2)^2 - 2d\, \mathrm{tr}\,\hat{O}^2 (\mathrm{tr}\,\hat{O})^2 + (\mathrm{tr}\,\hat{O})^4 \right) + \frac{L^2}{4d^5}\left( d\, \mathrm{tr}\,\hat{O}^4 - 4\, \mathrm{tr}\,\hat{O}^3\, \mathrm{tr}\,\hat{O} \right) \right]^{1/2}. \tag{3.43}
\]
For traceless operators, Eq. (3.43) can be further simplified, and the relative sample fluctuation of the QNTK is
\[
\frac{\mathrm{SD}[K_0]}{K_0} \simeq \frac{1}{\sqrt{L}} \left( \frac{L\, \mathrm{tr}\,\hat{O}^4}{(\mathrm{tr}\,\hat{O}^2)^2} + 3 \right)^{1/2}.
\]
(3.44)

This result refines Ref. [25] with a more accurate ensemble-averaging technique and provides an additional term $\sim \mathrm{tr}\,\hat{O}^4 / (\mathrm{tr}\,\hat{O}^2)^2$. Therefore, the sample fluctuation also depends on the observable being optimized. Specifically, for the XXZ model we considered, Eq. (3.44) becomes
\[
\frac{\mathrm{SD}[K_0]}{K_0} \simeq \sqrt{\frac{3}{L}\left(\frac{L}{d} + 1\right)}. \tag{3.45}
\]
When $L \gg d$, the relative fluctuation $\mathrm{SD}[K_0]/K_0 \sim 1/\sqrt{d}$ is constant. However, as $d = 2^n$ is exponential while a realistic number of layers $L$ is polynomial in $n$, $d \gg L$ is more common; there the relative fluctuation $\mathrm{SD}[K_0]/K_0 \sim \sqrt{1/L}$ decays with the depth, consistent with Ref. [25]. We numerically evaluate the ensemble average in Fig. 3.6(d) and find good agreement between our full analytical formula (red solid, Eq. (B.216) in the Supplemental Material) and the numerical results (blue circles). The asymptotic result (magenta dashed, Eq. (3.45)) also captures the scaling correctly. The results refine the calculation of Ref. [25], which has a substantial deviation when $L$ and $d$ are comparable.

Chapter 4
Applications of quantum neural networks in supervised and unsupervised learning

In Section 4.1, I will summarize my work on the decay of classification error in QNNs [42]. In Section 4.2, I will summarize my work on the proposal of a quantum diffusion model for generative learning [43].

4.1 Fast decay of classification error in variational quantum circuits

4.1.1 Introduction

Quantum computation promises to solve classically intractable problems with a speedup in performance [3]. However, as scalable error-corrected quantum computers are not yet available, quantum information processing is limited to protocols using noisy intermediate-scale quantum (NISQ) [46] technology. These technological constraints also call for an alternative route towards a quantum advantage.
Among the candidates, variational quantum circuits (VQCs) are a class of quantum-classical hybrid systems applicable to various tasks, including optimization [28], state preparation [125, 126], auto-encoding [127, 128], eigen-solvers [14, 27, 15, 129, 130, 131, 132], unsampling and state approximation [133, 134], state classification [135, 136, 31, 137], state tomography [138], sensor networks [87, 88], solving partial differential equations [139], quantum simulation [140, 141, 142], and machine learning in general [12, 29, 91, 143, 144, 93, 92, 145, 146].

Despite these various applications, a fundamental understanding of the capability of VQCs in connection to circuit depth and circuit architecture is still missing. Recent progress unveils the notion of depth efficiency for the expressive and discriminative power of VQCs' classical counterpart, neural networks [147, 148, 149, 150]; VQCs' discriminative power on quantum data has also attracted much attention recently, showing great potential in the classification of few-qubit states [136] and quantum phases of many-body systems [31, 137], even in the presence of noise [135]. For VQCs, recent works propose to quantify the expressivity via the effective dimension from an information-geometry perspective and show some quantum advantage in the expressivity of VQCs compared with their classical counterparts [94]. At the same time, the expressivity of VQC ansatzes is also connected to the barren plateau phenomenon [16, 18, 20], where the variance of the gradient decays exponentially with the number of qubits in the system and therefore highly limits the training performance of large-scale VQCs. However, regardless of these challenges in training, it is still important to understand a VQC's discriminative power in connection to its depth and architecture. In this paper, we take a further step to unveil how VQCs' discriminative power quantitatively connects to circuit depth and architecture, and how it compares with the ultimate limit [151].
First, to unleash the genuine discriminative power of VQCs, we go beyond the popular approach of a single-qubit measurement on the VQC output [152, 153]: we measure all qubits and then perform maximum-likelihood estimation (MLE) on the measurement results, even for a binary classification problem between two states. Our numerical results show that the MLE-VQC approach offers an order-of-magnitude smaller deviation from the Helstrom limit than the single-qubit approach. While both approaches can face the barren plateau problem that limits the trainability of large circuits [16, 18, 20], numerical evaluation shows that our MLE-VQC approach has larger gradients relative to the single-qubit approach. With the full discriminative power of VQCs in hand, we proceed to explore its connection to circuit architecture and depth. When discriminating between complex quantum states, we find that the discrimination error is exponentially suppressed with the continuous increase of the VQC depth, until it saturates to the minimum given by the Helstrom limit [154, 151]. When such a continuous increase of depth is forbidden by non-extensive architectures, e.g., the tree tensor network (TTN), the multiscale entanglement renormalization ansatz (MERA) [153], and quantum convolutional neural networks (QCNNs) [31], the discrimination error deviates from the Helstrom limit, even for translation-invariant (TI) or less entangled input states. Indeed, these non-extensive architectures enable better trainability [19] at a cost. For extensive architectures that allow the continuous increase of VQC depth, we find that the discriminative power is closely connected to the scrambling power of VQCs. To reduce the complexity of experimental implementation, we consider simplified VQCs with fewer parameters or gates.
Given the same VQC architecture, for symmetric input states, assuming symmetric VQC gate parameters makes the VQC much easier to train while remaining competitive in error probability; for real ground states of many-body systems, restricting the VQC to implement a real unitary significantly reduces the number of gates required to achieve the optimal performance. Indeed, simplification of VQCs helps state discrimination only when it properly utilizes the symmetry of the input states.

4.1.2 Circuit architecture and main results

As shown in Fig. 4.1, to perform state discrimination, our MLE-VQC system utilizes a VQC to process the input state and then performs a measurement. Different from existing approaches [135, 136, 31, 137], we consider a measurement on all qubits and optimal MLE post-processing to perform the state discrimination task. In this paper, we will consider multi-qubit input states, while infinite-dimensional quantum states can potentially be considered by generalizing the approach of Refs. [87, 88].

Figure 4.1: Schematic illustration of the MLE-VQC system. As any single-qubit rotation can be absorbed into the VQC, we can fix each single-qubit measurement to be Pauli Z without loss of generality. An MLE strategy is applied to the measurement results to make the decision.

A VQC is determined by the type of allowed gates and the circuit architecture. To discriminate between general states, we allow each (two-qubit) gate in the VQC to be universal, composed of single-qubit rotations and CNOT gates. As each gate only acts on two qubits, the spread and processing of quantum information is determined by the circuit architecture. We will start with simple 1-D "Brickwall" local circuits (see Fig. 4.2a or Fig. 4.7a), which alternate between gates acting on two sets of neighboring pairs. In Sec.
4.1.5, we benchmark different circuit architectures, including extensive ones (brickwall, prism, and polygon [155]) and non-extensive ones (QCNN [31, 137], TTN, and MERA [153]), as shown in Fig. 4.7. We also explore restricted gate sets to simplify VQCs for near-term implementations.

In order to retrieve the maximal information from the states output by the VQC, we perform simultaneous single-qubit measurements on all $n$ qubits in the system, as shown in Fig. 4.1. After the measurement, a decision is made on the input state. The performance of such a VQC state discrimination system is described by the error probability. For example, when discriminating between a pair of equal-prior pure states $\{\psi_0, \psi_1\}$, the error probability refers to the probability of events where the decision $\tilde{H}$ is not equal to the true state label $H$, namely
\[
P_E(U_D; \psi_0, \psi_1) = 1 - \frac{1}{2} \sum_{h=0}^{1} P_{\tilde{H}|H}(h|h), \tag{4.1}
\]
where $P_{\tilde{H}|H}(\tilde{h}|h)$ is the conditional probability of making the decision $\tilde{H} = \tilde{h}$ (the state is $\psi_{\tilde{h}}$) while the true label is $H = h$ (the state is $\psi_h$), and the dependence on the VQC unitary $U_D$ is implicit. In this paper, we take the MLE decision strategy, where the decision on the input state is chosen to maximize the posterior probability of the measurement outcome (see Sec. C.1 for details). Given the VQC and the final measurements, MLE is known to be the optimal decision strategy that minimizes the error probability.

The minimum 'Helstrom' error probability [154, 151] further optimizes over the measurement bases, leading to the ultimate error probability of state discrimination (see Sec. C.1). When discriminating between a pair of equal-prior pure states $\{\psi_0, \psi_1\}$, the Helstrom limit has the simple closed form
\[
P_H(\psi_0, \psi_1) = \frac{1}{2} \left[ 1 - \sqrt{1 - |\langle\psi_0|\psi_1\rangle|^2} \right]. \tag{4.2}
\]
We train the VQC unitary $U_D$ to achieve the lowest error probability $P_E(U_D; \psi_0, \psi_1)$ in state discrimination ('dis') between states $\{\psi_0, \psi_1\}$.
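To make Eq. (4.2) and the MLE rule concrete: for equal priors, the MLE decision picks the hypothesis with the larger outcome likelihood, so the residual error is $P_E = \frac{1}{2}\sum_z \min[p_0(z), p_1(z)]$. A minimal single-qubit sketch, with a fixed $\pi/8$ basis rotation standing in for a trained $U_D$ (all function names here are illustrative):

```python
import numpy as np

def helstrom_pure(psi0, psi1):
    """Helstrom limit of Eq. (4.2) for equal-prior pure states."""
    overlap2 = abs(np.vdot(psi0, psi1)) ** 2
    return 0.5 * (1.0 - np.sqrt(1.0 - overlap2))

def mle_error(p0, p1):
    """MLE error for equal priors: P_E = (1/2) * sum_z min[p0(z), p1(z)]."""
    return 0.5 * np.minimum(p0, p1).sum()

def outcome_probs(U, psi):
    """Computational-basis (Pauli-Z) outcome distribution after circuit U."""
    return np.abs(U @ psi) ** 2

psi0 = np.array([1.0, 0.0])                 # |0>
psi1 = np.array([1.0, 1.0]) / np.sqrt(2.0)  # |+>
p_h = helstrom_pure(psi0, psi1)             # (1 - sqrt(1/2))/2, about 0.146

# Measuring Z directly (U = identity) is suboptimal
pe_naive = mle_error(outcome_probs(np.eye(2), psi0),
                     outcome_probs(np.eye(2), psi1))

# Rotating the measurement basis by pi/8 attains the Helstrom limit
# for this pair, mimicking what a trained circuit accomplishes
t = np.pi / 8
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
pe_opt = mle_error(outcome_probs(R, psi0), outcome_probs(R, psi1))
```

The gap between `pe_naive` and `pe_opt` is exactly what training the VQC unitary closes: the circuit reshapes the outcome distributions so that the separable Z measurements plus MLE reach the Helstrom limit.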
The corresponding cost function for training is chosen to be
\[
C_{\rm dis}(U_D; \psi_0, \psi_1) \equiv P_E(U_D; \psi_0, \psi_1) - P_H(\psi_0, \psi_1). \tag{4.3}
\]
Below we summarize the organization and main results of this work, which are obtained on VQCs after a sufficiently long period of training.

In Sec. 4.1.3, we introduce the states being considered for the discrimination task. To make our results on discrimination representative, we consider states generated from random local circuits and ground states of many-body systems.

In Sec. 4.1.4.1, we show that our MLE-VQC strategy exponentially suppresses the error with the growth of the circuit depth until saturation at the Helstrom limit, when discriminating between complex quantum states. The VQC does so by engineering the output state to be highly entangled such that local measurements can realize complex positive operator-valued measure (POVM) elements.

In Sec. 4.1.4.2, we further demonstrate the importance of the MLE decision strategy. Given the same circuit architecture and gate set, the performance achievable by a single-qubit measurement deviates from the Helstrom limit by an order of magnitude more than the MLE case. In addition, the MLE strategy also makes the training easier by increasing the gradients.

In Sec. 4.1.4.3, we explore the error decay for states generated by random circuits with different depths. When the complexity of the input states, quantified by the preparation circuit depth $D_0$, increases, the VQC depth required to get close to the Helstrom limit increases linearly with $D_0$. Compared with other tasks such as state generation, we find the VQC state discrimination task to be easier, both in terms of error and trainability, as detailed in Sec. 4.1.4.4. In addition, symmetry in the inputs makes the Helstrom limit larger, but otherwise preserves all other characteristics of a VQC state discrimination task.

Sec. 4.1.5.1 addresses the benchmark between VQC architectures.
Regardless of whether the random input states are symmetric or not, the extensive architectures with a constant number of gates per layer (brickwall, prism, polygon) work much better than those with a limited depth (QCNN, TTN, and MERA). This shows a limitation of those over-simplified ansatzes, despite their advantage in trainability [19]. Due to their nonlocal gates, prism and polygon have slight advantages in error probability over the brickwall architecture, consistent with their scrambling powers [62, 72, 63, 156, 155], as verified by operator-size calculations in Sec. 4.1.5.2.

Sec. 4.1.5.3 addresses the simplification of VQCs for near-term implementation. When the input is symmetric, given the same circuit architecture, the symmetric VQC ansatz works almost as well as the general VQC ansatz, and is much easier to train due to larger gradients. For ground states of a time-reversal-symmetric Hamiltonian, which have real wave functions, assuming a real matrix representation of the VQC circuit offers similar performance advantages. However, simplifications not based on the symmetry and structure of the input can harm the performance. Finally, we conclude with some additional discussions in Sec. 4.1.6.

4.1.3 Ensemble of states under discrimination

In this section, we explain the ensembles of states considered to benchmark the performance of our MLE-VQC systems in quantum state discrimination. In order to make the benchmark representative, we consider a wide range of applications of quantum state discrimination, in quantum communication, quantum sensing, and many-body physics, which involve different types of states. In quantum communication, the decoding of classical information can be considered a quantum state discrimination task, and the direct coding part of capacity theorems is often obtained via a hypothesis-testing approach [157, 158, 159, 160, 161].
There, the states involved can be simple, for example coherent states in optical communication, but can also be entangled across a large number of inputs in more advanced encodings. In quantum sensing, distributed sensing [162, 163] and other applications [34, 164, 165, 166] involve entangled states in a complex form. In many-body physics, people are interested in detecting complex quantum phases of matter, which involves ground states that can be highly entangled [31, 137].

4.1.3.1 States generated from local quantum circuits

To represent the different classes of states involved, we consider quantum states generated by inputting a trivial product state $|\mathbf{0}\rangle = |0\rangle^{\otimes n}$ to local quantum circuits composed of general two-qubit gates acting on neighboring qubits (see Fig. 4.2(a)). As we choose the gates randomly, the ensemble is characterized by the preparation circuit depth $D_0$, and is therefore denoted $H(D_0)$. We utilize entanglement entropy as a measure of the complexity of the states generated [167, 168]. For a quantum system in state $\rho_{AB}$ divided into subsystems $A$ and $B$, the von Neumann entanglement entropy between $A$ and $B$ is
\[
S(A, B) = -\mathrm{Tr}\left(\rho_A \log \rho_A\right), \tag{4.4}
\]

Figure 4.2: (a) The open-boundary local random unitary circuit, "Brickwall". Each pair of connected boxes represents a local Haar-random two-qubit unitary gate. Here we show an example of a depth $D_0 = 6$ circuit on an $n = 6$ qubit input. (b) Phase diagram of the TFIM. The black curve represents the analytical expression for the energy gap, $\Delta E = 2(|g| - 1)$, and the dashed line indicates the critical point. (c) Ensemble-averaged bipartite von Neumann entanglement entropy (see Eq. (4.4)) of states prepared by the circuit in (a). The black curve is the Page curve of Haar random states defined in Eq. (4.5). (d) Bipartite entanglement entropy curves for $n = 6$ spin TFIM ground states with $g = 1, 10$.
The inset shows the maximal bipartite entanglement entropy for different $n$.

where $\rho_A = \mathrm{Tr}_B(\rho_{AB})$ is the reduced density matrix of subsystem $A$. Typically, the entanglement of states in $H(D_0)$, and thus the circuit complexity, grows linearly with depth $D_0$ [167, 168] before saturation. As shown in Fig. 4.2(c), when $D_0 < n$ is a fixed constant, the states generated have area-law entanglement, where the bipartite entanglement $S(A, B)$ only depends on the boundary, a constant in a 1-D system; when $D_0 \propto n$ is large, the states are typically highly entangled and obey a volume law, where the entanglement $S(A, B)$ is characterized by the size (number of qubits) of the subsystem. The growth of entanglement is also analytically characterized in Ref. [62]. Indeed, $H(D_0)$ well captures the different problems of interest. In quantum communication, states in $H(D_0)$ are used as the random encoding [128] to achieve capacity; in many-body physics, the depth $D_0$ controls the bond dimension of the matrix-product representation of states. Moreover, $H(D_0)$ is also studied in quantum information scrambling [62, 72] and t-design complexity [169, 170, 61, 171].

For $D_0 \to \infty$, the ensemble $H(\infty)$ approaches Haar random, where the average entanglement between subsystems $A$ and $B$, known as the Page curve, is [172]
\[
S_{H(\infty)}(A, B) \simeq \log d_A - \frac{d_A}{2 d_B}, \tag{4.5}
\]
where $d_A, d_B$ are the dimensions of subsystems $A$ and $B$, with the assumption that $d_A \leq d_B$. The typical Helstrom limit between states $\psi_0, \psi_1 \in H(\infty)$ can be evaluated as
\[
\langle P_H(\psi_0, \psi_1) \rangle_{H(\infty)} = \frac{1}{2(2^{n+1} - 1)} \sim \frac{1}{2^{n+2}}. \tag{4.6}
\]
For a finite $D_0$, when $2^{n+1} \gg 1$, Eq. (4.6) still holds to leading order (see C.2 for details).

As many-body systems often have translational symmetry, we also consider the subset of TI states $S(D_0)$, prepared by the periodic-boundary TI local random unitary circuit with depth $D_0$. The typical Helstrom limit for $S(D_0)$ is larger than Eq. (4.6) for $H(D_0)$, but is still independent of $D_0$ to leading order (see Fig.
C.1). The increase of the typical Helstrom limit for ensembles with symmetry can be understood as follows. The Helstrom limit for two pure states is directly related to the overlap between them; when there is symmetry or other constraints on the random states, the states come from a smaller Hilbert space and therefore have a larger typical overlap. This larger overlap then leads to a larger Helstrom limit via Eq. (4.2).

4.1.3.2 Ground states of many-body systems

We also consider ground states of many-body systems. We focus on a well-known toy model, the transverse-field Ising model (TFIM), whose Hamiltonian is
\[
H_{\rm TFIM} = -\sum_i Z_i Z_{i+1} + g \sum_i X_i, \tag{4.7}
\]
where $Z_i, X_i$ are Pauli matrices at site $i$ and $g$ is the strength of the external field relative to the coupling strength. To reduce finite-size effects, we consider a periodic boundary condition. As depicted in Fig. 4.2(b), when $|g| < 1$, the system stays in an ordered ferromagnetic phase; as $|g|$ increases, it transitions to a disordered paramagnetic phase. In both phases, the ground states of the system show area-law entanglement (Fig. 4.2(d)). At the critical point $|g| = 1$, the system undergoes a quantum phase transition.

Figure 4.3: Error probability of binary state discrimination (between two quantum states) with trained MLE-VQC systems. (a) State discrimination between Haar random states or complex TI states $S(2^n)$ in a system of $n = 6$ qubits. Horizontal dashed lines represent the ensemble-averaged Helstrom limit $\langle P_H \rangle$, and the light-colored area represents the amount of ensemble fluctuation. (b) State discrimination between ground states of the TFIM with $g = 1$ and $g = 10$ in systems of $n = 6, 8, 10$ qubits, where the y-axis is on a logarithmic scale. Horizontal dashed lines show the Helstrom limit in each case. We use the open-boundary brickwall VQC ansatz in all cases.
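To make Eqs. (4.4) and (4.7) concrete, here is a small exact-diagonalization sketch: it builds the periodic-boundary TFIM, takes the ground state, and evaluates the half-cut von Neumann entropy. All helper names are illustrative, and $n$ is kept tiny so dense matrices suffice.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-qubit chain."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == i else I2)
    return out

def tfim(n, g):
    """Periodic-boundary TFIM of Eq. (4.7): H = -sum_i Z_i Z_{i+1} + g sum_i X_i."""
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n):
        H -= site_op(Z, i, n) @ site_op(Z, (i + 1) % n, n)
        H += g * site_op(X, i, n)
    return H

def entanglement_entropy(psi, n, n_a):
    """Von Neumann entropy S(A, B) of Eq. (4.4), cutting after the first n_a qubits."""
    sv = np.linalg.svd(psi.reshape(2 ** n_a, 2 ** (n - n_a)), compute_uv=False)
    p = sv ** 2
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

n = 6
S_half = {}
for g in (1.0, 10.0):
    _, vecs = np.linalg.eigh(tfim(n, g))       # eigh returns ascending eigenvalues
    S_half[g] = entanglement_entropy(vecs[:, 0], n, n // 2)
# The critical ground state (g = 1) carries more half-cut entanglement than the
# nearly product-like ground state deep in the paramagnetic phase (g = 10)
```

The same `entanglement_entropy` helper, applied to random-circuit outputs instead of ground states, reproduces the area-law versus volume-law distinction of Fig. 4.2(c)-(d).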
The entanglement entropy shows a logarithmic scaling behavior, which can also be described by a conformal field theory with central charge $c = 1/2$ [173, 174, 175].

4.1.4 Performance of the brickwall VQC in state discrimination

As the VQC circuit depth increases, the performance of the MLE-VQC system will eventually approach the ultimate Helstrom limit for pure-state discrimination. This is because, for the ensembles of pure states considered in this paper, the optimal POVM elements are also rank-one projectors [176, 177]; therefore an additional ancilla is not necessary in the measurement. At the same time, however, the training of the circuit becomes harder as the number of parameters increases and the gradient decays [16, 18, 22, 178, 179, 17, 20].

In this section, we explore the error probability performance and trainability of the MLE-VQC system with the open-boundary brickwall VQC ansatz. To understand the performance with a finite-depth circuit, in Sec. 4.1.4.1 we evaluate the decay of the error probability towards the Helstrom limit for different ensembles of input states. Then we compare the performance and trainability of the MLE approach and the single-qubit measurement approach in Sec. 4.1.4.2. Afterward, we explore the connection between the performance and the input-state ensemble complexity in Sec. 4.1.4.3. We close the section with a comparison between generative and discriminative tasks for VQCs in Sec. 4.1.4.4.

4.1.4.1 Fast error decay

To begin with, we consider the average error probability for complex states discriminated by VQCs of different depth $D$. In Fig. 4.3(a), we consider Haar random states (blue dots) and find fast error suppression: the error probability decays exponentially with $D$ before saturating to the Helstrom limit (blue dashed horizontal line). To represent the symmetric case, we further consider the set of states $S(2^n)$ prepared by TI local quantum circuits with a large enough depth $2^n \gg 1$.
While symmetry increases the Helstrom limit (red dashed horizontal line), it does not change the exponential suppression of the error probability with depth $D$ (red dots). To extend the results beyond random states, we consider the discrimination between two ground states of the TFIM with different parameters $g$ in Fig. 4.3(b). We see that as the number of qubits $n$ increases, the Helstrom limit decreases, and the error probability shows an exponentially suppressed trend with VQC depth $D$. Although the number of qubits is limited due to the increasing difficulty of the training, we see that the depth required to saturate the Helstrom limit scales linearly with the system size.

It is worth pointing out that the amount of entanglement in the states before the final measurement is high in the MLE-VQC approach. For Haar random input states, the optimal VQC of different depths $D$ preserves the bipartite entanglement entropy $\langle S(n/2)\rangle$ at the Page-curve value, as shown by the purple line in Fig. 4.5(b). Note that for less entangled inputs $H(D_0)$ prepared by random local circuits of depth $D_0$, the VQC increases the level of entanglement $\langle S(n/2)\rangle$ at the output side before the final measurements (green, red, blue for $D_0 = 2, 4, 6$). From this, we see that the VQC is essentially sorting and increasing entanglement between the qubits to enable the best performance with the final separable measurement. In contrast, the single-qubit measurement approach produces much less entangled outputs, as shown in Fig. 4.4(c). This is because when the final measurement is only performed on a single qubit, the circuit tries to concentrate all valuable information onto that single qubit. Therefore, there is no incentive for the VQC to entangle the final qubits; rather, it tries to disentangle the qubit being measured from the rest of the system.

Figure 4.4: MLE versus the single-qubit measurement approach in a brickwall-ansatz VQC for discriminating between Haar random states and TI states sampled from $S(2^n)$. We consider a system of $n = 6$ qubits. We show the ensemble-averaged cost function $\langle C_{\rm dis}\rangle$ in (a) and the parameter-averaged variance of the gradient $\langle \mathrm{Var}(g_i)\rangle_i$ for the ansatz with depth $D = 2$ in (b). We show the half bipartite von Neumann entanglement entropy $\langle S(n/2)\rangle$ of VQC output states with MLE (circles) and single-qubit measurement (diamonds) in (c). The input states are Haar random states. The black dashed line indicates the Page entanglement (see Eq. (4.5)). In (a), the line going below the plot region decreases to zero at machine precision; we choose the plot range to make the trends clear.

4.1.4.2 MLE's superiority over single-measurement schemes

The simultaneous single-qubit measurements on all qubits and the optimal MLE decision rule are crucial for our MLE-VQC approach to unleash the full power of the brickwall ansatz. To demonstrate this, we benchmark the MLE-VQC approach against a VQC with only a single-qubit Z-measurement at the center. To show their difference, we still focus on two ensembles, Haar random states and complex TI states $S(2^n)$. As shown in Fig. 4.4(a), the residual error of the MLE-VQC approach is around an order of magnitude smaller than that of the single-qubit approach at a given $D$, which shows the power of MLE to gather the full information from all qubits. Moreover, the MLE case shows a sharp drop in the error at a large enough VQC depth $D$, while the single-qubit case has a consistent decay. As for trainability, although the variance of the gradient for both approaches decreases exponentially with the number of qubits $n$, the MLE approach typically has a larger gradient and is therefore easier to train (see Fig. 4.4(b)).
Note that the local two-qubit gates in the VQC ansatzes we study are Haar random gates, and thus the depth of the VQCs falls in the barren plateau regime for both the single-qubit approach and the MLE strategy, whose trainabilities differ substantially in shallow VQCs according to their corresponding local and global cost functions [18].

4.1.4.3 Linear growth of complexity

Here we explore how the complexity of the input-state ensemble affects the error probability. As explained in Section 4.1.3.1, we can tune the complexity of the output states $H(D_0)$ produced by a depth-$D_0$ local random circuit by controlling the depth $D_0$; therefore, we study the discrimination between states sampled from $H(D_0)$. For states sampled from $H(D_0)$, the fast suppression of the error probability still holds, as shown in Fig. 4.5(a). With increasing input-state complexity, the discrimination task becomes harder, leading to an increasing error probability for a fixed VQC depth $D$. As the saturation towards the Helstrom limit has a long tail, we consider the number of layers $D_c$ required to achieve an error probability $P_E = 2 P_H$. In Fig. 4.5(c), we see a linear growth of $D_c$ with $D_0$, as expected. This can also be explained by a suboptimal strategy mimicking the Kennedy receiver [180], which implements the POVM element $\Pi_0 = |\psi_0\rangle\langle\psi_0|$ with a depth $D \sim D_0$ VQC to achieve the error probability $|\langle\psi_0|\psi_1\rangle|^2/2 \simeq 2 P_H$ when $P_H \ll 1$. One can also understand the increase from the increase of entanglement in the input states, which also shows a linear trend, as depicted in Fig. 4.5(d). Our error probability results are obtained on a finite system of six qubits; however, extending to larger systems to further consolidate the conclusion is challenging, due to the exponential decay of the gradient shown in Sec. 4.1.4.4.

Figure 4.5: (a) Discrimination error probability using the brickwall ansatz between a pair of random states sampled from $H(D_0)$ for $n = 6$ qubits. Black and grey dashed horizontal lines show the Helstrom limit $\langle P_H\rangle$ and the critical value $\langle P_E\rangle = 2\langle P_H\rangle$. Note that the relatively large error bars in the error probability for states $H(D_0)$ generated by a shallow-depth circuit are due to the lack of self-averaging. (b) Average maximal bipartite entanglement entropy $\langle S(n/2)\rangle$ of VQC output states before measurement for $H(D_0)$ discrimination. (c) The critical depth $D_c$ to achieve $\langle P_E\rangle = 2\langle P_H\rangle$ versus the input complexity $D_0$. (d) Maximal bipartite entanglement entropy $\langle S(n/2)\rangle$ of the input states to be distinguished. In (c) and (d), we plot odd and even $D_0$ separately.

4.1.4.4 Comparison between state generation and discrimination: performance and trainability

To understand the level of difficulty of state discrimination ('dis'), we benchmark it against the most relevant task of state generation ('gen') [133, 134]. In an $n$-qubit state generation task, the VQC performs a unitary $U_D$ on a trivial product state $|\mathbf{0}\rangle = |0\rangle^{\otimes n}$ to approximate a target state $|\psi\rangle$. Similar to the discrimination case in Eq. (4.3), we utilize the cost function
\[
C_{\rm gen}(U_D; \psi) \equiv 1 - |\langle\psi|U_D|\mathbf{0}\rangle|^2, \tag{4.8}
\]
as a function of the VQC unitary $U_D$. From Fig. 4.6(a), we can identify a sharp transition in the cost function $C$ for both discrimination and generation, where $C$ is exponentially suppressed before reaching an extremely small value. Although we find that $C_{\rm dis}$ is about an order of magnitude smaller than $C_{\rm gen}$, due to the different cost-function definitions, the gap does not necessarily indicate that one task is harder than the other. Although we have focused on the brickwall ansatz in Fig. 4.6(a) in this section, the same gap between discrimination and generation exists in other architectures, as we will discuss in Sec. 4.1.5.

To explore the trainability of VQCs, we evaluate the gradients of the cost functions with respect to the parameters.
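Such gradients can be estimated parameter by parameter with a central finite difference. A toy sketch on a stand-in cost (the product-of-rotations cost and all names here are illustrative, not the actual VQC ansatz):

```python
import numpy as np

def central_diff_grad(cost, theta, eps=1e-4):
    """Central finite difference: g_i = [C(theta + eps e_i) - C(theta - eps e_i)] / (2 eps)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (cost(tp) - cost(tm)) / (2.0 * eps)
    return g

def toy_cost(theta):
    """Stand-in infidelity of RY rotations against |0...0>: C = 1 - prod_i cos^2(theta_i / 2)."""
    return 1.0 - np.prod(np.cos(theta / 2.0) ** 2)

rng = np.random.default_rng(0)
n_params = 8
thetas = rng.uniform(0.0, 2.0 * np.pi, size=(200, n_params))  # random initializations
grads = np.array([central_diff_grad(toy_cost, t) for t in thetas])
var_g = grads.var(axis=0)      # Var(g_i) over initializations, one entry per parameter
avg_var = float(var_g.mean())  # parameter-averaged variance of the gradient
```

Repeating this estimate while growing the system size is how the barren-plateau decay of `avg_var` with $n$ is diagnosed numerically.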
For both the generation and discrimination tasks, the cost function gradient gi with respect to parameter θi can be obtained numerically from a central finite-difference. Note that as the gradient can be positive or negative, we evaluate the variance Var(gi) among different positions to get a sense of the magnitude of gradients, similar to Ref. [16]. Moreover, we take an average over the different gradient directions to obtain the average variance of gradient ⟨Var(gi)⟩ i . In Fig. 4.6(b), the parameter-averaged variance of gradient decays exponentially with the number of qubits n, predicted by the well-known barren plateau phenomena [16]. The barren plateau, combined with the high average entanglement from t-design, makes training specifically hard in large systems for the t-design VQCs we study here; to mitigate the difficulty, we discuss the simplification of VQC ansatzs and utilization of symmetry in Sec. 4.1.5.3. 4.1.5 Performance benchmarks of different ansatz In Sec. 4.1.4, we employ the brickwall VQC ansatz, where each two-qubit gate is applied to pairs of nearest neighbors in one dimension. As we mentioned in Sec. 4.1.2, various other architectures have been proposed for different tasks. Since we find that VQCs relying on increasing the amount of entanglement in the quantum states to approach the Helstrom limit, we expect that architectures with non-local gates might improve the discriminative power of VQCs. In Sec. 4.1.5.1, we offer a benchmark between different architectures to confirm this. Then we provide insights into the different performance by evaluating the scrambling power of the VQCs in Sec. 4.1.5.2. Despite the different architectures, in the NISQ era, VQC implementations are limited in the circuit depth and number of gates, due to the accumulation of device imperfections. 
Figure 4.6: Performance and gradient of the open-boundary brickwall VQC. We consider two different state ensembles, Haar random states and TI states S(2^n). (a) Ensemble-averaged cost functions for discrimination and generation. Blue and red curves show the ensembles of Haar random states and TI states, respectively, in a system of n = 6 qubits. (b) Parameter-averaged variance of the gradient ⟨Var(g_i)⟩_i in the D = 2 ansatz. In (a), the lines going below the plot region decrease to zero at machine precision; we choose the plot range to make the trends clear.

Therefore, we further explore the simplification of the gate sets in Sec. 4.1.5.3. In particular, we find that symmetry in the input states allows VQCs to be simplified.

4.1.5.1 Comparison between different architectures

In this section, we benchmark various VQC architectures (see Fig. 4.7) for both state discrimination and generation tasks. An architecture determines the layout of the quantum gates and therefore constrains the information flow in the VQC. In Sec. 4.1.4, we focused on the two-local brickwall ansatz. To extend the interactions beyond two-local, the prism and polygon [155] architectures generalize the line geometry to different shapes. These three architectures are extensive: the number of gates per layer stays roughly unchanged as the depth of the circuit increases. Other popular architectures have a fixed depth and are therefore not extensive, including QCNN [31, 137] and the tensor-network architectures (TTN and MERA) [153]. In Fig. 4.8(a), we begin the benchmark with the error-probability performance in discriminating Haar random state pairs. We see that the extensive architectures (brickwall, prism and polygon) provide better performance than the non-extensive architectures (QCNN, TTN and MERA) at the same depth D.
In particular, the extensive ones saturate the Helstrom limit at D = 5 exactly, while the non-extensive ones are far from the Helstrom limit even at larger depth. Among the three extensive architectures, we find prism and polygon to be slightly better than the brickwall architecture at finite depth D, as their non-local gates process global quantum information more efficiently. The same conclusions also generalize to the complex TI inputs S(2^n), as shown in Fig. 4.8(c). In some sense, it is expected that QCNN, TTN and MERA do not work well here, as they are developed for problems with specific structure and symmetry. We also consider the task of state generation, and find that similar conclusions hold: the relative ordering of the error is identical to that in state discrimination, as shown in Fig. 4.8(b)(d) for the Haar and TI ensembles. This shows a consistent ordering of quantum information processing power among the architectures, which also agrees with the quantum information scrambling capabilities explored in Sec. 4.1.5.2. Comparing the state generation task in Fig. 4.8(b)(d) with the state discrimination task in Fig. 4.8(a)(c), we also extend to VQCs with different architectures the previous conclusion of Sec. 4.1.4.4 that state discrimination is easier than state generation. Comparing the bottom panels (c,d) of Fig. 4.8 with the top panels (a,b), we see that for a general VQC, translational symmetry in the input merely increases the Helstrom limit, while at any depth the deviations from the Helstrom limit (the cost function) are almost identical with and without input symmetry. Although symmetry does not make much difference here, it will allow simplifications of the VQC, as we explain in Sec. 4.1.5.3.

4.1.5.2 Discriminative power versus scrambling power

As the circuit depth increases, a VQC generates more entangled outputs in order to achieve a better error probability in state discrimination (see Fig. 4.5(b)).
The expressivity and entangling power of VQCs have also been studied in terms of the connection pattern of two-qubit gates in relatively small systems from a general perspective [181]. Because entanglement growth is also an important indicator of quantum information scrambling in the circuit, we evaluate the scrambling power of the VQCs utilized in Sec. 4.1.5 and compare it to their performance.

Figure 4.7: Topological architectures of VQCs. Two connected boxes represent a universal two-qubit gate. For the (a) brickwall, (e) prism and (f) polygon ansatzes, we show the cases with depth D = 2, 3, 3, respectively; the other panels show (b) QCNN, (c) TTN and (d) MERA.

Similar to Ref. [155], we choose the operator size [63] as the metric to evaluate the scrambling power of VQCs. To define the operator size, we consider a Pauli-Z operator M_0 = I_1 ⊗ ··· ⊗ Z_{n/2} ⊗ ··· ⊗ I_n initially located at the center, where I_k is the identity operator acting on the k-th qubit. Under the VQC represented by the unitary U_D, the operator evolves to M_D = U_D^† M_0 U_D.

Figure 4.8: Cost functions for discrimination (left) and generation (right) among Haar random states (top) or complex TI states S(2^n) (bottom) in a system of n = 6 qubits. We compare the performance of different VQC architectures (brickwall, prism, polygon, QCNN, TTN, MERA). For comparison, the Helstrom limit is ⟨P_H⟩ ∼ 10^(−2.4) for (a) and ⟨P_H⟩ ∼ 10^(−2) for (c), as shown in Fig. 4.3. The relative differences in the cost functions between the top and bottom panels are below 3%. The lines going below the plot region decrease to zero at machine precision; we choose the plot range to keep the trends clear. Note that each non-extensive architecture (TTN, MERA, QCNN) has a single fixed depth and is therefore represented by a single dot.

In general, M_D can be expanded in the Pauli
bases, i.e., M_D = Σ_S α_S S, where S = ⊗_{k=1}^n σ_k is a Pauli string, with σ_k being one of the four Pauli operators I_k, X_k, Y_k, Z_k on the k-th qubit. The size of the evolved operator is

Size(M_D) = Σ_S |α_S|^2 L(S),    (4.9)

where L(S) is the number of non-identity elements in the Pauli string S. The operator size starts from the minimum value of unity when the operator M_D is a single-qubit local operator, and saturates to the maximum value of 3n/4 when it is uniformly distributed in the space spanned by Pauli strings. We numerically study the ensemble-averaged operator size for the different VQC architectures in Fig. 4.9, with each two-qubit gate chosen randomly. Comparing the operator-size growth with the state-discrimination and state-generation performances of Fig. 4.8, we find the same ordering for all VQC architectures. This consistency confirms the connection between the scrambling power and the discriminative power of VQCs.

Figure 4.9: Ensemble-averaged operator size versus the VQC depth D for a Pauli-Z operator initially localized at the n/2-th qubit in an n = 6 system, for the brickwall, prism and polygon architectures.

Number         | brickwall  | sVQC      | real sVQC
Parameters     | ∼ (15/2)nD | 3n(D + 1) | n(D + 1)
CNOT/CZ gates  | ∼ (3/2)nD  | ∼ (1/2)nD | ∼ (1/2)nD

Table 4.1: The number of parameters and CNOT/CZ gates of the brickwall, sVQC and real sVQC ansatzes to leading order O(nD), with depth D in a system of n qubits.

4.1.5.3 Performance of simplified gate sets

In the NISQ era [46], quantum circuit implementations are limited by device imperfections. In particular, the imperfections accumulate with the number of quantum gates and the depth of the circuit. In all the VQC architectures explored in Sec. 4.1.5.1, the realization of a universal two-qubit gate in fact requires three CNOT gates and additional qubit rotations (see Appendix C.3), creating an extra burden on the VQC implementation.
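As a concrete illustration of the operator-size metric in Eq. (4.9) of Sec. 4.1.5.2, the following sketch expands an evolved operator in the Pauli basis by brute force for a small system; the Haar-random two-qubit unitary is an assumed stand-in for a scrambling VQC layer:

```python
import numpy as np
from itertools import product

# Sketch of Eq. (4.9): expand an evolved operator in the Pauli basis and
# compute its size, for a small system with dense matrices. The Haar-random
# two-qubit unitary below is an assumed stand-in for a scrambling VQC layer.

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {'I': I2, 'X': X, 'Y': Y, 'Z': Z}

def kron_all(mats):
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out

def operator_size(M, n):
    # Size(M) = sum_S |alpha_S|^2 L(S), with alpha_S = Tr(S M)/2^n
    size = 0.0
    for labels in product('IXYZ', repeat=n):
        S = kron_all([PAULIS[l] for l in labels])
        alpha = np.trace(S @ M) / 2**n
        weight = sum(l != 'I' for l in labels)   # L(S): non-identity sites
        size += abs(alpha)**2 * weight
    return size

n = 2
M0 = kron_all([Z, I2])         # Z on the first qubit: size 1 by construction
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, R = np.linalg.qr(A)         # Haar-random unitary from a Ginibre matrix
U = Q @ np.diag(np.diag(R) / np.abs(np.diag(R)))
MD = U.conj().T @ M0 @ U       # Heisenberg evolution M_D = U_D^dag M_0 U_D
print(operator_size(M0, n), operator_size(MD, n))
```

Because the Pauli coefficients satisfy Σ_S |α_S|^2 = Tr(M^†M)/2^n, which is invariant under unitary evolution, the size is a properly normalized average weight.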
Another major constraint comes from the vanishing gradient due to the barren plateau [18], which prevents efficient training of VQCs and limits the scale of implementations. In this section, we consider different ways to simplify the quantum gates in the brickwall VQC and probe the induced change in performance and trainability for state discrimination.

Figure 4.10: Cost function ⟨C_dis⟩ (a) and average variance of the gradient ⟨Var(g_i)⟩_i (b) of different brickwall ansatzes when discriminating between random states sampled from S(2^n) with n = 6 qubits. We compare the TI and periodic-boundary brickwall ansatzes. In (b), all ansatzes are set to D = 2. In (a), the lines going below the plot region decrease to zero at machine precision; we choose the plot range to make the trends clear.

As we often consider state discrimination between TI states, a natural attempt to simplify the VQC is to enforce TI on each layer of the VQC, including the gate parameters and a periodic boundary. As the TI symmetry reduces the number of parameters, we expect TI VQCs to be more trainable, which is confirmed by the gradient evaluations in Fig. 4.10(b): the TI VQC typically shows a much larger gradient. Although assuming symmetry might sacrifice some performance, as shown in Fig. 4.10(a) we find that when the input is symmetric, the TI brickwall ansatz (blue) provides almost identical performance to the periodic-boundary brickwall ansatz without the symmetry constraint (red). This shows that enforcing the VQC to share the symmetry of the input simplifies the training of the VQC while losing little performance. Similar benefits from simplification are being explored in other tasks [182]. Next, we consider simplifying the set of gates in VQCs. We replace each universal two-qubit gate in Fig.
4.7(a) (which requires three CNOT gates) with a single CZ gate*, and insert general single-qubit rotations between the entangling layers, leading to the simple VQC (sVQC) of Fig. 4.11(a1), similar to the designs in Refs. [27, 93]. In an sVQC, each single-qubit rotation R(θ_1, θ_2, θ_3) ≡ e^(−iθ_1 Z/2) e^(−iθ_2 Y/2) e^(−iθ_3 Z/2) is characterized by three angles θ_1, θ_2 and θ_3, where Z, Y are Pauli matrices. We can further reduce the complexity of each single-qubit gate by constraining its matrix representation to be real (by fixing θ_1 = θ_3 = 0), leading to the real sVQC architecture of Fig. 4.11(a2), which implements only real unitaries U_D.

* An extra CZ gate is inserted every two layers to form a periodic boundary condition.

Figure 4.11: Architectures and performance of NISQ-friendly VQCs. (a) Layout of an n = 6-qubit sVQC ansatz (a1) and real sVQC ansatz (a2). The sVQC ansatz consists of CZ gates and generic single-qubit rotations, and the real sVQC ansatz consists of CZ gates and RY rotations. The circuits surrounded by the red dashed box represent D⋆ = 2 layers, and at the end of the circuit a rotation is applied to each qubit. (b1)-(b2) Cost function ⟨C_dis⟩ and average variance of the gradient ⟨Var(g_i)⟩_i for discriminating between Haar random states. (c1)-(c2) Cost function and average variance of the gradient for discriminating between TFIM ground states with g = 1, 10. We benchmark in a system of n = 6 qubits. In (b2) and (c2) we take D⋆ = 6 for all ansatzes; note that for the brickwall ansatz this corresponds to D = 2. In (b1) and (c1), the lines going below the plot region decrease to zero at machine precision; we choose the plot range to keep the trends clear.

Real sVQCs are widely utilized in eigen-solvers [131, 132], as the ground state of a time-reversal-symmetric Hamiltonian can be taken as real.
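A minimal sketch of the gate set just described, including the general rotation R(θ_1, θ_2, θ_3), its real restriction, and one CZ-based two-qubit layer (the layer layout is an assumed simplification of Fig. 4.11):

```python
import numpy as np

# Sketch of the sVQC gate set: the general single-qubit rotation
# R(t1, t2, t3) = Rz(t1) Ry(t2) Rz(t3), its real-sVQC restriction
# t1 = t3 = 0, and a CZ-based two-qubit layer (assumed simplified layout).

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]], dtype=complex)

def rot(t1, t2, t3):
    # R(theta1, theta2, theta3) = e^{-i t1 Z/2} e^{-i t2 Y/2} e^{-i t3 Z/2}
    return rz(t1) @ ry(t2) @ rz(t3)

CZ = np.diag([1, 1, 1, -1]).astype(complex)

def svqc_layer(thetas):
    # thetas: shape (2, 3), rotation angles for one two-qubit sVQC layer
    return CZ @ np.kron(rot(*thetas[0]), rot(*thetas[1]))

def real_svqc_layer(angles):
    # real sVQC: fix t1 = t3 = 0, i.e. pure RY rotations; the layer is real
    return CZ @ np.kron(ry(angles[0]), ry(angles[1]))
```

Since RY and CZ have real matrix representations, any circuit built from `real_svqc_layer` implements a real orthogonal U_D, matching the time-reversal-symmetric use case above.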
The overall number of parameters (which equals the number of single-qubit gates) and the number of CNOT/CZ gates are listed in Tab. 4.1 for comparison: both the sVQC and the real sVQC reduce the number of CNOT/CZ gates by a factor of three; the sVQC reduces the number of parameters roughly by half, while the real sVQC reduces it roughly by a factor of seven. To ensure a fair comparison of error-probability performance and trainability, instead of the depth D of the VQC in terms of universal two-qubit gates considered in most of this work, we count the number of layers of CNOT/CZ gates, D⋆, in the final physical implementations of the original (brickwall), sVQC and real sVQC ansatzes. In Fig. 4.11, we find that the sVQC consistently outperforms the original ansatz at the same D⋆, while the trainability barely changes. This is due to a certain level of redundancy in requiring each two-qubit gate of the original ansatz to be universal. The real sVQC further restricts the unitary implemented by the VQC to be real without losing much performance compared to the sVQC in the discrimination between real ground states of the TFIM, as shown in Fig. 4.11(c1); indeed, the trainability of the real sVQC improves thanks to the further simplification, as indicated by the larger gradients shown in Fig. 4.11(c2). For Haar random states, in contrast, due to the lack of symmetry in the input, the restriction to real unitaries harms the performance without improving the trainability, as shown in Fig. 4.11(b1)(b2).

4.1.6 Discussion

To conclude, we propose an MLE-VQC scheme for quantum data classification, which shows an enormous error-probability advantage over the single-qubit measurement approach. As the depth of the VQC increases in an extensive way, the error probability of the VQC decreases exponentially towards the Helstrom limit. Despite being popular choices, non-extensive VQCs such as QCNN and MERA are suboptimal in their error-probability performance.
The proposed MLE-VQC scheme can be implemented on near-term quantum devices and has the potential to be used in various applications. It is an important future direction to explore the MLE-VQC scheme's performance in different applications, as the symmetry and structure of the problem vary between applications and may allow additional simplifications. Finally, we discuss some important messages of our work. Adapting the VQC to the symmetry of the problem maintains the discriminative power for the particular problem while simplifying the VQC implementation. For example, when the input is TI, constraining the VQC to be TI not only reduces the number of parameters and improves the trainability, but also preserves the performance; similar advantages apply to assuming real VQCs for the real ground states of the TFIM. However, oversimplification can be problematic. As we have seen in Fig. 4.11, constraining the unitary to be real significantly harms the performance when discriminating between general complex quantum states. As powerful VQCs are typically hard to train due to small gradients [18], one needs to utilize the structure and symmetry of the task to simplify the circuit and enable efficient training. Our results indicate that such simplification needs to be tailored with caution.

4.2 Generative quantum machine learning via denoising diffusion probabilistic models

4.2.1 Introduction

Variational parameterized quantum circuits (PQCs) [13, 144, 94, 31] provide a near-term platform for quantum machine learning [29, 32, 183]. In terms of generative models [184, 185, 186, 187], quantum generative adversarial networks (QuGANs) have recently been proposed [39, 188, 189, 190, 191], in analogy to classical generative adversarial networks (GANs) [35]. Despite their success, classical GAN models are known to suffer from training issues such as mode collapse.
In classical deep learning, denoising diffusion probabilistic models (DDPMs) and their close relatives [192, 37, 36, 193, 194] have recently gained much attention over the best GANs, due to their relatively simple training schemes, their ability to generate diverse and high-quality samples in many computer vision tasks [195, 196, 197, 198], and their capacity to incorporate flexible model architectures.

Figure 4.12: Schematic of QuDDPM. The forward noisy process is implemented by a quantum scrambling circuit (QSC) in (a), while the backward denoising process is achieved via measurement, enabled by ancillas and PQCs, in (d). Subplots (b1)-(b5) and (c1)-(c5) present the Bloch-sphere dynamics at steps t = 0, 5, 10, 15, 20 in the generation of states clustering around |0⟩.

In this work, we propose the quantum denoising diffusion probabilistic model (QuDDPM) as an efficiently trainable scheme for generative quantum learning, through a coordination between a forward noisy diffusion process via quantum scrambling [62, 72] and a backward denoising process via quantum measurement. We provide bounds on the learning error and then demonstrate QuDDPM's capability in examples relevant to characterizing quantum device noise, learning quantum many-body phases and capturing the topological structure of quantum data. For an n-qubit problem, QuDDPM adopts linear-in-n layers of circuits to guarantee expressivity, while introducing T ∼ n/log(n) intermediate training tasks to guarantee efficient training.
4.2.2 General formulation of QuDDPM

We consider the task of generating new elements from an unknown distribution E_0 of quantum states, provided only a number of samples S_0 = {|ψ_k⟩} ∼ E_0 from the distribution. The task under consideration, generating individual states from the distribution (e.g., a single Haar random state or K-design state), is not equivalent to generating the average state of a distribution (e.g., the fully mixed state for the Haar ensemble) considered in previous works on QuGANs [39]. To complete the task, QuDDPM learns a map from a noisy, unstructured distribution of states to the structured target distribution E_0. It does so via a divide-and-conquer strategy of creating smooth interpolations between the target distribution and full noise, so that the training is divided into sub-tasks on low-depth circuits to avoid barren plateaus [16, 18, 22, 21]. As shown in Fig. 4.12, QuDDPM includes two quantum circuits, one to enable the forward diffusion of the sample data towards noise via scrambling, and one to realize the backward denoising from noise towards generated data via measurement. For each data state |ψ_i^(0)⟩, the forward scrambling circuit (Fig. 4.12(a)) independently samples a series of T random unitary gates U_1^(i), ..., U_T^(i), such that the ensemble S_k = {|ψ_i^(k)⟩ = ∏_{ℓ=1}^k U_ℓ^(i) |ψ_i^(0)⟩}_i evolves from the sample data towards a random ensemble of pure states as k runs from 0 to T. A Bloch-sphere visualization of this forward scrambling dynamics is depicted in Fig. 4.12(b1)-(b5) for a toy problem of learning single-qubit states S_0 clustered around a single pure state, e.g., |0⟩ (b1), where the noise S_T is uniform on the Bloch sphere (b5). With the interpolation between the data S_0 and the noise S_T in hand, the backward process can start from randomly sampled noise S̃_T (Fig. 4.12(c5)) and reduce the noise gradually, step by step, via measurement, towards the final generated data S̃_0 (Fig. 4.12(c1)) that mimic the sample data (Fig. 4.12(b1)).
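The forward trajectory S_0 → S_T can be sketched for the single-qubit toy problem; the weak random rotation applied at each step is an assumed stand-in for the quantum scrambling circuit:

```python
import numpy as np

# Sketch of the forward diffusion S_k = { prod_{l<=k} U_l |psi^(0)> } for the
# single-qubit toy problem: data near |0> is scrambled by weak random
# rotations. The per-step rotation is an assumed stand-in for the QSC.

rng = np.random.default_rng(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def random_step(eps=0.4):
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    G = axis[0] * X + axis[1] * Y + axis[2] * Z    # G^2 = I for a unit axis
    angle = eps * rng.normal()
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * G

def diffuse(psi0, T):
    traj = [psi0]
    for _ in range(T):
        traj.append(random_step() @ traj[-1])
    return traj    # [ |psi^(0)>, ..., |psi^(T)> ]

psi0 = np.array([1.0, 0.0], dtype=complex)    # sample data clustered at |0>
traj = diffuse(psi0, T=20)
overlaps = [abs(np.vdot(psi0, s))**2 for s in traj]
print(overlaps[0], overlaps[-1])    # overlap with |0> decays from 1
```

Every intermediate state stays pure (each step is unitary), which is what makes the ensembles S_k usable as intermediate training targets.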
Measurements are necessary, as the denoising map needs to be contractive while maintaining the purity of each generated state in S̃_0; unitaries alone cannot achieve both. As shown in Fig. 4.12(d), each denoising step applies a unitary Ũ_k on the system plus n_A ancilla qubits initialized in |0⟩, and then performs a projective measurement on the ancillas in the computational basis. Starting from the state |ψ̃_i^(T)⟩, randomly sampled from the noise ensemble, each unitary-plus-measurement step evolves the random state towards the generated data |ψ̃_i^(0)⟩.

Figure 4.13: The training of QuDDPM at step t = k. The pairwise distances between states in the generated ensemble ψ̃_i^(k) ∈ S̃_k and the true diffusion ensemble ψ_j^(k) ∈ S_k are measured and utilized in the evaluation of the loss function L, which is then minimized over the PQC parameters.

Note that all unitaries Ũ_k are fixed after training. In practice, the generation of the noisy |ψ̃_i^(T)⟩ can be completed directly by running the T layers of the forward scrambling circuit. Via training, the denoising process learns information about the target from the ensembles in the forward scrambling, stores that information in the circuit parameters, and then encodes it onto the generated data.

4.2.3 Training strategy

In classical DDPMs, the Gaussian nature of the diffusion allows efficient training via maximizing an evidence lower bound (ELBO) for the log-likelihood, which can be evaluated analytically [192, 37] (Sec. D). In QuDDPM, however, we do not expect such an analytical simplification to exist, since classical simulation of quantum devices is inherently inefficient. Instead, the training of QuDDPM relies on the capability of quantum measurements to extract information about the ensemble of quantum states for the efficient evaluation of a loss function.
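A single measurement-based denoising step of the kind described in Sec. 4.2.2 can be sketched as follows (one system qubit and one ancilla; the joint Haar-random unitary is a placeholder for the trained PQC Ũ_k):

```python
import numpy as np

# Sketch of one backward denoising step: attach an ancilla in |0>, apply a
# joint unitary (a placeholder for the trained PQC U~_k), measure the ancilla
# in the computational basis, and keep the renormalized system state.

rng = np.random.default_rng(3)

def haar_unitary(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(A)
    return Q @ np.diag(np.diag(R) / np.abs(np.diag(R)))

def denoise_step(psi_sys, U):
    anc = np.array([1.0, 0.0], dtype=complex)             # ancilla in |0>
    joint = (U @ np.kron(psi_sys, anc)).reshape(2, 2)     # axes (system, ancilla)
    p_anc = (np.abs(joint)**2).sum(axis=0)                # ancilla outcome probs
    outcome = rng.choice(2, p=p_anc)                      # projective measurement
    post = joint[:, outcome] / np.sqrt(p_anc[outcome])    # renormalized system state
    return post, int(outcome)

psi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
post, outcome = denoise_step(psi, haar_unitary(4))
print(outcome, np.linalg.norm(post))    # post-measurement state stays pure
```

The measurement collapses the joint state onto one ancilla outcome, so the system state remains pure while the overall map on the ensemble can be contractive, exactly the combination the text requires.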
The training of a T-step QuDDPM consists of T training cycles, starting from the first denoising step Ũ_T and proceeding towards the last one, Ũ_1. As shown in Fig. 4.13, at training cycle (T + 1 − k), the forward noisy diffusion process is implemented from U_1^(i) to U_k^(i) to generate the noisy ensemble S_k = {|ψ_i^(k)⟩}_i, while the backward denoising process performs the denoising steps Ũ_T to Ũ_{k+1} to generate the denoising ensemble S̃_k = {|ψ̃_i^(k)⟩}_i. Within the training cycle, the parameters of the denoising PQC Ũ_{k+1} are updated such that the generated denoising ensemble S̃_k converges to the noisy ensemble S_k. QuDDPM therefore divides the original training problem into T smaller and easier ones. Indeed, even with a global loss function, we can divide the Ω(n) layers of gates (required by expressivity) among T ∈ Ω(n/log n) diffusion steps, such that each Ũ_{k+1} has of order log(n) layers of gates to avoid barren plateaus [18].

4.2.4 Loss function

To enable training, a loss function quantifies the distance between two ensembles of quantum states. In this work, we focus on the maximum mean discrepancy (MMD) [199] and the Wasserstein distance [200, 201], based on the state overlaps |⟨ψ_i^(k)|ψ̃_i^(k)⟩|^2 estimated via a swap test (Sec. D). Consider two independent distributions of pure states E_1 and E_2 on the state-vector space V. The statewise fidelity between |ϕ⟩ and |ψ⟩ is defined as F(|ϕ⟩, |ψ⟩) = |⟨ϕ|ψ⟩|^2, and we can further define the mean fidelity

F(E_1, E_2) = E_{|ϕ⟩∼E_1, |ψ⟩∼E_2} |⟨ϕ|ψ⟩|^2,    (4.10)

where the random states |ϕ⟩ ∼ E_1 and |ψ⟩ ∼ E_2 are drawn independently. Since the fidelity F is a symmetric and positive-definite quadratic kernel, according to the theory of reproducing kernel Hilbert spaces (RKHS) the MMD distance can be written as (Sec. D)

D_MMD(E_1, E_2) = F(E_1, E_1) + F(E_2, E_2) − 2F(E_2, E_1),    (4.11)

which allows the estimation of the MMD through sampled state ensembles S_1 and S_2.
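Given sampled ensembles, the estimator implied by Eqs. (4.10)-(4.11) is straightforward; a sketch, using exact overlaps in place of swap-test estimates:

```python
import numpy as np

# Sketch of the MMD estimator of Eq. (4.11) with the fidelity kernel of
# Eq. (4.10). Exact overlaps stand in for swap-test estimates.

def mean_fidelity(S1, S2):
    # rows of S1, S2 are state vectors; G[i, j] = |<phi_i|psi_j>|^2
    G = np.abs(S1.conj() @ S2.T)**2
    return G.mean()

def mmd(S1, S2):
    return mean_fidelity(S1, S1) + mean_fidelity(S2, S2) - 2 * mean_fidelity(S1, S2)

rng = np.random.default_rng(4)

def haar_states(num, dim):
    v = rng.normal(size=(num, dim)) + 1j * rng.normal(size=(num, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

A = haar_states(200, 2)
B = haar_states(200, 2)
C = np.tile(np.array([1.0 + 0j, 0.0]), (200, 1))    # ensemble clustered at |0>
print(mmd(A, B), mmd(C, B))    # Haar vs Haar is small; cluster vs Haar is not
```

Because the fidelity kernel is positive definite, this empirical D_MMD is a squared RKHS distance between mean embeddings and is therefore non-negative, vanishing for identical ensembles.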
The expressivity of the general MMD as a statistical distance measure depends on the kernel. On the one hand, identifiability requires that the distance be zero if and only if E_1 = E_2. On the other hand, one also needs to ensure the quality of the statistical estimate of the distance at a finite sample size of the state ensembles. Hence, whether the fidelity (4.10) is a proper kernel choice is problem-dependent. In Sec. D, we show an example where D_MMD in Eq. (4.11) fails to distinguish two simple distributions. To resolve this issue, we may alternatively consider the Wasserstein distance, a geometrically meaningful distance for comparing complex data distributions based on the theory of optimal transport [200, 201] (see Methods for details).

Figure 4.14: The decay of the MMD distance D_MMD(S̃_t, E_0) between the generated ensemble S̃_t of different models (QuDDPM, QuDT, QuGAN; training and testing) and the target ensemble of states E_0 clustered around |0, 0⟩, versus training steps; the forward-diffusion reference is shown versus diffusion step. The converged value is D ≃ 0.002 for QuDDPM, a two-order-of-magnitude advantage over QuDT and QuGAN.

As shown in Fig. 4.13, in training cycle t = k the loss is a function of the unitaries {Ũ_ℓ}_{ℓ=k+1}^T and depends on the noise distribution Ẽ_T and the scrambled data distribution E_k,

L({Ũ_ℓ}_{ℓ=k+1}^T, E_k, Ẽ_T) = D(E_k, Ẽ_k[{Ũ_ℓ}_{ℓ=k+1}^T, Ẽ_T]),    (4.12)

where D can be the MMD distance or the Wasserstein distance. The distribution Ẽ_k is a function of all the reverse denoising steps and of the noise distribution Ẽ_T. In practice, we use finite samples to approximate the loss function as L({Ũ_ℓ}_{ℓ=k+1}^T, S_k, S̃_T). Here we present the training history of a more challenging two-qubit example of preparing states clustered around |0, 0⟩, to allow a meaningful comparison with other algorithms. In each of the 20 training cycles, the loss function is minimized until convergence.
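As an aside on the Wasserstein option introduced above: one concrete empirical version over two equal-size sampled ensembles pairs the states by optimal assignment under the infidelity cost 1 − F. This discretization is an assumption made here for illustration, not the general optimal-transport construction used in the thesis:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of an empirical Wasserstein-type distance between two equal-size
# state ensembles: pair states by optimal assignment under the infidelity
# cost 1 - |<phi_i|psi_j>|^2 (an assumed ground metric for illustration).

def wasserstein(S1, S2):
    cost = 1.0 - np.abs(S1.conj() @ S2.T)**2    # pairwise infidelity matrix
    rows, cols = linear_sum_assignment(cost)    # minimum-cost pairing
    return cost[rows, cols].mean()

rng = np.random.default_rng(5)
v = rng.normal(size=(50, 2)) + 1j * rng.normal(size=(50, 2))
A = v / np.linalg.norm(v, axis=1, keepdims=True)
print(wasserstein(A, A))    # a zero-cost perfect matching exists
```

Unlike the mean-embedding MMD, this transport-style distance depends on how the mass of one ensemble must move to cover the other, which is what makes it sensitive to topological structure such as a ring of states.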
To quantify the convergence, we also evaluate the MMD distance D(S̃_t, E_0) between the true distribution E_0 and the trained ensemble of states S̃_t throughout the training cycles (blue in Fig. 4.14), showing convergence towards D = 0. The periodic spikes reflect the initial increase of the MMD distance at each training cycle, due to the introduction of a randomly initialized PQC for the new denoising step. For reference, we also plot the evolution of the MMD distance through the forward diffusion (red circles), which starts from zero at diffusion step 0 and grows as the diffusion step increases (from right to left). We see that the training results (blue) follow the diffusion results (red) closely, as expected. In addition, the testing results (green) also agree well with the training results (blue) for QuDDPM. As benchmarks, we consider two major quantum generative models, QuGAN and quantum direct transport (QuDT). QuDT can be regarded as a generalization of the quantum circuit Born machine [202, 203, 204, 205] to quantum data. Previous works on both models considered only a single quantum state or classical distributions [189, 206, 203]; here we generalize them to the state-ensemble generation task by allowing Haar random states as inputs and introducing ancillas to be measured (Sec. D). For a fair comparison, we keep the number of variational parameters of the generator circuits in QuDT and QuGAN the same as in QuDDPM, as listed in Tab. 4.3. As shown in Fig. 4.14, QuDT and QuGAN converge to ensembles with a substantial MMD deviation from the true ensemble, demonstrating QuDDPM's advantage due to its unique diffusion-and-denoising process.

4.2.5 Gate complexity and convergence

Now we discuss the number of local gates required and the convergence analysis for QuDDPM to solve an n-qubit generative task. For simplicity, we assume the qubits form a one-dimensional array with nearest-neighbor interactions, while similar counting can be done for other cases.
To guarantee convergence towards noise, the forward scrambling circuits need a number of layers linear in n, as predicted by K-design results [171, 207], leading to O(n^2) total gates. The backward circuit is similar, with at most n_A ≤ 2n additional ancillas and O(n^2) gates, leading to an overall gate complexity of O(n^2) for QuDDPM.

Figure 4.15: The generalization error E_gen of QuDDPM in generating clustered states versus (a) the number of diffusion steps T and (b) the training dataset size N. Dots are numerical results and the orange dashed lines are linear fits, with both exponents equal to 1 within numerical precision.

Similar to the classical case [208], the total error of QuDDPM involves three parts,

E ≃ E_diff + E_M + E_gen,    (4.13)

with E_diff the deviation of S_T from truly random states, E_M the measurement error and E_gen the generalization error. We discuss the scaling of the three parts separately in the following. Supposing the diffusion circuits approach an approximate K-design, the diffusion error is known to be [171]

E_diff ∼ 2^(nK) e^(−T/A(K)C) ∼ O(e^(−T)),    (4.14)

where A(K) = ⌈log_2(4K)⌉^2 K^(5+3.1/log(2)) is a polynomial in K and C is a constant determined by the circuit in a single step. For the measurement, the standard error in estimating the fidelity F_ij between any two states |ψ_i⟩, |ψ̃_j⟩ is SE(F_ij) = √((1 − F_ij)/m), where m is the number of repetitions of the measurement. With N data in each of the two sets S, S̃, the measurement error in estimating the mean fidelity is

E_M = (1/N^2) √(Σ_{i,j=1}^N SE(F_ij)^2) ∼ O(1/(N√m)).    (4.15)

Figure 4.16: Generation of (a) states with probabilistic correlated noise on a specific state and (b) states in the ferromagnetic phase. In (a), the average fidelity F_10 between states at step t and |10⟩ is plotted for diffusion (red), training (blue) and testing (green).
In (b), we show the distribution of the magnetization for data generated from the training (blue) and testing (green) sets, compared to the true data (red) and full noise (orange). Four qubits are considered in (b).

Finally, we provide numerical evidence that the generalization error [209, 210]

E_gen({Ũ_ℓ}_{ℓ=1}^T) ≡ L({Ũ_ℓ}_{ℓ=1}^T, E_0, Ẽ_T) − L({Ũ_ℓ}_{ℓ=1}^T, S_0, S̃_T)    (4.16)

has the scaling O(1/(TN)), as shown in Fig. 4.15 for an n = 4 qubit clustered-state generation task. Here we estimate the generalization error via an independently sampled validation set, while a proof remains an open problem [209, 210]. The 1/N scaling agrees with classical machine learning results [211, 212].

4.2.6 Applications

To showcase QuDDPM's applications, we consider a particular realization of QuDDPM with each unitary U_k^(i) implemented by the fast scrambling model [213] (layers of general single-qubit rotations in between homogeneous tunable entangling layers of all-to-all ZZ rotations) and each Ũ_k by the hardware-efficient ansatz [27] (layers of X and Y single-qubit rotations in between layers of nearest-neighbor controlled-Z gates), respectively (Sec. D). While an MMD-distance characterization similar to Fig. 4.14 is presented in Sec. D, we adopt more direct measures of performance in each application.

Figure 4.17: Bloch-sphere visualization of the forward diffusion process (a1-a3) and the backward denoising process for training data (b1-b3) and test data (c1-c3), at t = 0, 20, 40. (d)(e) Deviation of the generated states from the unit circle in the X-Z plane, quantified by ⟨Y⟩^2, for forward diffusion (red), backward training (blue) and backward testing (green). The shaded areas show the sample standard deviation.
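Before turning to the individual applications, the measurement-error aggregation of Eq. (4.15) and its O(1/(N√m)) scaling can be verified with a short numerical sketch (the fidelity values are assumed toy inputs):

```python
import numpy as np

# Numerical check of Eq. (4.15): aggregate per-pair swap-test standard errors
# SE(F_ij) = sqrt((1 - F_ij)/m) into the mean-fidelity error E_M and verify
# the O(1/(N sqrt(m))) scaling. The fidelity values are assumed toy inputs.

def measurement_error(F, m):
    # F: (N, N) matrix of pairwise fidelities; m: measurement repetitions per pair
    N = F.shape[0]
    se2 = (1.0 - F) / m             # SE(F_ij)^2
    return np.sqrt(se2.sum()) / N**2

F0 = np.full((10, 10), 0.5)
e1 = measurement_error(F0, 100)
e2 = measurement_error(np.full((20, 20), 0.5), 100)    # double N
e3 = measurement_error(F0, 400)                        # quadruple m
print(e1, e1 / e2, e1 / e3)    # both ratios are 2: E_M ~ 1/(N sqrt(m))
```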
4.2.6.1 Learning correlated noise

When a real quantum device is programmed to generate a quantum state, it inevitably suffers from potentially correlated errors in the gate control parameters [214, 215]. As a result, the generated states S_0 are close to the target state but carry nontrivial coherent errors, which can be learned by QuDDPM. We take a two-qubit example with target state |Ψ⟩ = c_0|00⟩ + c_1|01⟩ + c_3|11⟩ under the influence of fully correlated noise, where e^(−iδX_1X_2) and e^(−iδZ_1Z_2) rotations occur with probabilities p and 1 − p, respectively. Here X_k and Z_k are Pauli operators on qubit k. In each case, the rotation angle δ is uniformly sampled from the range [−δ_0, δ_0]. As the |10⟩ component of the superposition appears only when the XX error occurs, we can utilize the average fidelity F_10 = E_{S̃_0}|⟨10|ψ^(0)⟩|^2 as the performance metric to estimate the error probability p via p̃ = F_10 / (|c_1|^2 E_δ sin^2 δ). We show a numerical example in Fig. 4.16(a), where the generated-ensemble average fidelity in training and testing agrees with the theoretical prediction up to a finite-sample-size deviation.

4.2.6.2 Learning many-body phases

As a proof of principle, we take the simple and well-known transverse-field Ising model (TFIM), described by the Hamiltonian H_TFIM = −Σ_i Z_i Z_{i+1} − g Σ_i X_i. When g increases from zero, the system undergoes a phase transition from the ordered ferromagnetic phase to the disordered phase, with the critical point at g = 1. The states before diffusion are chosen from ground states of H_TFIM with g ∈ [0.2, 0.4) uniformly distributed. To test the capability of QuDDPM, we utilize the magnetization M = (Σ_i Z_i)/n to identify the phase of the states generated by QuDDPM, and show its distribution in Fig. 4.16(b). Most generated states (blue and green) of QuDDPM live in the ferromagnetic phase, in sharp contrast to the random states (orange).
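The magnetization diagnostic used above can be sketched as an exact expectation value on state vectors; the product and GHZ-like test states are assumed inputs, not the TFIM ground states of the text:

```python
import numpy as np

# Sketch of the magnetization diagnostic M = (sum_i Z_i)/n evaluated as an
# expectation value on n-qubit state vectors. The product and GHZ-like test
# states are assumed inputs, not the TFIM ground states of the text.

def z_expectation(psi, site, n):
    # <Z_site> for an n-qubit state vector (site 0 is the leftmost qubit)
    probs = np.abs(psi)**2
    bits = (np.arange(2**n) >> (n - 1 - site)) & 1
    return probs[bits == 0].sum() - probs[bits == 1].sum()

def magnetization(psi, n):
    return sum(z_expectation(psi, i, n) for i in range(n)) / n

n = 4
up = np.zeros(2**n, dtype=complex); up[0] = 1.0       # |0000>: M = +1
ghz = np.zeros(2**n, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)                     # (|0000> + |1111>)/sqrt(2)
# note: the GHZ-like state has <M> = 0 even though every single shot yields
# M = +-1, so the *distribution* of measured magnetization (as in
# Fig. 4.16(b)) is more informative than the mean.
print(magnetization(up, n), magnetization(ghz, n))
```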
4.2.6.3 Learning nontrivial topology

We consider an ensemble of states with a ring structure, generated by applying a unitary on a single state, e.g., |ψ_i⟩ = e^{−i x_i · G}|0⟩, which models the scenario where one encodes classical data x_i onto the quantum data |ψ_i⟩, as commonly adopted in quantum machine learning to solve classical problems [93, 216, 217, 190]. We test QuDDPM with a single-qubit toy example, where the generator is chosen as Pauli-Y and the rotation angles are uniform in [0, 2π). In the QuDDPM training, we use the Wasserstein distance [201] to cope with the nontrivial topology. The forward noisy diffusion process on the sample data and the backward denoising process for training and testing are depicted in Fig. 4.17. To quantitatively evaluate the performance of QuDDPM, we evaluate the deviation via the Pauli-Y expectation ⟨Y⟩² in Fig. 4.17(d)(e), where a gradual transition between zero and the Haar value of 1/3 is observed in both forward diffusion and backward denoising.

4.2.7 Discussion

Finally, we point out some future directions, besides the various applications of QuDDPM in learning quantum systems. Our current QuDDPM architecture requires a loss function based on fidelity estimations. For large systems, fidelity estimation can be challenging to implement. Towards efficient training in large systems, alternative loss functions can be adopted. For example, one may consider adopting another quantum circuit trained to tell the ensembles apart, such as a quantum convolutional neural network [31] or other circuit architectures [42]. Such an approach would combine QuDDPM and the adversarial agent in QuGAN to resolve the training problem in QuGAN. Another future direction is controlled diffusion [218]: when the ensemble has a special symmetry, one can restrict the forward scrambling, the backward denoising and the random noise ensemble to that symmetry.
It is also an interesting open problem how to introduce a control knob such that QuDDPM can learn multiple distributions and generate states according to an input requesting one of the distributions. Besides learning quantum errors and many-body phases, quantum sensor networks [87, 88] provide another application scenario of QuDDPM. In this scenario, one sends quantum probes to sense a unitary physical process; on the return side, the receiver collects a pure state from a distribution in the ideal case. It is an open problem how QuDDPM can be adopted to provide an advantage in quantum sensing. Furthermore, the distribution can also be quantum states encoding classical data, so QuDDPM can also process classical data. Benchmarking QuDDPM against previous algorithms for classical-data generative learning is an open direction.

Generation task | n | nA | L | T | N | Cost func. | Performance
Clustered state (Fig. 4.12 of Sec. 4.2) | 1 | 1 | 4 | 20 | 100 | MMD | F0,data = 0.987 ± 0.013; F0,tr = 0.992 ± 0.021; F0,te = 0.993 ± 0.014
Clustered state (Fig. 4.14 of Sec. 4.2) | 2 | 1 | 6 | 20 | 100 | MMD | F0,data = 0.977 ± 0.014; F0,tr = 0.952 ± 0.070; F0,te = 0.944 ± 0.075
Clustered state (Figs. 4.15a and 4.15b) | 4 | 2 | 8 | 20 (Fig. 4.15b) | 100 (Fig. 4.15a) | MMD | see Figs. 4.15a and 4.15b
Correlated noise (Fig. 4.16a) | 2 | 2 | 6 | 20 | 500 | MMD | data: 0.129; training: 0.128; testing: 0.133
Many-body phase (Fig. 4.16b) | 4 | 2 | 12 | 30 | 100 | MMD | measured by magnetization; data: 1; training: 0.9; testing: 0.96
Circular states (Fig. 4.17) | 1 | 2 | 6 | 40 | 500 | Wasserstein | ⟨Y⟩²_data = 0; ⟨Y⟩²_tr = 0.00367 ± 0.0251; ⟨Y⟩²_te = 0.00506 ± 0.0439

Table 4.2: List of hyperparameters of QuDDPM and its performance in different generative learning tasks. To test the performance after training, we randomly sample N_te random noise states and perform the optimized backward PQC to generate the sampled data. Data set sizes N_tr = N_te = N. n is the number of data qubits and n_A the number of ancilla qubits. L is the PQC depth and T the number of diffusion steps.
For clustered state generation, we evaluate the average fidelity with the center state in each cluster, i.e. |0⟩ for single-qubit and |0, 0⟩ for two-qubit tasks in Sec. 4.2.

4.2.8 Methods

4.2.8.1 Details of parameters

We list the hyperparameters and performance for all generative learning tasks in Table 4.2 for reference, and state the targeted distribution of states to generate in the following. The major code and data of the work can be found in Ref. [219]. In Fig. 4.12(b)(c), we consider data in the form of |ψ^(0)⟩ ∼ |0⟩ + ϵ c_1 |1⟩ up to a normalization constant, where Re{c_1}, Im{c_1} ∼ N(0, 1) are Gaussian distributed and the scale factor is chosen as ϵ = 0.08. We have taken single-qubit rotations as U_k^(i), where each angle is randomly sampled, e.g., from the uniform distribution U[−π/8, π/8]. In the generation of states with probabilistic correlated noise in Fig. 4.16(a), the noise perturbation range is δ ∈ [−π/3, π/3].

Model | n | nA | # variational parameters | N | Cost func. | Performance
QuDDPM | 2 | 1 | 720 | 100 | MMD | F0,tr = 0.947 ± 0.070; F0,te = 0.948 ± 0.061
QuDT | 2 | 1 | 720 | 100 | MMD | F0,tr = 0.572 ± 0.321; F0,te = 0.465 ± 0.349
QuGAN | 2 | 1 | 720 (generator), 96 (discriminator) | 100 | error-probability-based cost func. | F0,tr = 0.570 ± 0.250; F0,te = 0.443 ± 0.269

Table 4.3: List of hyperparameters of QuDDPM, QuDT and QuGAN for generating the clustered state in Fig. 4.14 and Fig. 8 in Sec. 4.2. Data set sizes N_tr = N_te = N. n is the number of data qubits and n_A the number of ancilla qubits. In the performance column, the mean fidelity with the center state of the cluster |0, 0⟩ is F_0 = E_{|ψ⟩∈S˜} |⟨0, 0|ψ⟩|², and for the true data it is F_0,data = 0.977 ± 0.014 in Sec. 4.2.
4.2.8.2 Wasserstein distance

For pure states, choosing the quantum trace distance (equal to the infidelity) D²(|ϕ⟩, |ψ⟩) = 1 − |⟨ϕ|ψ⟩|², Kantorovich's formulation of optimal transport is to solve the following optimization problem

OPT := min_{π∈Π(E_1,E_2)} ∫_{V×V} D^p(|ϕ⟩, |ψ⟩) dπ(|ϕ⟩, |ψ⟩)   (4.17)

for p ≥ 1, where Π(E_1, E_2) is the set of admissible transport plans (i.e., couplings) of probability distributions on V × V such that π(B × V) = E_1(B) and π(V × B) = E_2(B) for any measurable B ⊂ V; namely, Π(E_1, E_2) stands for all distributions with marginals E_1 and E_2. The Kantorovich problem in (4.17) induces a metric, known as the p-Wasserstein distance, on the space P_p(V) of probability distributions on V with finite p-th moment. In particular, the p-Wasserstein distance is W_p(E_1, E_2) = OPT^{1/p}, and it has identifiability in the sense that W_p(E_1, E_2) = 0 if and only if E_1 = E_2. More details can be found in Sec. 4.2.

4.2.8.3 Related works

The proposed QuDDPM represents a use case of the theoretical idea of quantum information scrambling [62, 72] in the forward diffusion, and its backward denoising also connects to measurement-induced phase transitions [220]. Here we point out that our forward diffusion circuits include an actual implementation of scrambling as part of the QuDDPM algorithm, while previous papers utilize tools from the study of quantum scrambling to understand quantum neural networks [97, 98]. Below, we discuss several related works. Ref. [221] utilizes the diffusion map (DM) for unsupervised learning of topological phases, and Ref. [222] proposes a diffusion K-means manifold clustering approach based on the diffusion distance [223]. A quantum DM algorithm has also been considered [224] for potential quantum speed-up. However, these works are not on generative learning and do not consider any denoising process.
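Returning to Eq. (4.17): for two equal-size empirical ensembles with uniform weights, the optimal transport problem reduces to a minimum-cost perfect matching over the pairwise distances; a minimal sketch (the function name and the ground metric D = √(1 − |⟨ϕ|ψ⟩|²) conventions are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_p(states_a, states_b, p=1):
    """p-Wasserstein distance between two equal-size empirical ensembles of
    pure states, with ground metric D(phi, psi) = sqrt(1 - |<phi|psi>|^2).

    states_a, states_b: arrays of shape (N, d) of normalized state vectors.
    For uniform empirical measures of equal size, the optimal coupling is a
    permutation (Birkhoff), so OPT is a linear assignment problem.
    """
    fid = np.abs(states_a.conj() @ states_b.T) ** 2
    dist = np.sqrt(np.clip(1.0 - fid, 0.0, None))
    rows, cols = linear_sum_assignment(dist ** p)
    return (dist[rows, cols] ** p).mean() ** (1.0 / p)
```

Identical ensembles give distance zero regardless of ordering, reflecting the identifiability property stated above.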
Layer-wise training [225] also attempts to divide a training problem into sub-tasks in non-generative learning; however, the performance of such strategies is limited [226]. QuDDPM integrates the division of the training task with an actual noisy diffusion process to enable a provable benefit in training. After the completion of our work, we became aware of a recent paper [227], where classical diffusion on classical data is implemented. Our work focuses on quantum data; it provides an explicit construction of quantum diffusion and quantum denoising, a loss function, a training strategy and error analyses, and presents several applications.

Chapter 5

Trainability and application of hybrid QNN

In Section 5.1, I will summarize my work on the trainability analysis of universal hybrid QNNs [44]. In Section 5.2, I will summarize my work on the proposal of a quantum diffusion model for generative learning [45].

5.1 Energy-dependent barren plateau in bosonic variational quantum circuits

5.1.1 Introduction

Variational quantum circuits (VQCs) [13] are candidates for achieving practical quantum advantages in the noisy intermediate-scale quantum (NISQ) era [46], when scalable error-corrected quantum computers are not yet available. VQCs utilize classical control to optimize a quantum circuit to solve computational problems, including optimization [28], eigen-system problems [14, 27, 15, 129, 130, 131, 132], partial differential equations [139], quantum simulation [140, 141, 142] and machine learning [12, 29, 91, 143, 144, 93, 92, 145, 146, 228]. As a general approach to designing quantum circuits, VQCs have also found applications in the approximation [134], preparation [125, 126], classification [135, 136, 31, 137, 42] and tomography [138] of quantum states.

Initial works on VQCs concern discrete-variable (DV) finite-dimensional systems of qubits, which are natural for computation, while continuous-variable (CV) systems of bosonic qumodes are less explored.
Yet, many important quantum systems are modelled by qumodes, including microwave cavities, trapped-ion mechanical modes, and optical systems. For example, quantum communication and networking [229, 230, 231, 232, 233] rely on photons, the only flying quantum information carrier. In this regard, quantum transduction and entanglement distillation are shown to be enhanced by CV VQCs [45]. Photonic quantum computers [234, 235] also rely on bosonic encodings such as the cat code and the Gottesman-Kitaev-Preskill (GKP) code [236], which has shown great promise [237, 96]; the engineering of such code states is greatly boosted by CV VQCs [238, 239, 240, 241]. Finally, distributed entangled sensor networks ubiquitously rely on CV VQCs to achieve quantum advantages in sensing [162, 242, 243, 244] and data classification [87, 88].

Different from traditional algorithms, the runtime of VQCs is characterized by the time necessary to train the variational parameters to optimize a cost function. Therefore, the landscape of the cost function determines the VQC's trainability: large gradients across the landscape will guarantee a smooth training process. For DV systems, thanks to the well-established toolbox of random unitaries and t-designs [169, 170, 61, 171], rigorous results unveil the barren plateau phenomenon in expressive DV VQCs [16, 18, 22, 21], which states that the cost function gradient typically vanishes exponentially with the number of qubits when the circuit depth is not too shallow. However, the trainability of expressive CV VQCs is unexplored, perhaps due to the fundamental problem of the non-existence of t-designs with t ≥ 2 in infinite dimension [245, 63, 246]. In this work, we overcome the problem by developing the notion of energy-constrained random quantum circuits and prove a unique energy-dependent barren plateau phenomenon for such circuits.
For an M-mode state preparation task, the variance of the gradient asymptotically decays as ∼ 1/E^{Mν}, exponential in the number of modes M while power-law in the circuit energy E; the barren-plateau exponent is ν = 1 for shallow circuits and ν = 2 for deep circuits.

Figure 5.1: Summary of VQC trainability in DV and CV systems. The Hilbert space is finite-dimensional in DV systems, while for CV systems it is infinite-dimensional. The universal DV VQC is built from local 2-design unitary gates (lime green), and universal CV VQCs consist of single-qubit rotations (cyan) and echoed conditional displacement (ECD) gates (pink) [241]. In DV VQCs, the variance of the gradient decays exponentially with the number of qubits N in shallow [18] and deep [16] DV VQCs optimizing a global cost function. In this paper, we show that the variance decays exponentially in the number of modes M but polynomially with the circuit energy E in the shallow and deep regions of CV VQCs.

We prove the results for state preparation tasks of general Gaussian states and number states. As the VQCs are randomly initialized, we expect the energy-dependent barren plateau to be universal for all state preparation tasks and provide supporting numerical evidence. Moreover, as the energy of the circuit is a controllable parameter upon random initialization, we provide a strategy to mitigate the barren plateau by heuristically choosing the initial circuit energy in each training problem.
5.1.2 Results

In general, the goal of a VQC U(x) is to minimize a cost function

C(x) = Tr[O U(x) ρ_0 U†(x)]   (5.1)

over the tunable parameters x, where ρ_0 is the initial state and O is a Hermitian observable. The performance of VQCs relies on a balance between expressivity and trainability. Expressivity concerns the size of the solution space a VQC ansatz can cover and can be quantified by the cardinality of the unitary set needed to well approximate the VQC ansatz [247], while trainability concerns how fast a VQC can converge to an optimal configuration within the solution space. The study of DV VQCs has revealed a trade-off between expressivity and trainability [23, 40]. To ensure expressivity, DV VQCs consist of single-qubit rotations and multi-qubit gates (see Fig. 5.1), for example in the hardware-efficient ansatz [27]. When the number of layers is linear in the number of qubits N, a VQC can approximate complex unitary ensembles classified as t-designs [169, 170, 61, 171]. However, increasing the expressivity makes the training of VQCs more challenging. Upon random initialization, the gradient of the cost function C with respect to any parameter is exponentially small in the number of qubits. More precisely, Refs. [16, 18] show that while the mean is zero, the variance of the gradient decays as O(1/2^N) for shallow circuits and as O(1/2^{2N}) for deep circuits (summarized in Fig. 5.1, top panel). For the CV case, to ensure expressivity over M oscillators, we consider the approach of echoed conditional displacement (ECD) gates via coupling to a qubit, a common approach to oscillator control adopted in both microwave cavities and trapped ions [241, 248] (see Fig. 5.1).
Each layer of the VQC consists of M ECD gates in between single-qubit rotations: the qubit controls the displacement of each qumode via an ECD gate U_ECD(β) = D(β) ⊗ |1⟩⟨0| + D(−β) ⊗ |0⟩⟨1| after a single-qubit rotation U_R(θ, ϕ) = exp[−i(θ/2)(cos ϕ σ^x + sin ϕ σ^y)], where σ^x and σ^y are the Pauli-X and Y operators and D(β) ≡ exp(β m† − β* m) is the displacement operator on mode m. Here, despite the lack of Haar random unitaries and t-designs (t ≥ 2) [245, 63, 246] due to the infinite dimension, we establish the energy-regularized circuit ensemble to represent 'typical' qubit-qumode circuits and enable the analyses. Our main result shows that when training such a CV VQC for state preparation, the cost function exhibits an energy-dependent barren plateau. Specifically, when preparing an M-mode state with energy per mode E_t, the variance of the gradient decays polynomially with the circuit energy and exponentially with the number of modes M, when the circuit energy per mode satisfies E ∈ Ω(E_t). When the circuit has a shallow log-depth (L ∈ O(log E)), the variance of the gradient decays as 1/E^M; in the deep-circuit region (L ∈ Ω(log E)), it decays as 1/E^{2M}. Alternatively, for a fixed circuit depth L ≳ log(E_t), as the circuit energy increases, the variance of the gradient first displays target-dependent behaviors before a quick 1/E^{2M} decay, followed by a transition to the 1/E^M scaling at the critical energy E ∼ exp(L). We prove these results for the state preparation of a fairly general class of states known as Gaussian states [249], including non-classical multipartite entangled states useful in quantum computing [250] and distributed quantum sensing [162, 242, 87, 88]. To extend the results beyond Gaussian states, we also prove them for Fock number states and verify them numerically for general random states. Furthermore, as the circuit energy E is a continuously tunable real parameter, the energy-dependent barren plateau provides an opportunity to mitigate the challenges in training.
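For concreteness, the ECD gate defined above can be built numerically in a truncated Fock space; a minimal sketch (the truncation dimension, tensor ordering mode ⊗ qubit, and helper names are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

def displacement(beta, dim):
    """Truncated displacement operator D(beta) = exp(beta a† - beta* a)."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)  # truncated annihilation operator
    return expm(beta * a.conj().T - np.conj(beta) * a)

def ecd(beta, dim):
    """ECD gate U_ECD(beta) = D(beta) ⊗ |1><0| + D(-beta) ⊗ |0><1| (mode ⊗ qubit)."""
    s01 = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|
    s10 = s01.conj().T                               # |1><0|
    return (np.kron(displacement(beta, dim), s10)
            + np.kron(displacement(-beta, dim), s01))

U = ecd(0.3 + 0.1j, dim=30)
print(np.allclose(U.conj().T @ U, np.eye(U.shape[0])))  # True: the gate is unitary
```

The truncated generator β a† − β* a is anti-Hermitian, so the sketch stays exactly unitary at any truncation, and the qubit cross terms |1⟩⟨0|, |0⟩⟨1| make U_ECD unitary as a whole.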
5.1.2.1 Energy-regularized circuit ensemble

To ensure the expressivity of the CV VQCs, we adopt a universal gate set over a qubit and M qumodes. While it is known that ECD gates with single-qubit unitaries can achieve universal control on one qubit and one mode [241], we extend the proof to the system of multiple qubits and qumodes (see Methods). We state the following lemma:

Lemma 6 Universal control over M ≥ 1 qumodes and N ≥ 1 qubits can be realized by all ECD gates between any qumode and any qubit and all single-qubit rotations.

Alternatively, universal control can also be achieved by a variant of the ECD gate, the conditional-not-displacement gate [251]. The results of our paper still hold in that case. Although the ECD gates can have arbitrarily large complex amplitudes, as realistic systems are always subject to energy constraints, we introduce the following definition to model a 'typical' circuit layout for the CV VQCs (see Fig. 5.1 for a schematic of the circuits):

Definition 7 The energy-regularized L-layer qubit-qumode circuit ensemble U_{E,L,M} is the ensemble of unitaries

U = ∏_{ℓ=1}^{L} ∏_{j=1}^{M} U_ECD^{(j)}(β_ℓ^{(j)}) U_R(θ_{M(ℓ−1)+j}, ϕ_{M(ℓ−1)+j}),   (5.2)

where each ECD gate's complex displacement β_ℓ^{(j)} ∼ N^C_{E/L} is Gaussian distributed with zero mean and variance E/L, and all qubit rotation angles {θ_k, ϕ_k}_{k=1}^{ML} ∼ U[0, 2π) are uniformly distributed.

For simplicity, we denote a zero-mean complex Gaussian distribution with variance σ²/2 on both the real and imaginary parts as N^C_{σ²}. It is worth noting that the choice of a Gaussian distribution for β_ℓ^{(j)} is convenient but not essential, due to the central limit theorem. Although the ensemble of circuits comes from physical regularization, we will see later that it also enables analytical solutions of various properties of the VQCs and provides an excellent playground for CV quantum information processing. The expressivity of the VQCs in U_{E,L,M} increases with the depth L.
Due to the universality in Lemma 6, the VQCs in U_{E,L,M} with L ≫ 1 contain all unitaries relevant to the energy scale E, and a larger circuit energy E will enable expressivity over larger Hilbert space volumes. At the same time, the trainability of VQCs in U_{E,L,M} is unclear, which we aim to resolve in this work. As we focus on state preparation tasks, below we consider the states generated by the random VQCs applied on trivial initial states. This is analogous to the relationship between unitary designs and state designs, both of which unfortunately do not exist in the CV case. We define the energy-regularized state ensemble Ψ_{E,L,M} as |ψ(x)⟩_{q,m} = U(x)|0⟩_q ⊗_{j=1}^{M} |0⟩_{m_j}, where we randomly apply U(x) ∈ U_{E,L,M} on a spin-up qubit and vacuum qumodes. Here we denote the overall parameters as x, including all β_ℓ^{(j)}, θ_k and ϕ_k. In our notation, |a⟩_q denotes the a = 0, 1 state of the qubit 'q', |α⟩_{m_j} denotes the coherent state with complex amplitude α of the mode m_j (see Methods), and m = (m_1, ..., m_M) denotes all modes. With random qubit rotations U_R in Eq. (5.2), the subsequent ECD gate U_ECD^{(j)}(β_ℓ^{(j)}) applied on each mode will lead to a random superposition of performing the complex displacements +β_ℓ^{(j)} and −β_ℓ^{(j)} for all modes j = 1, ..., M. Therefore, each layer of ECD gates with random qubit rotations corresponds to a superposition of all possible choices of a random-walk step in the 2M-dimensional phase space. The accumulation of the displacements leads to a final output state in a superposition of random-walk trajectories. For example, in the one-mode case (M = 1), we have the superposition

|ψ(x)⟩_{q,m} = Σ_{a=0}^{1} Σ_s v_{s,a}(x) |a⟩_q |(−1)^a s · β⟩_m,   (5.3)

where the sign vector s sums over {−1, +1}^L under the constraint that s_L = −1, and β = (β_1, ..., β_L) are the amplitudes of the displacements. The coefficients v_{s,a}(x) are lengthy; we specify them and present the detailed proof of the state representation in Sec. E.1.1.
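The random-walk picture implies that the total signed displacement s · β is Gaussian and that the ensemble-average photon number of the output equals E; a minimal Monte Carlo sketch (the sample size and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
E, L, n_samples = 5.0, 10, 200_000

# Each layer displacement beta_l ~ N^C_{E/L}: variance E/(2L) per quadrature,
# so E[|beta_l|^2] = E/L.
beta = (rng.normal(0, np.sqrt(E / (2 * L)), (n_samples, L))
        + 1j * rng.normal(0, np.sqrt(E / (2 * L)), (n_samples, L)))
signs = rng.choice([-1, 1], size=(n_samples, L))
walk = (signs * beta).sum(axis=1)  # total displacement s . beta of one trajectory

print(np.mean(np.abs(walk) ** 2))  # ≈ E = 5.0: average photon number of the walk
```

The cross terms of the signed sum average to zero, so E[|s · β|²] = Σ_ℓ E[|β_ℓ|²] = L · (E/L) = E, independent of the sign vector distribution.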
One can directly check that in the energy-regularized ensemble, the displacement s · β with an arbitrary choice of sign vector s obeys a Gaussian distribution, and the ensemble-average energy of the states in Ψ_{E,L,1} is E[m†m] = E due to the energy regularization. Moreover, because of the accumulation of displacements in the summation s · β = Σ_ℓ (±)β_ℓ, the central limit theorem dictates that a Gaussian distribution is universal for the amplitudes of the final displacement, regardless of the distribution of each β_ℓ.

Figure 5.2: Variance of the gradient Var[∂_{θ_k} C] at k = L/2 in preparation of (a) the displaced squeezed vacuum (DSV) state with γ = 2, ζ = sinh⁻¹(2) and (b) the Fock state with E_t = 8. Orange and red dots with error bars show numerical results for the variance in shallow (L = 4) and deep (L = 50) circuits. The orange solid curve represents the analytical variance in Theorem 8; the dashed and solid magenta curves show the lower and upper bounds in Ineqs. (5.6), (5.7). Black dotted and dashed lines indicate the scalings 1/E and 1/E². In the insets of (a)(b) we plot the logarithm in base ten of the upper bound in Ineq. (5.7) versus circuit depth and energy. The green triangle (main) and line (inset) show the corresponding boundary of the variance at E_c(L) and ℓ_c(E).

The M-mode case is a direct generalization of Eq. (5.3), via extending the coherent state to a product of coherent states in the superposition (see Sec. E.1.4). At the same time, the corresponding energy per mode is still E.

5.1.2.2 Barren plateau in state preparation

The trainability of the circuit U(x) initialized in the energy-regularized ensemble U_{E,L,M} is characterized by the typical gradient of the cost function C in Eq. (5.1). Here, we focus on the state engineering for the qumodes, while the qubit acts as an ancilla.
To prepare a general M-mode state |ψ⟩_m, the observable can be chosen as O = |0⟩⟨0|_q ⊗ |ψ⟩⟨ψ|_m, and we can set the initial state as vacuum without loss of generality. Consider the gradient with respect to the kth qubit rotation angle θ_k. One can check that the parameter-shift rule [252] holds for qubit rotation angles in the CV circuit, therefore ∂_{θ_k} C = (⟨O⟩_{k(+)} − ⟨O⟩_{k(−)})/2, where ⟨O⟩_{k(±)} corresponds to the expectation of O for the state ρ_{k(±)} prepared under the circuit U(x) with the rotation angle θ_k shifted by ±π/2. In the same spirit as Refs. [16, 18], we consider random initial conditions: when U(x) is randomly sampled from the energy-regularized ensemble U_{E,L,M}, the mean of the gradient is

E[∂_{θ_k} C] = (E[⟨O⟩_{k(+)}] − E[⟨O⟩_{k(−)}])/2 = 0,   (5.4)

due to the fact that the ensemble averages over parameters with θ_k ± π/2 are equal. In this case, the typical values of the gradient are characterized by the variance

Var[∂_{θ_k} C] = (1/2) E[⟨O⟩²_{k(+)}] − (1/2) E[⟨O⟩_{k(+)} ⟨O⟩_{k(−)}].   (5.5)

The exact evaluation of Var[∂_{θ_k} C] is in general challenging, due to the lack of theoretical tools such as t-designs in infinite dimension. Instead, our proposed energy-regularized ensemble allows us to solve a pair of asymptotic lower and upper bounds for E ≫ 1 as (ignoring higher-order terms, see Sec. E.1.4)

Var[∂_{θ_k} C] ≥ (1/2) [ (3^{ML−1}/4^{ML}) C_1 + (1/4 − 3^{ML}/4^{ML}) min_ℓ C_2(ℓ/L) ],   (5.6)
Var[∂_{θ_k} C] ≤ (1/2) [ (3^{ML−1}/4^{ML}) C_1 + (1/4 + 2^{ML−1}/4^{ML}) max_ℓ C_2(ℓ/L) ],   (5.7)

where the minimization and maximization are over the vector ℓ = (ℓ^(1), ..., ℓ^(M))^T with each integer element ℓ^(j) ∈ [1, L − 1] ∩ N. The state-dependent correlators are

C_1 = E_α[ |⟨ψ|α⟩_m|⁴ ],   (5.8)
C_2(z) = E_{α_z, α_{1−z}}[ ∏_{h=0}^{1} |⟨ψ|α_z + (−1)^h α_{1−z}⟩_m|² ].   (5.9)

Here we have defined the vector notation |α⟩_m = ⊗_{j=1}^{M} |α^(j)⟩_{m_j}, with each |α^(j)⟩_{m_j} being a coherent state with displacement α^(j) for mode m_j. The ensemble average in Eq. (5.8) is over each component α^(j) sampled from N^C_E, due to Definition 7.
Similarly, |α_z + (−1)^h α_{1−z}⟩_m = ⊗_{j=1}^{M} |α_{z_j} + (−1)^h α_{1−z_j}⟩_{m_j}, with each component α_{z_j} ∼ N^C_{z_j E} and α_{1−z_j} ∼ N^C_{(1−z_j)E}. As the above two correlators are only functions of state fidelities, numerical evaluation is often efficient and analytical evaluation is sometimes possible. Note that for M ≥ 2, Eq. (5.9) has already ignored terms exponentially small in L, which will not affect our results (see Sec. E.1.4). In the following, we present analytical results for general Gaussian states and number states, and numerical results for randomly generated states plugged into the lower and upper bounds. Although Gaussian states can be efficiently prepared with a Gaussian circuit of linear optics and squeezers [249], as the randomly initialized VQCs do not leverage any heuristics, we expect the trainability found there to be universal, which is supported by our other results. We begin with one-mode state preparation and then generalize to the multi-mode case.

One-mode state preparation. For one-mode state preparation, we consider Gaussian states, number states and random states. A one-mode (pure) Gaussian state can be represented by a displaced squeezed vacuum state D(γ)S(ζ)|0⟩_m, where S(ζ) = exp[ζ(m² − m†²)] is the squeezing operator (see Methods). This is an important class of states, as coherent states provide a quantum model for lasers and squeezed vacuum is a key resource of quantum sensing, e.g. the Laser Interferometer Gravitational-Wave Observatory [253, 254, 255] and dark matter searches [256]. Here we omit any possible phase rotation, which is treated in Sec. E.1.3.1. The energy of such a state is E_t = |γ|² + sinh²(ζ). We can analytically solve the two correlators for a one-mode Gaussian state as

C_1^Gauss = sech²(ζ) e^{−R(E)/G_1(E)} / √G_1(E),   (5.10)
C_2^Gauss(z) = sech²(ζ) e^{−R(zE)/G_1(zE)} / √(G_1(E − zE) G_1(zE)),   (5.11)

where we define G_1(x) = 1 + 4x + 4 sech²(ζ) x² and R(x) = 2|γ|² + 2 tanh(ζ)(Re{γ}² − Im{γ}²) + 4 sech²(ζ)|γ|² x.
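As a sanity check of Eq. (5.10): for the vacuum target (γ = 0, ζ = 0) it reduces to C_1 = 1/(1 + 2E), since |⟨0|α⟩|² = e^{−|α|²}, which a short Monte Carlo reproduces (the sample size and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
E, n = 3.0, 400_000

# alpha ~ N^C_E: variance E/2 per quadrature, so |alpha|^2 is exponential with mean E
alpha = rng.normal(0, np.sqrt(E / 2), n) + 1j * rng.normal(0, np.sqrt(E / 2), n)
c1_mc = np.mean(np.exp(-2 * np.abs(alpha) ** 2))  # E_alpha |<0|alpha>|^4

print(c1_mc, 1 / (1 + 2 * E))  # Monte Carlo estimate vs closed form 1/(1 + 2E)
```

Analytically, E[e^{−2|α|²}] = ∫ (1/E) e^{−t/E} e^{−2t} dt = 1/(1 + 2E), matching the γ = 0, ζ = 0 limit of Eq. (5.10) where G_1(E) = (1 + 2E)² and R = 0.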
To go beyond Gaussian states, we consider the Fock number state with E_t ∈ N photons, whose correlators can be solved in closed form as

C_1^Fock = [(2E_t)! / (2^{E_t} E_t!)²] (1 + 1/(2E))^{−2E_t} / (1 + 2E),   (5.12)
C_2^Fock(z) = η (1 + 2E_t) [(2E_t)! / (2^{E_t} E_t!)²] (1 + 1/(2Ez))^{−2E_t} / {(1 + 2Ez)[1 + 2E(1 − z)]}.   (5.13)

In the second line, the right-hand side represents the lower and upper bounds of C_2^Fock(z), as an exact evaluation becomes hard: for the upper bound η = 1, and for the lower bound η = ₂F₁(1/2, −E_t; 1; 1), where ₂F₁ is the ordinary hypergeometric function. Plugging these correlators into Ineqs. (5.6), (5.7), we obtain the corresponding bounds on the variance of the gradient in the preparation of Gaussian states and Fock states. With the above correlators in hand, we find that when the circuit depth L is shallow, the lower and upper bounds in Ineqs. (5.6) and (5.7) coincide to the leading order ∼ 1/E for large E. When the depth L is large, the ∼ 1/E² terms dominate in both the lower and upper bounds. Quantitatively, the cross-over of the scaling happens at depth ℓ_c(E) ∈ O(log E). For the full formula of ℓ_c(E), please refer to Sec. E.1.3. Equivalently, for circuits with a fixed depth L, the transition of the scaling from 1/E² to 1/E takes place at E_c(L) ∈ Ω(exp(L)). Overall, we have the following theorems:

Theorem 8 (Barren plateau for shallow depth.) For a single-mode (M = 1) CV VQC randomly initialized from the energy-regularized ensemble U_{E,L,1} with a shallow depth L ≤ ℓ_c(E) ∈ O(log E), the gradient with respect to qubit rotation angles when preparing a Gaussian or a Fock state has zero mean and variance

Var[∂_{θ_k} C] = (1/6)(3/4)^L C_1 + O(1/E²),   (5.14)

Figure 5.3: Variance of the gradient Var[∂_{θ_k} C] at k = L/2 in preparation of random CV states |ψ⟩_m ∝ Σ_n b_n |n⟩_m with L = 4 (a) and L = 50 (b) circuits. Curves with the same color show the variance for different sample target states. Black dotted and dashed lines in (a) and (b) represent the scalings 1/E and 1/E².
(In the calculation, we have chosen the cutoff n_c ∼ 2E_t and ϵ = 0.1.)

Here C_1 is the correlator in Eq. (5.10) or (5.12); in particular, Var[∂_{θ_k} C] ∼ 1/E in the large-E limit.

Theorem 9 (Barren plateau for deep depth.) For a single-mode (M = 1) CV VQC randomly initialized from the energy-regularized ensemble U_{E,L,1}, with depth L ≥ ℓ_c(E) ∈ Ω(log E), the gradient with respect to qubit rotation angles when preparing a Gaussian state or a Fock state has zero mean and asymptotic variance Var ∼ 1/E².

In terms of the asymptotic region, in practice we find the 1/E² scaling to hold as long as E ∈ Ω(E_t). In Fig. 5.2(a)(b), we show the variance of the gradient Var[∂_{θ_k} C] versus the ensemble energy E in shallow and deep circuits and compare the numerical results to the bounds and theorems. We consider the preparation of a displaced squeezed vacuum (DSV) state with γ = 2, ζ = sinh⁻¹(2) and a Fock state with E_t = 8 separately. We see that the numerical variance in shallow circuits with L = 4 (orange dots) agrees well with Eq. (5.14) of Theorem 8 (orange line), which suggests a scaling of 1/E in the asymptotic region of E. For deep circuits with L = 50, the numerical results (red) lie between the lower and upper bounds, and indeed obey the 1/E² scaling, which supports Theorem 9. To our surprise, the lower bound in Ineq. (5.6) becomes extremely tight in the asymptotic region of E in both cases. At the same time, despite the large circuit depth L = 50, given an extremely high ensemble energy above E_c(L), the VQCs are again shallow compared with ℓ_c(E), and the variance of the gradient obeys the 1/E scaling for shallow circuits.

Figure 5.4: Correlators C_1^Gauss and C_2^Gauss(z) with z = (1/2, ..., 1/2) in Eqs. (5.17), (5.18) versus (a) ensemble energy E at M = 10 and (b) number of modes M at E = 10³. The target state |ψ⟩_m is generated by a global random passive Gaussian unitary following a single-mode squeezer with strength r = 8.
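The closed-form Fock correlator in Eq. (5.12) can be cross-checked by Monte Carlo, using |⟨E_t|α⟩|² = e^{−|α|²}|α|^{2E_t}/E_t!; a minimal sketch (function names, sample size, and seed are illustrative assumptions):

```python
import numpy as np
from math import factorial

def c1_fock_closed(Et, E):
    """Closed-form C1 for a Fock target |Et>, following Eq. (5.12)."""
    return (factorial(2 * Et) / (2 ** Et * factorial(Et)) ** 2
            * (1 + 1 / (2 * E)) ** (-2 * Et) / (1 + 2 * E))

def c1_fock_mc(Et, E, n=400_000, seed=2):
    """Monte Carlo of C1 = E_alpha |<Et|alpha>|^4 over alpha ~ N^C_E."""
    rng = np.random.default_rng(seed)
    t = np.abs(rng.normal(0, np.sqrt(E / 2), n)
               + 1j * rng.normal(0, np.sqrt(E / 2), n)) ** 2
    fid = np.exp(-t) * t ** Et / factorial(Et)  # |<Et|alpha>|^2
    return np.mean(fid ** 2)

print(c1_fock_closed(2, 3.0), c1_fock_mc(2, 3.0))  # the two agree within MC error
```

At E_t = 0 the expression reduces to the vacuum result 1/(1 + 2E), consistent with the Gaussian formula.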
To understand the transition between shallow and deep circuits, in the insets of Fig. 5.2 we present contours of the upper bound in Ineq. (5.7) versus circuit depth L and energy E. Here we identify a clear contrast between shallow and deep circuits in terms of circuit depth and ensemble energy, where the boundary representing ℓ_c ∈ O(log E) is indicated by the green curve. For general state preparation, the evaluation of the correlators C_1, C_2 in Ineqs. (5.6) and (5.7) can be hard. However, informed by the above results, we conjecture that Theorems 8 and 9 hold for arbitrary single-mode state preparation. To support this conjecture, we present numerical evidence for the preparation of randomly generated CV states. These states are random superpositions of the number basis of the form |ψ⟩_m ∝ Σ_{n=0}^{n_c} b_n |n⟩_m, where each b_n ∼ N^C_2 is randomly chosen. We post-select states within the energy window [E_t − ϵ, E_t + ϵ]. With the target states generated, we evaluate the variance of the gradient in preparing each fixed state versus circuit energy for different choices of E_t (indicated by the color) in Fig. 5.3. Despite state-dependent behaviors in the low-energy part, a universal decay of the gradients with the energy can be identified in the region E ≳ E_t. The decay shows a scaling of ∼ 1/E in shallow circuits (subplot (a)) and a scaling of ∼ 1/E² in deep circuits (subplot (b)).

Multi-mode state preparation. Now we generalize our results on the energy-dependent barren plateau to the multi-mode case, including analytical results on general Gaussian states and number states and numerical results for random states. We begin our discussion with the simple case of a product state |ψ⟩_m = ⊗_{j=1}^{M} |ψ_j⟩_{m_j}, whose correlators in Eqs. (5.8), (5.9) take the form

C_1^Prod = ∏_{j=1}^{M} E_{α^(j)∼N^C_E}[ |⟨ψ_j|α^(j)⟩_{m_j}|⁴ ],   (5.15)
C_2^Prod(z) = ∏_{j=1}^{M} E_{α_{z_j}∼N^C_{z_j E}, α_{1−z_j}∼N^C_{(1−z_j)E}}[ ∏_{h=0}^{1} |⟨ψ_j|α_{z_j} + (−1)^h α_{1−z_j}⟩_{m_j}|² ],   (5.16)

which reduce to products of single-mode correlators.
Consequently, Theorems 8 and 9 directly generalize to the state preparation of products of single-mode Gaussian states and products of number states: the variance of the gradient scales as 1/E^M for shallow circuits L ∈ O(log E) and as 1/E^{2M} for deep circuits L ∈ Ω(log E) (see Sec. E.1.4.1 for a detailed proof). To go beyond product states, we consider an arbitrary M-mode Gaussian state, which is typically highly entangled [257]. A general Gaussian state |ψ⟩_m can be described by the mean and covariance matrix V_m of its Wigner function (see Methods). For simplicity, we show the zero-mean results in the main text; the non-zero-mean case is presented in Sec. E.1.4.2. The correlators C_1 and C_2 can be solved analytically as (see Sec. E.1.4.2)

C_1^Gauss = 4^M det(K) / [√(det(4K + I/E)) E^M], (5.17)

C_2^Gauss(z) = 4^M det(K) / √(det(4K + S_z) det(4K + S_{1−z})) × 1 / ([∏_{j=1}^M z_j(1 − z_j)] E^{2M}), (5.18)

where I is the 2M × 2M identity matrix. Here we have defined K = (V_m + I)^{−1} and S_z = ⊕_{j=1}^M I_2/(z_j E), with I_2 the 2 × 2 identity matrix. In the asymptotic limit of E ≫ 1, one can directly see that C_1^Gauss ∼ 1/E^M while C_2^Gauss(z) ∼ 1/E^{2M} (see Sec. E.1.4 for a proof in the non-zero-mean case), which leads to the following theorem.

Figure 5.5: Variance of the gradient Var[∂_{θ_k} C] at k = ML/2 in the preparation of a TMSV state |ζ⟩_TMSV with ζ = sinh⁻¹(2). Orange and red dots with error bars show numerical results of the variance in shallow and deep circuits. The orange solid curve represents (3/4)^{2L} C_1^TMSV/6 for reference; the dashed and solid magenta curves show the lower and upper bounds in Ineqs. (5.6), (5.7). Black dotted and dashed lines indicate the scalings 1/E² and 1/E⁴. The inset shows the base-ten logarithm of the upper bound versus circuit depth and energy. The green triangle (main) and line (inset) show the corresponding boundary of the variance at E_c(L) and ℓ_c(E).
Theorem 10 (Barren plateau for multi-mode Gaussian states). For an M-mode CV VQC randomly initialized from the energy-regularized ensemble U_{E,L,M}, the gradient with respect to the qubit rotation angles when preparing an M-mode general Gaussian state has zero mean and asymptotic variance Var ∼ 1/E^M at a shallow depth L ∈ O(log E), and Var ∼ 1/E^{2M} at a deep depth L ∈ Ω(log E).

As an example, we consider a multipartite entangled distributed squeezed state generated by passing a single-mode squeezed vacuum through a random beamsplitter array (a global random passive Gaussian unitary), which is known to be typically highly entangled in the study of continuous-variable quantum information scrambling [156]. Such states are also the crucial form of entanglement that empowers distributed quantum sensing applications [162, 242, 87, 88, 243, 244]. For a fixed number of modes M = 10, in Fig. 5.4(a) we see that the correlators C_1^Gauss (orange) and C_2^Gauss(z) with z = {1/2, · · · , 1/2} (red) approach the scalings 1/E^M (orange dotted) and 1/E^{2M} (red dashed) separately. On the other hand, at an asymptotic energy of E = 10³, the two correlators decay exponentially with the mode number M in Fig. 5.4(b). In this case, however, the direct evaluation of the gradients is challenging due to the large number of modes, M = 10. To evaluate the gradient of CV VQCs in the preparation of Gaussian states for a direct comparison with Theorem 10, we consider the two-mode squeezed vacuum (TMSV) states, the CV analog of Bell states.

Figure 5.6: Variance of the gradient Var[∂_{θ_k} C] at k = ML/2 in the preparation of random two-mode CV states |ψ⟩_m = Σ_{n_1,n_2} b_{n_1,n_2} |n_1⟩_{m_1}|n_2⟩_{m_2} with L = 4 (a) and L = 50 (b) circuits. Curves with the same color show the variance for different sample target states. Black dotted and dashed lines in (a) and (b) represent the scalings 1/E² and 1/E⁴. In the calculation, we have chosen ϵ = 0.1 and n_c ∼ 2E_t.
A TMSV state |ζ⟩_TMSV is generated by applying a two-mode squeezing operator S_2(ζ) = exp[ζ(m_1 m_2 − m_1† m_2†)/2] to the vacuum |0⟩_{m_1}|0⟩_{m_2}, and has energy per mode E_t = sinh²(ζ). The correlators can be found from Eqs. (5.17), (5.18) as

C_1^TMSV = sech⁴(ζ) / G_1(E), (5.19)

C_2^TMSV(z_1, z_2) = sech⁴(ζ) / [G_2(z_1, z_2) G_2(1 − z_1, 1 − z_2)], (5.20)

where G_2(z_1, z_2) = 1 + 2(z_1 + z_2)E + 4 sech²(ζ) z_1 z_2 E². In the asymptotic region, the correlators show the scalings 1/E² and 1/E⁴ separately, and thus correspond to the scaling of the gradient variance. We compare the results above to numerical simulation in Fig. 5.5 and see good agreement in the asymptotic region of E for both shallow and deep circuits, while the variance in shallow circuits at finite energy E can deviate from the asymptotic predictions.

Similar to the single-mode case, the evaluation of the correlators for non-Gaussian states is in general challenging. We conjecture that the barren plateau also holds for general non-Gaussian states, and support this with numerical results for the preparation of randomly generated target states, a natural generalization of the one-mode case studied in Fig. 5.3: |ψ⟩_m ∝ Σ_{n_1,n_2=0}^{n_c} b_{n_1,n_2} |n_1⟩_{m_1}|n_2⟩_{m_2}, where each coefficient b_{n_1,n_2} is drawn from a complex normal distribution. We post-select states within the energy window [E_t − ϵ, E_t + ϵ]. With the target states generated, we evaluate the gradient variance versus the circuit energy for different choices of the target state energy E_t (indicated by the color) in Fig. 5.6. Again, the scalings of ∼ 1/E² in shallow circuits (left) and ∼ 1/E⁴ in deep circuits (right) are verified. However, numerical simulation for non-Gaussian states with more modes remains challenging due to the enormous demand on computational resources.

5.1.2.3 Strategies to circumvent training issues

Barren plateaus in general create a challenge in training VQCs to solve problems.
Although general approaches that entirely solve the training issues seem out of reach due to complexity arguments, problem-specific heuristics are promising for mitigating barren plateaus. For DV VQCs, various methods [258, 182, 259, 260, 261, 262] have been proposed to mitigate barren plateaus in training. In the case of CV VQCs, the random ensemble U_{E,L,M} has a unique tunable parameter besides the circuit depth, the circuit energy E, thanks to the infinite-dimensional Hilbert space. Therefore, different random initialization strategies can be adopted by varying the circuit energy E. As shown in Fig. 5.7(a) for the preparation of a Fock number state, when we adopt different initial circuit energies E, the training history of the cost function (state infidelity) is drastically different. When the circuit energy E roughly matches the target state energy E_t = 20, we see the best convergence. This is due to the peak of the variance of the gradient shown in Fig. 5.2(b) for Fock state preparation. As expected, the decrease of the infidelity is also reflected in the ensemble state energy, which decays to the target state energy, as shown in Fig. 5.7(b). In contrast, we also plot the circuit energy, defined as Σ_{ℓ=1}^L |β_ℓ|², in Fig. 5.7(c); all of the curves change only slightly during the training. Overall, by initializing the circuit at low energy, one can mitigate the barren plateau and speed up the convergence. The best initial energy is, however, target-state dependent.

Figure 5.7: Training for the Fock state |ψ⟩_m = |20⟩_m with an L = 50 CV VQC initialized with different ensemble energies E. We show (a) the average infidelity of the output state, (b) the average output state energy and (c) the average circuit energy Σ_{j=1}^L |β_j|² versus training steps.
For instance, to prepare a Gaussian state, the quadrature mean of the target state determines the monotonicity of the gradient variance in the non-asymptotic region of E (see Eqs. (5.10) and (5.11)): for zero-mean states, the gradients keep decreasing with increasing circuit energy, as we see in Fig. 5.5 for TMSV; for non-zero-mean states, the gradients are maximal when the circuit energy E matches the target state energy E_t, as we see in Fig. 5.2(a). For general states, both monotonically decreasing and peaked cases can be found in Fig. 5.3 for random CV states, while the number state is found to have a peaked gradient variance in Fig. 5.2(b). In practice, when training the circuit to prepare a specific state, one can spend some computational resources evaluating the gradients in the low-energy region (E ≲ E_t) to pick the best initial point before the actual training, which can drastically speed up the convergence.

5.1.3 Discussion

In this paper, we explore the trainability of CV VQCs implemented with universal control based on ECD gates. Through examples of preparing M-mode general Gaussian states and Fock number states, we analytically identify the barren plateau phenomenon: the variance of the gradient decays with the circuit energy as Var ∼ 1/E^M in shallow circuits and as Var ∼ 1/E^{2M} in deep ones. The barren plateau is also numerically extended to the preparation of arbitrary non-Gaussian single-mode and two-mode CV states. To mitigate the barren plateau, we propose a strategy of tuning the ensemble energy at initialization to match the behavior of the gradient variance, which is shown to boost performance. Finally, we point out a few open problems.
We have focused on the gradients with respect to qubit rotations to study the trainability of CV VQCs (as low trainability in part of the parameters suffices to demonstrate a barren plateau); it is unknown how the gradients with respect to the displacement parameters in the ECD gates decay with the energy and the number of modes. It is also an important task to explore trainability in general tasks other than state preparation. In this regard, we believe that the energy-dependent barren plateau can be generalized to training any bounded multi-mode CV observable. As any bounded operator acting on qumodes has a Glauber–Sudarshan P representation [263, 264] in the coherent-state basis, the expectation value of the bounded operator can be obtained by integration over coherent states' expectation values. Therefore, an arbitrary bounded cost function can always be interpreted as a weighted average (with possibly complex weights) of the cost function for coherent states, and the energy-dependent barren plateau in state preparation should typically represent the trainability involving any bounded operator. For unbounded operators, the trainability of CV VQCs remains to be explored. Beyond the barren plateau, it is also of interest to explore phenomena such as traps [265] and overparametrization [266, 267, 71, 26], and to study extensions of the theory of the quantum neural tangent kernel [24, 25] to training CV VQCs. We want to point out Ref. [268] on barren plateaus in passive linear optical systems, which are non-universal and have very limited expressivity. The results there rely on Haar randomness in passive linear optics. On the contrary, our work adopts an entirely different approach to reveal the energy-dependent barren plateau in universal circuits and is relevant to important training problems in experiments [238, 239, 240, 241].

5.1.4 Methods

5.1.4.1 Gaussian states

Here we provide a succinct introduction to Gaussian states. More details can be found in Ref.
[249], whose convention is utilized here. A system consisting of M modes is described by M annihilation and creation operators {m_j, m_j†}_{j=1}^M, which satisfy the commutation relation [m_j, m_{j′}†] = δ_{j,j′}. One can also describe it via the position and momentum operators in phase space, q_j = m_j + m_j† and p_j = i(m_j† − m_j). These quadratures can be grouped into a quadrature vector X = (q_1, p_1, . . . , q_M, p_M)^T. The first and second moments, also known as the mean quadrature and the covariance matrix (CM), are

X̄ = ⟨X⟩, (5.21)

V_{ij} = (1/2) ⟨{X_i − ⟨X_i⟩, X_j − ⟨X_j⟩}⟩, (5.22)

where ⟨·⟩ denotes the expectation value and {A, B} is the anticommutator of the operators A, B. Any one-mode pure Gaussian state can be represented as a rotated and displaced squeezed state

|ψ⟩_m = D(γ)R(τ)S(ζ)|0⟩_m, (5.23)

where R(τ) = exp(−iτ m†m) is the phase rotation and S(ζ) = exp[ζ(m² − m†²)/2] is the squeezing operator. The displacement operator D(β) = exp(βm† − β*m) satisfies the braiding relation D(α)D(β) = e^{(αβ* − α*β)/2} D(α + β). The mean quadrature and CM of the state in Eq. (5.23) are

X̄ = (2 Re{γ}, 2 Im{γ})^T, (5.24)

V = [ e^{2ζ} sin²(τ) + e^{−2ζ} cos²(τ)    sin(2τ) sinh(2ζ) ]
    [ sin(2τ) sinh(2ζ)    e^{2ζ} cos²(τ) + e^{−2ζ} sin²(τ) ]. (5.25)

Below we specify the CM and mean for some examples. A coherent state has τ = ζ = 0; its CM reduces to I_2, the 2 × 2 identity matrix, while its mean can be arbitrary. A single-mode squeezed vacuum (SMSV) state has γ = τ = 0; it has zero mean X̄ = 0 and a diagonal CM V = diag(e^{−2ζ}, e^{2ζ}). A two-mode squeezed vacuum (TMSV) state is the maximally entangled Gaussian state of two modes, generated by applying the two-mode squeezing operator S_2(ζ) = exp[ζ(m_1 m_2 − m_1† m_2†)/2] to a product of vacuum states |0⟩_{m_1}|0⟩_{m_2}. It also has zero mean, and its CM is

V_TMSV = [ cosh(2ζ) I_2    sinh(2ζ) σ^z ]
         [ sinh(2ζ) σ^z    cosh(2ζ) I_2 ], (5.26)

where I_2 is the 2 × 2 identity operator and σ^z is the Pauli-Z operator.
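Eq. (5.25) admits a simple numerical sanity check (a sketch we add for illustration; the function name and parameter values are ours): in this convention (vacuum CM = I_2), every pure single-mode Gaussian state has a CM with unit determinant, and at τ = 0 the CM reduces to the SMSV form diag(e^{−2ζ}, e^{2ζ}).

```python
import numpy as np

def dsv_covariance(tau, zeta):
    """CM of the displaced rotated squeezed state D(gamma)R(tau)S(zeta)|0>,
    Eq. (5.25); the displacement gamma only shifts the mean, not the CM."""
    s, c = np.sin(tau), np.cos(tau)
    off = np.sin(2 * tau) * np.sinh(2 * zeta)
    return np.array([
        [np.exp(2 * zeta) * s**2 + np.exp(-2 * zeta) * c**2, off],
        [off, np.exp(2 * zeta) * c**2 + np.exp(-2 * zeta) * s**2],
    ])

print(np.linalg.det(dsv_covariance(0.7, 1.2)))  # ~1: pure for any tau, zeta
print(dsv_covariance(0.0, 1.2))                 # SMSV: diag(e^{-2.4}, e^{2.4})
```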
The fidelity between two Gaussian quantum states ρ_A, ρ_B is fully determined by the mean quadratures X̄_A, X̄_B and the CMs V_A, V_B. Moreover, it can be solved analytically [269, 270, 271]. When one of the states is pure, it takes the simple form [270]

F(ρ_A, ρ_B) = [Tr√(√ρ_A ρ_B √ρ_A)]² (5.27)
            = 2^M / √(det(V_A + V_B)) × exp[−(1/2) d^T (V_A + V_B)^{−1} d], (5.28)

where d ≡ X̄_A − X̄_B.

5.1.4.2 Universality of ECD gate sets: proof of Lemma 6

For the CV VQCs in the main text involving interactions between a qubit and a qumode, Ref. [241] has shown that the gate set of ECD gates and single-qubit rotations is universal, in the sense that linear combinations of repeated nested commutators of the gate generators cover the full Lie algebra of the qubit-qumode system [272, 57]. Below, we review the proof. The generators of ECD gates and single-qubit rotations are

G = {qσ^z, pσ^z, σ^x, σ^y}, (5.29)

where q, p are the position and momentum of the qumode, and σ^c with c ∈ {x, y, z} are Pauli operators on the qubit. First, the commutators between qσ^z, pσ^z and σ^x, σ^y produce operators of the form qσ^c, pσ^c with c ∈ {x, y, z}. To obtain operators with higher-order polynomials of q, p, one can consider the commutator [qσ^a, qσ^b] ∝ ϵ_{abc} q²σ^c, where ϵ_{abc} is the three-dimensional Levi-Civita symbol. The commutation can be repeated to generate operators q^j σ^a with j ≥ 2, and similarly for p^j σ^a. To obtain terms coupling the quadratures along with Pauli operators, we consider the commutator

[q^{j+1}σ^a, pσ^b] = 2iϵ_{abc} q^{j+1}p σ^c + (j + 1)ϵ_{abc} q^j σ^c, (5.30)

assuming a ≠ b. Combined with q^j σ^c, it leads to p^{j+1}σ^c. By repeating the process on the commutator with pσ^b, one obtains all polynomial quadrature terms q^j p^k σ^c with c ∈ {x, y, z}. The last step is to eliminate the Pauli operators via the commutator [q^{j+1}p^k σ^a, pσ^a] ∝ q^j p^k. Therefore, all unitaries with generators of the form q^j p^k σ^c with σ^c ∈ {I_2, σ^x, σ^y, σ^z} are achieved by the gate set of ECD gates and single-qubit rotations.
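The first step of the proof, [qσ^a, qσ^b] ∝ ϵ_{abc} q²σ^c, can be spot-checked numerically with truncated operators (an illustrative sketch we add here; the cutoff N is arbitrary). The identity holds exactly even under Fock truncation, since the same truncated matrix q appears on both sides.

```python
import numpy as np

N = 12                                     # Fock-space truncation (illustrative)
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator m
q = a + a.conj().T                         # position quadrature q = m + m^dagger
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def comm(A, B):
    return A @ B - B @ A

# [q sigma^x, q sigma^y] = q^2 [sigma^x, sigma^y] = 2i q^2 sigma^z:
# commuting two first-order generators yields a second-order one
lhs = comm(np.kron(q, sx), np.kron(q, sy))
rhs = 2j * np.kron(q @ q, sz)
print(np.allclose(lhs, rhs))  # True (exact: the bosonic factor is identical)
```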
Now we generalize the universality of ECD gates and single-qubit unitaries to arbitrary numbers of qumodes M ≥ 1 and qubits N ≥ 1. The generators of ECD gates (between any qubit-qumode pair) combined with single-qubit rotations are

G = [∪_{ℓ=1}^M ∪_{r=1}^N {q_ℓ σ^z_r, p_ℓ σ^z_r}] ∪ [∪_{r=1}^N {σ^x_r, σ^y_r}]. (5.31)

For two modes ℓ, ℓ′ that are coupled to a qubit r by ECD gates, the commutator between q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^a_r and q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}} σ^b_r (assuming ℓ ≠ ℓ′) is

[q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^a_r, q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}} σ^b_r] = q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}} [σ^a_r, σ^b_r] ∝ q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}} ϵ_{abc} σ^c_r. (5.32)

Lastly, the commutator [q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} q_{ℓ′}^{j_{ℓ′}+1} p_{ℓ′}^{k_{ℓ′}} σ^x_r, p_{ℓ′} σ^x_r] ∝ q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}}, so any operator of the form q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} q_{ℓ′}^{j_{ℓ′}} p_{ℓ′}^{k_{ℓ′}} σ^c_r with σ^c_r ∈ {I_2, σ^x_r, σ^y_r, σ^z_r} can be generated. Repeating the process above for the other M − 2 modes, one can generate an arbitrary unitary with generator ∏_{ℓ=1}^M q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^c_r, i.e., universal control on M modes and one qubit. Next, we consider two qubits r, r′ connected to one mode ℓ; the commutator between q_ℓ^{j_ℓ} p_ℓ^{k_ℓ+1} σ^z_r and q_ℓ σ^z_{r′} is

[q_ℓ^{j_ℓ} p_ℓ^{k_ℓ+1} σ^z_r, q_ℓ σ^z_{r′}] = q_ℓ^{j_ℓ} [p_ℓ^{k_ℓ+1}, q_ℓ] σ^z_r σ^z_{r′} ∝ q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^z_r σ^z_{r′}. (5.33)

With the commutators between q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^z_r σ^z_{r′} and {σ^x_r, σ^x_{r′}, σ^y_r, σ^y_{r′}}, we obtain the form q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} σ^c_r σ^{c′}_{r′} with σ^c_r, σ^{c′}_{r′} ∈ {I_2, σ^x, σ^y, σ^z}. By repeating the process discussed above, we can generate all unitaries whose generator is of the form q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} (∏_r σ^c_r). Combined with the results for M modes and one qubit, we finally obtain generators of the form ∏_{ℓ=1}^M q_ℓ^{j_ℓ} p_ℓ^{k_ℓ} ∏_{r=1}^N σ^c_r, which shows that universal control can be performed on a system of M modes and N qubits.
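The two-mode step in Eq. (5.32) can likewise be verified numerically (an illustrative sketch, not from the thesis): quadratures on different modes commute, so only the shared qubit's Pauli commutator survives, e.g. [q_1 σ^x_r, q_2 σ^y_r] = 2i q_1 q_2 σ^z_r.

```python
import numpy as np

N = 6                                      # per-mode Fock cutoff (illustrative)
a = np.diag(np.sqrt(np.arange(1, N)), 1)
q = a + a.conj().T
I = np.eye(N)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def kron3(A, B, C):
    """Operator on mode 1 (x) mode 2 (x) qubit."""
    return np.kron(np.kron(A, B), C)

def comm(A, B):
    return A @ B - B @ A

# [q_1 sigma^x_r, q_2 sigma^y_r] = q_1 q_2 [sigma^x, sigma^y] = 2i q_1 q_2 sigma^z_r
lhs = comm(kron3(q, I, sx), kron3(I, q, sy))
rhs = 2j * kron3(q, q, sz)
print(np.allclose(lhs, rhs))  # True (exact under truncation)
```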
5.2 Hybrid entanglement distribution between remote microwave quantum computers empowered by machine learning

5.2.1 Introduction

Empowered by the laws of quantum mechanics, quantum computers have the potential to speed up the solution of various classically hard problems [3, 11, 10, 273]. Among the candidate platforms for quantum computing, superconducting circuits with Josephson junctions stand out for their high scalability and strong single-photon nonlinearity, as exemplified by recent demonstrations [47, 48, 240, 241]. To unleash their full capabilities, quantum computers need to be connected via entanglement [230, 232, 233]. Although direct microwave links can be used at short distances for proof-of-principle demonstrations [274, 275], the connection between superconducting quantum computing circuits over long distances is best implemented with optical photons via microwave-optical transduction [276, 277, 278, 279, 280, 281, 282, 283, 284], since only optical photons can maintain quantum coherence, with low dissipation and decoherence rates, at room temperature. In spite of significant improvements, it is still challenging to realize high-performance microwave-optical transduction with near-unity efficiency and near-zero noise. A critical question is then: what is the most efficient way to connect superconducting microwave quantum computers given the non-ideal performance of microwave-optical transduction?

Figure 5.8: (a) Interconnect system between two microwave quantum computers. Two cavities enable the generation of microwave-optical entanglement. The optical modes are detected for the entanglement swap, generating microwave-microwave entanglement after the displacement D̂. Finally, the transmon qubits interact with the microwave modes to generate Bell pairs. (b) Schematic of hybrid local variational quantum circuits (hybrid LVQCs) to distill entanglement from the noisy entangled microwave modes m_1, m_2 to the transmon qubits q_1, q_2.
The hybrid LVQC is shown in detail in the dashed box, with D echoed conditional displacement (ECD) blocks (ECD gates and single-qubit rotations) followed by a final displacement and rotation.

To answer this question, a fundamental issue must be resolved: the gap between continuous-variable (CV) and discrete-variable (DV) quantum information processing. The theory of quantum channel capacity indicates that the entire CV degree of freedom must be utilized to achieve a high entanglement rate [285]; on the other hand, microwave quantum computers typically use a DV approach for information encoding. Moreover, entanglement distillation protocols that improve entanglement quality are designed separately for DV encoding [286, 287, 288, 289, 290, 291, 292, 293, 294] or CV encoding [295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305]. In this work, we propose a hybrid CV-DV protocol to connect superconducting microwave quantum computers. CV microwave entanglement is first established through optical swapping of optical-microwave entanglement pairs. Then, we develop two entanglement conversion and distillation protocols to transform the noisy CV entanglement into high-fidelity DV entanglement, which can directly interface with transmon qubits. The first protocol adopts direct CV-to-DV conversion followed by traditional DV entanglement distillation. The second protocol is based on a machine-learning-enabled hybrid local variational quantum circuit (LVQC) that completes the conversion and distillation simultaneously, which adds to the general toolbox of entanglement manipulation. Estimated with practical non-ideal device parameters, both protocols show a huge rate advantage over single-photon-based pure DV protocols. The LVQC protocol further shows more than a ten-fold improvement in the fidelity-success-probability trade-off compared with the direct CV-to-DV conversion protocol.
5.2.2 System setup

The overall interconnect protocol consists of two steps, a CV entanglement generation step and a CV-to-DV hybrid conversion and distillation step, as shown in Fig. 5.8(a). The ultimate goal is to generate high-fidelity entanglement at a high rate between two superconducting microwave quantum computers. To begin with, entanglement is generated between the transduction microwave and optical ports ({â_1, m̂_1} and {â_2, m̂_2}) on both sides. The optical modes â_1 and â_2 then travel through optical links to a center node for an entanglement swap that generates noisy CV entanglement between the microwave modes m̂_1 and m̂_2. Next, each microwave mode is coupled into a conversion and distillation module implemented with superconducting quantum computing circuits. High-fidelity DV entanglement can then be generated between the two remote microwave modes. While our protocol is general, we consider the special case of transmon qubits given their wide use in superconducting quantum computing. The LVQC conversion and distillation step is based on controlled operations on the microwave modes using transmon qubits (Fig. 5.8(b)), and the final DV entanglement is also between two transmon qubits ({q_1, q_2}). For simplicity, we assume identical configuration and performance for the two quantum computing and transduction systems. We also point out that we consider a system with tunable mode-qubit coupling, so that the entanglement generation step and the entanglement conversion/distillation step can be treated separately. At the same time, we assume steady-state operation with perfect spectral-temporal mode matching.

5.2.3 Entangling microwave modes via CV swap

While our protocol applies to general transduction systems, we focus on cavity electro-optic systems [279, 306, 307, 280] for their simplicity, as they do not involve intermediate excitations. A typical cavity electro-optic system is shown in Fig.
5.8(a), where an optical cavity with χ^(2) nonlinearity is placed between the capacitors of an LC microwave resonator. The electric field of the microwave mode modulates the optical resonant frequency across the capacitor by changing the refractive index of the optical cavity. Due to the mixing (rectification) between the optical pump and signal in the χ^(2) material, a microwave field can be generated. The interaction Hamiltonian has the standard three-wave-mixing form between two optical modes (â_ℓ and b̂_ℓ) and one microwave mode (m̂_ℓ) [285, 306, 307, 279, 280],

Ĥ_ℓ = iℏ(g â_ℓ† b̂_ℓ m̂_ℓ† − g* â_ℓ b̂_ℓ† m̂_ℓ), (5.34)

with g the coupling coefficient and ℓ = 1, 2 labeling the two sides. When the optical mode b̂_ℓ is coherently pumped, the optical mode â_ℓ and the microwave mode m̂_ℓ become entangled in a noisy two-mode squeezed vacuum (TMSV) state ρ̂_{m,o} with zero mean and covariance matrix [285]

V_{m,o} = (1/2) [ u I_2   v Z_2 ]
               [ v Z_2   w I_2 ], (5.35)

where Z_2, I_2 are the Pauli-Z and identity matrices, and

u = 1 + 8ζ_m[C + n_in(1 − ζ_m)] / (1 − C)², (5.36a)
v = 4√(ζ_o ζ_m C) [1 + C + 2n_in(1 − ζ_m)] / (1 − C)², (5.36b)
w = 1 + 8Cζ_o [1 + n_in(1 − ζ_m)] / (1 − C)². (5.36c)

Here ζ_o and ζ_m are the extraction efficiencies of the optical and microwave modes, respectively. The cooperativity C ∝ g² describes the interaction strength [285]. The optical thermal noise is neglected due to its small occupation, while the microwave thermal noise has a non-zero mean occupation number n_in. To generate entanglement between remote microwave modes, we adopt an entanglement-swap approach. As shown in Fig. 5.8(a), consider two pairs of entangled microwave-optical modes {â_1, m̂_1} and {â_2, m̂_2}, each with the covariance matrix of Eq. (5.35). One can interfere the optical modes â_1 and â_2 on a balanced beamsplitter and perform homodyne detection on the outputs. Conditioned on the homodyne results, displacement operations are applied to the two microwave modes respectively.
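The entries of Eq. (5.36) can be sanity-checked numerically (our sketch, not from the thesis; we assume the grouping √(ζ_o ζ_m C) in Eq. (5.36b), which is the grouping for which the ideal state comes out pure): with ζ_o = ζ_m = 1 and n_in = 0 one finds u = w and u² − v² = 1, i.e. ρ̂_{m,o} is a pure TMSV.

```python
import numpy as np

def uvw(C, zeta_o, zeta_m, n_in):
    """Covariance-matrix entries of Eq. (5.36) for the microwave-optical pair."""
    d = (1 - C) ** 2
    u = 1 + 8 * zeta_m * (C + n_in * (1 - zeta_m)) / d
    v = 4 * np.sqrt(zeta_o * zeta_m * C) * (1 + C + 2 * n_in * (1 - zeta_m)) / d
    w = 1 + 8 * C * zeta_o * (1 + n_in * (1 - zeta_m)) / d
    return u, v, w

# ideal extraction, no thermal noise: u = w and u^2 - v^2 = 1 (pure TMSV)
u, v, w = uvw(C=0.3, zeta_o=1.0, zeta_m=1.0, n_in=0.0)
print(u - w, u**2 - v**2)  # ~0 and ~1
```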
After the optical homodyne detection and conditional displacement operations, the microwave modes m̂_1 and m̂_2 form a noisy TMSV state ρ̂_{m,m} with covariance matrix

V_{m,m} = (1/2) [ (u − v²/2w) I_2    (v²/2w) Z_2 ]
                [ (v²/2w) Z_2    (u − v²/2w) I_2 ]. (5.37)

In the ideal lossless case with ζ_o = ζ_m = 1, the state ρ̂_{m,m} becomes a pure TMSV state. We first derive the ultimate rate of entanglement generation, as all later steps are local operations and classical communication (LOCC), which do not increase the entanglement rate [308]. To characterize the distillable entanglement, we calculate the upper bound given by the entanglement of formation (EoF) [286] and the lower bound given by the reverse coherent information (RCI) [309]. The unit of the rate is ebits per round and can be understood as follows: a rate of r ebits per round means that in M ≫ 1 rounds, asymptotically Mr pairs of Bell states can be obtained if arbitrary encoding and decoding are allowed. To begin with, we consider the rate versus the cooperativity in Fig. 5.9. In the ideal case ζ_o = ζ_m = 1, EoF and RCI are equal and reduce to the entanglement entropy (green line in Fig. 5.9(a)). For comparison, we also evaluate the EoF and RCI rate bounds of a pure DV protocol based on time-bin entanglement [310], where microwave-optical single-photon entanglement is generated by post-selecting on the state ρ̂_{m,o}. For all cooperativity values, the proposed CV scheme has more than a two-order-of-magnitude rate advantage over the pure DV protocol based on time-bin entanglement. In Fig. 5.9(b), we further consider non-ideal extraction with ζ_m = 0.95 and ζ_o = 0.9, and non-zero microwave thermal noise n_in = 0.2 (corresponding to ∼ 0.2 K temperature at 8 GHz).

Figure 5.9: Rate (in ebits per round) comparison for (a) ζ_m = ζ_o = 1 and (b) ζ_m = 0.95, ζ_o = 0.9 and n_in = 0.2. The rate of the CV entanglement swap lies within the green shaded region and the rate of the time-bin entanglement swap within the purple region.
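The claim that the lossless swap preserves purity can be checked directly from Eq. (5.37) (an illustrative sketch, not from the thesis): with the ideal-case entries u = w = 1 + 8C/(1−C)² and v = 4√C(1+C)/(1−C)², the swapped microwave-microwave CM again satisfies the pure-TMSV condition ũ² − ṽ² = 1.

```python
import numpy as np

def swapped_cm_entries(u, v, w):
    """Diagonal and off-diagonal entries of V_{m,m} in Eq. (5.37) after the
    optical homodyne swap and conditional displacement."""
    return u - v**2 / (2 * w), v**2 / (2 * w)

C = 0.2
u = w = 1 + 8 * C / (1 - C) ** 2            # ideal case: zeta_o = zeta_m = 1, n_in = 0
v = 4 * np.sqrt(C) * (1 + C) / (1 - C) ** 2
ut, vt = swapped_cm_entries(u, v, w)
print(ut**2 - vt**2)  # ~1: the swap preserves purity in the lossless limit
```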
Although a rigorous advantage only occurs at cooperativities above 0.12, we expect the lower bound to be non-tight, and the actual advantage should still be large in the low-cooperativity region. We note a recent paper [311] showing a similar rate advantage with a different platform and approach.

5.2.4 Conversion and distillation protocols

Next, we design protocols to obtain high-fidelity DV entanglement ρ̂_{q,q} between two transmon qubits from the noisy two-mode squeezing between the microwave modes ρ̂_{m,m}. Distillation has been explored separately for either DV [286, 287, 288, 289, 290, 291, 292, 293, 294] or CV [295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305] entanglement. Here, we require a hybrid conversion and distillation protocol that produces high-fidelity DV entanglement from noisy CV entanglement. We present two different approaches: (i) direct swap, where conversion and distillation are separate, and (ii) hybrid LVQC, where universal control completes both conversion and distillation simultaneously. Given the symmetric configuration between the two ends of the interconnect system, we omit subscripts in the following. As practical entanglement distillation is by nature probabilistic, the performance will be quantified by two metrics: the success probability and the fidelity conditioned on success.

5.2.4.1 Direct conversion

We begin with a simple direct conversion from CV entanglement to noisy entangled qubit pairs. Inspired by Ref. [312], we consider the interaction between the microwave mode and a transmon qubit with Hamiltonian

Ĥ_swap = m̂ ⊗ |1⟩⟨0|_q + h.c. (5.38)

We can control the interaction time t such that the unitary operator Û = exp(−itĤ_swap) gives the maximum EoF between the qubits on the two sides after disregarding the corresponding microwave modes (see Appendix E.2.2). This produces a noisy entangled qubit pair deterministically. One can then follow up with further DV distillation protocols.
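The action of Eq. (5.38) can be illustrated with a small truncated-Fock simulation (our sketch; the cutoff and timing are illustrative): on the single-excitation subspace {|1⟩_m|0⟩_q, |0⟩_m|1⟩_q} the Hamiltonian acts as a Pauli-X, so evolving for t = π/2 swaps a single microwave photon onto the qubit up to a phase.

```python
import numpy as np
from scipy.linalg import expm

N = 8                                      # Fock cutoff for the microwave mode
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator m
sig_p = np.array([[0, 0], [1, 0]], dtype=complex)  # |1><0| on the qubit

# H_swap = m (x) |1><0|_q + h.c., Eq. (5.38)
H = np.kron(a, sig_p)
H = H + H.conj().T

def ket(mode_n, qubit_n):
    """Product basis state |mode_n>_m |qubit_n>_q."""
    v_m = np.zeros(N); v_m[mode_n] = 1.0
    v_q = np.zeros(2); v_q[qubit_n] = 1.0
    return np.kron(v_m, v_q)

U = expm(-1j * (np.pi / 2) * H)
out = U @ ket(1, 0)
print(abs(ket(0, 1) @ out))  # ~1: the photon is swapped onto the qubit
```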
Later we show that such a simple scheme already converts most of the entanglement, so we do not consider more complicated CV-to-DV conversion schemes [313].

5.2.4.2 Variational hybrid conversion and distillation

To explore the ultimate performance, we further use a variational approach to design a hybrid conversion and distillation protocol, as shown in Fig. 5.8(b). Two qubits are initialized in |0⟩_q and put into interaction with the two microwave modes respectively. The hybrid LVQC consists of a series of single-qubit rotations and echoed conditional displacements (ECD), followed by another displacement at the end [241]. Each single-qubit rotation is characterized by two angles ϕ and θ,

R̂(θ, ϕ) = exp[−i(θ/2)(σ̂^x cos ϕ + σ̂^y sin ϕ)], (5.39)

where σ̂^x and σ̂^y are Pauli operators. Each ECD gate acts on the microwave mode m̂ and the qubit q as

ECD(β) = D̂(β/2) ⊗ |1⟩⟨0|_q + h.c., (5.40)

where D̂(α) ≡ exp(αm̂† − α*m̂) is the displacement operator. Compared with previous proposals for universal control [238, 239], the ECD-gate approach has advantages in both control speed and fidelity [241]. To determine the success of the entanglement conversion and distillation, we perform measurements characterized by the positive operator-valued measure (POVM) {Π̂_s, Î − Π̂_s}. The success probability is therefore

P_success = Tr{Π̂_s (Û_D ⊗ Û_D) ρ̂_{qm,qm} (Û_D† ⊗ Û_D†)}, (5.41)

where ρ̂_{qm,qm} is the composite initial quantum state of the microwave modes and qubits on the two sides, and Û_D is the unitary of the LVQC with D layers of ECD gates and single-qubit rotations. We choose photon counting on the microwave modes and post-select on photon numbers lower than a threshold n_th, so Π̂_s = Σ_{i,j=0}^{n_th−1} |i⟩⟨i|_{m_1} ⊗ |j⟩⟨j|_{m_2}, where |k⟩ is the number state. In this paper, we choose n_th = 5, half of the photon-number cutoff in the simulation.
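Eq. (5.40) can be prototyped in a truncated Fock space (an illustrative sketch we add here; the cutoff N = 40 is ours). Because the truncated generator αm̂† − α*m̂ remains anti-Hermitian, D̂(β/2) is exactly unitary at any cutoff, and so is the ECD gate built from it; acting on |0⟩_m|0⟩_q it creates the coherent state |β/2⟩ and flips the qubit.

```python
import numpy as np
from scipy.linalg import expm

N = 40                                     # Fock cutoff, large vs |beta/2|^2
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator

def displacement(alpha):
    """D(alpha) = exp(alpha m^dag - alpha* m) on the truncated Fock space."""
    return expm(alpha * a.conj().T - np.conj(alpha) * a)

def ecd(beta):
    """ECD(beta) = D(beta/2) (x) |1><0|_q + h.c., Eq. (5.40)."""
    sig_p = np.array([[0, 0], [1, 0]], dtype=complex)  # |1><0|
    G = np.kron(displacement(beta / 2), sig_p)
    return G + G.conj().T

U = ecd(1.0)
print(np.allclose(U.conj().T @ U, np.eye(2 * N)))      # True: exactly unitary

psi = np.kron(np.eye(N, 1).ravel(), np.array([1.0, 0.0]))  # |0>_m |0>_q
n_op = np.kron(a.conj().T @ a, np.eye(2))                  # photon number
out = U @ psi
print((out.conj() @ n_op @ out).real)                      # ~|beta/2|^2 = 0.25
```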
Due to the variational training, we expect the performance not to depend on the photon-number threshold, as long as it is not too small. The state of the transmon qubits conditioned on successful conversion and distillation is given by

ρ̂_{q,q} ∝ Tr_{m,m}{Π̂_s (Û_D ⊗ Û_D) ρ̂_{qm,qm} (Û_D† ⊗ Û_D†) Π̂_s}. (5.42)

We train the hybrid LVQC towards distilling a perfect Bell pair |Ψ+⟩ ≡ (|0⟩_q|0⟩_q + |1⟩_q|1⟩_q)/√2. The cost function is defined from the success probability P_success and the fidelity F(ρ̂_{q,q}) = ⟨Ψ+|ρ̂_{q,q}|Ψ+⟩ as

C({β}, {θ}, {ϕ}) = (1 − P_success) + λ × Softplus(F_c − F), (5.43)

where the penalty coefficient λ and the critical fidelity F_c are hyperparameters that tune the trade-off between success probability and fidelity. The softplus function Softplus(x) ≡ log(1 + e^{γx})/γ with γ = 20 is introduced to enable a smooth penalty on F ≤ F_c events. Both the direct swap and the hybrid LVQC can operate on a single copy of ρ̂_{m,m} to generate a single copy (M = 1) of an entangled qubit pair. To further improve their performance, we also consider a two-copy case (M = 2), where two copies of the noisy CV entanglement ρ̂_{m,m} are first converted to two pairs of noisy entangled qubits, and further DV distillation is then performed on the two noisy qubit pairs to produce the final entangled state. In the last step, we consider both the traditional DEJMPS protocol [288] and a DV LVQC protocol.

Figure 5.10: Infidelity of the transmon qubits versus success probability for (a) one-copy and (b) two-copy entangled microwave modes with C = 0.1, n_in = 0.2, ζ_m = 0.992, ζ_o = 0.99. The PPT bound (black) in (a) goes below 10⁻⁴ and saturates; the red and pink dashed curves in (a)(b) correspond to the PPT bounds of the one-copy and two-copy two-qubit states from the direct swap. There are D = 10 ECD blocks in the hybrid LVQCs and L = 6 layers in the DV LVQCs.
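The cost function of Eq. (5.43) is straightforward to implement; a minimal sketch follows (the λ and F_c values below are illustrative placeholders, not the hyperparameters used in the thesis):

```python
import numpy as np

GAMMA = 20.0  # smoothing parameter gamma of the softplus, as in the text

def softplus(x, gamma=GAMMA):
    """Softplus(x) = log(1 + exp(gamma x)) / gamma, a smooth max(0, x)."""
    return np.logaddexp(0.0, gamma * x) / gamma  # overflow-safe form

def cost(p_success, fidelity, lam=10.0, f_c=0.99):
    """Eq. (5.43): failure probability plus a smooth penalty once F < F_c."""
    return (1.0 - p_success) + lam * softplus(f_c - fidelity)

# the penalty is inactive well below zero argument and ~linear above it
print(softplus(-0.5), softplus(0.5))  # ~0 and ~0.5
```

The smooth penalty keeps the cost differentiable at F = F_c, which is convenient for gradient-based training of the circuit parameters.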
The DV LVQC protocol uses a standard universal gate set of single-qubit rotations and CNOTs, with Pauli-Z measurements providing the post-selection and the same cost function as Eq. (5.43), as detailed in Appendix E.2.3.

5.2.4.3 Performance comparison

We begin our performance comparison with the trade-off between the infidelity 1 − F and the success probability P_success. As shown in Fig. 5.10(a), the simple direct swap produces 1 − F ∼ 0.23 deterministically (red cross).

Figure 5.11: Entanglement rate per copy for (a) one-copy and (b) two-copy noisy entangled microwave modes versus infidelity 1 − F, at parameters identical to Fig. 5.10. Dot-dashed and dashed curves respectively represent RCI lower bounds and EoF upper bounds. Shaded areas and line segments show the range of rates. The shallow blue curves and areas in (b) are the same as in (a), for comparison.

The performance of any protocol following the direct swap is bounded by the positive-partial-transpose (PPT) bound (red dashed), which can be numerically evaluated by a semidefinite program [289]. In contrast, the hybrid LVQC approach directly achieves a one-order-of-magnitude advantage in infidelity (blue solid). Indeed, when applying the PPT bound to \hat{\rho}_{m,m}, the (generally loose) lower bound on the infidelity of LVQC protocols (black dashed) also decreases substantially compared with the PPT bound from the direct swap (red dashed). To further lower the infidelity, we consider the two-copy case, where DV processing is performed on the output of the one-copy case. We first take the traditional DEJMPS protocol following the direct swap as the reference (pink cross in Fig. 5.10(b)). When the traditional DEJMPS is replaced by a DV LVQC (orange line), lower infidelity can be achieved at lower success probability. When we use a DV LVQC to further distill the two qubit pairs from the hybrid LVQC, a two-order-of-magnitude lower infidelity (green) can be achieved.
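The PPT bound itself is evaluated by a semidefinite program [289], but the underlying partial-transpose test is straightforward. Here is a small NumPy sketch (helper names are ours) checking whether a two-qubit state has a positive partial transpose:

```python
import numpy as np

def partial_transpose(rho, dims=(2, 2)):
    """Partial transpose on the second subsystem of a bipartite density matrix."""
    d1, d2 = dims
    r = rho.reshape(d1, d2, d1, d2)  # indices (i, k; j, l)
    return r.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)  # swap k <-> l

def is_ppt(rho, dims=(2, 2), tol=1e-10):
    """PPT test: a two-qubit state is separable iff its partial transpose is positive."""
    return np.linalg.eigvalsh(partial_transpose(rho, dims)).min() >= -tol
```

A Bell state fails the test (its partial transpose has a negative eigenvalue of −1/2), while the maximally mixed state passes.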
Note that even compared with the (generally loose) two-copy PPT lower bound of the direct-swap approach (pink dashed), advantages can still be identified in the low-infidelity region. As a benchmark, the time-bin-based entanglement swap (purple star) is one order of magnitude worse in infidelity than the hybrid LVQC (green) at the same success probability. Although we only considered two copies, the advantage of the hybrid LVQC scheme over the direct-swap scheme generalizes to more copies: since the hybrid LVQC protocol starts with better entanglement-generation performance (Fig. 5.10(a)), it will also perform better when further distillation steps are implemented. Finally, we compare the entanglement rate per copy for an M-copy conversion and distillation protocol,

R_E(\hat{\rho}_{q,q}) = P_{\rm success} \times E(\hat{\rho}_{q,q})/M \le E(\hat{\rho}_{m,m}), (5.44)

where E(·) is the distillable entanglement. As discussed, we use the lower and upper bounds on distillable entanglement, RCI and EoF, for the evaluation. In Fig. 5.11, we see that the rate generally decays as the required infidelity decreases; this is because we consider a fixed number of copies, far from the asymptotic limit. In the one-copy case, the hybrid LVQC approach (blue) achieves a close-to-optimal rate while improving the fidelity up to 0.97, whereas further improving the fidelity drastically decreases the rate; meanwhile, the direct swap (red) only achieves a single point with a large infidelity and a similar rate, in contrast to the versatile rate-infidelity trade-off enabled by the LVQC. In the two-copy case, the hybrid LVQC with additional DV post-processing improves the infidelity to ∼ 10^{-3} while the rate is kept at ∼ 10^{-2} (green). At higher infidelity, the hybrid LVQC is also as good as the other approaches (orange) based on the direct swap with additional DV post-processing (including a DV LVQC circuit).
Most notably, at the same level of infidelity, the rate of the hybrid LVQC approach (green) is a factor of 3.4 higher than that of the time-bin entanglement-swap approach (purple dot), as shown in Fig. 5.11(b). In Fig. 5.11(b), we also compare the two-copy results with the one-copy hybrid LVQC results (shallow blue). Interestingly, the one-copy protocol has higher rates at the same infidelity when 1 − F ≳ 0.024. This is mainly because a two-copy protocol must post-select on two one-copy success events, which substantially reduces the overall success probability. However, if one wants to reach substantially lower infidelity, two-copy protocols give a much better rate.

Figure 5.12: Infidelity of transmon qubits (orange) and distillation success probability (blue) versus depolarization noise for one-copy entangling microwave modes, at parameters identical to Fig. 5.10. Orange and blue horizontal lines indicate the infidelity and success probability without noise.

In practice, the transmon qubits in the hybrid system can undergo various types of noise and decoherence. To gain insight into the noisy operation of hybrid LVQCs, we consider depolarizing noise accompanying each qubit operation. Such noise is modeled by a depolarization channel Δ_p with parameter p, which takes an input single-qubit state \hat{\rho} to an output contaminated by the fully mixed state \hat{I}/2,

\Delta_p(\hat{\rho}) = (1 - p)\hat{\rho} + p\,\hat{I}/2.

In short, for a D-layer hybrid LVQC, the qubits undergo D + 1 depolarization channels. To understand the performance decay with noise, we consider the same hybrid LVQC setup as in Fig. 5.10 at the operating point with infidelity 1 − F ∼ 0.02 and success probability P_success ∼ 0.017. The simulation results are shown in Fig. 5.12.
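The noise model above is simple to simulate directly. A minimal sketch (helper names ours), applying the D + 1 depolarization channels that a D-layer circuit incurs:

```python
import numpy as np

def depolarize(rho, p):
    """Single-qubit depolarizing channel: Delta_p(rho) = (1 - p) rho + p I/2."""
    return (1.0 - p) * rho + p * np.eye(2, dtype=complex) / 2

def after_circuit(rho, p, depth):
    """Noise accumulated by a qubit across a depth-layer hybrid LVQC: depth + 1 channels."""
    for _ in range(depth + 1):
        rho = depolarize(rho, p)
    return rho
```

Since the channels compose, n applications give (1 − p)^n ρ + [1 − (1 − p)^n] I/2, so the state drifts geometrically toward the maximally mixed state.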
Starting from the noiseless circuit, which achieves infidelity 1 − F ∼ 0.02 (orange dashed) and success probability P_success ∼ 0.017 (blue dashed), the infidelity grows as expected when the noise becomes stronger. As for the success probability, since the depolarization channel mixes the state with the maximally mixed state, the state reaching the final measurement approaches a uniform mixture of the displacements determined by the noiseless optimization as the noise strength p grows; given the fixed final measurement, the success probability therefore increases with p, though slowly compared to the infidelity.

5.2.5 Discussions

We propose an interconnect system based on CV entanglement swapping and hybrid variational entanglement conversion and distillation to entangle microwave superconducting quantum computers. Our approach provides a large rate and fidelity advantage compared with the time-bin approach. In particular, the hybrid conversion and distillation protocol provides an infidelity-success-probability trade-off, with an order-of-magnitude advantage over the direct-swap approach. In the multi-copy scenario, it is an open problem how to optimize the circuit design to get closer to the PPT lower bound. While we have analyzed depolarizing noise in the qubit operations, the impact of other non-ideal experimental factors still needs further investigation. For example, the upper and lower bounds are calculated assuming steady-state operation with perfect spectral-temporal mode matching. Imperfect interference and cavity reflection can occur under spectral-temporal mismatch, decreasing the entanglement rate. Therefore, the upper bound remains accurate, but the lower bound should be calculated based on specific device metrics. Although we have considered cavity electro-optics as the transduction system, our protocol also applies to other transduction systems, and the analysis proceeds in the same way based on a modified interaction Hamiltonian Eq. (5.35).
Chapter 6

Conclusions

This thesis investigated several important questions about QNNs, namely their trainability and dynamics, and identified different kinds of phase transitions, including the computational-hardness phase transition of QAOA and the dynamical phase transition of deep QNNs induced by the target value in optimization. In the applications of QNNs, for supervised learning we studied the state-classification performance of QNNs, and for unsupervised learning we proposed QuDDPM for generative learning of an ensemble of quantum states. Many open questions remain. The dynamics of solvable QNNs in QML with multiple input data states can provide richer physics and interesting phase diagrams. Meanwhile, it is also important to characterize the training dynamics of relatively shallow QNNs and specially designed ansätze such as QAOA, which are practical in the era before fault-tolerant computing and may lead to a deeper understanding of potential quantum advantage. A more quantitatively accurate understanding of QuDDPM's performance in ensemble generation, and of the reasons for its advantage over other candidates, are also important questions.

Bibliography

[1] Richard P Feynman. "Simulating Physics with Computers". In: International Journal of Theoretical Physics 21.6/7 (1982).
[2] Charles H Bennett and Gilles Brassard. "Quantum cryptography: Public key distribution and coin tossing". In: Proc. of IEEE Int. Conf. on Comp. Sys. and Signal Proc., Dec. 1984.
[3] Peter W Shor. "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer". In: SIAM review 41.2 (1999), pp. 303–332.
[4] Artur K Ekert. "Quantum cryptography based on Bell's theorem". In: Phys. Rev. Lett. 67.6 (1991), p. 661.
[5] Peter W Shor. "Fault-tolerant quantum computation". In: Proceedings of 37th conference on foundations of computer science. IEEE. 1996, pp. 56–65.
[6] John Preskill. "Fault-tolerant quantum computation".
In: Introduction to quantum computation and information. World Scientific, 1998, pp. 213–269. [7] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. “Quantum-enhanced measurements: beating the standard quantum limit”. In: Science 306.5700 (2004), pp. 1330–1336. [8] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. “Quantum metrology”. In: Phys. Rev. Lett. 96.1 (2006), p. 010401. [9] Michael A Nielsen and Isaac L Chuang. Quantum computation and quantum information. Vol. 2. Cambridge university press Cambridge, 2001. [10] Lov K Grover. “A fast quantum mechanical algorithm for database search”. In: Proceedings of the twenty-eighth annual ACM symposium on Theory of computing. 1996, pp. 212–219. [11] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. “Quantum algorithm for linear systems of equations”. In: Phys. Rev. Lett. 103.15 (2009), p. 150502. [12] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. “An introduction to quantum machine learning”. In: Contemp. Phys. 56.2 (2015), pp. 172–185. 126 [13] Marco Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, et al. “Variational quantum algorithms”. In: Nat. Rev. Phys. 3.9 (2021), pp. 625–644. [14] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alan Aspuru-Guzik, and Jeremy L Obrien. “A variational eigenvalue solver on a photonic quantum processor”. In: Nat. Commun. 5.4213 (2014), p. 4213. [15] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. “The theory of variational hybrid quantum-classical algorithms”. In: New J. Phys. 18.2 (2016), p. 023023. [16] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. “Barren plateaus in quantum neural network training landscapes”. In: Nat. Commun 9.1 (2018), p. 4812. [17] Andrew Arrasmith, Zoë Holmes, Marco Cerezo, and Patrick J Coles. 
“Equivalence of quantum barren plateaus to cost concentration and narrow gorges”. In: Quantum Sci. Technol. 7.4 (2022), p. 045015. [18] Marco Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, and Patrick J Coles. “Cost function dependent barren plateaus in shallow parametrized quantum circuits”. In: Nat. Commun 12.1 (2021), p. 1791. [19] Arthur Pesah, Marco Cerezo, Samson Wang, Tyler Volkoff, Andrew T Sornborger, and Patrick J Coles. “Absence of barren plateaus in quantum convolutional neural networks”. In: Phys. Rev. X 11.4 (2021), p. 041011. [20] Zoë Holmes, Kunal Sharma, Marco Cerezo, and Patrick J Coles. “Connecting ansatz expressibility to gradient magnitudes and barren plateaus”. In: PRX Quantum 3.1 (2022), p. 010313. [21] Carlos Ortiz Marrero, Mária Kieferová, and Nathan Wiebe. “Entanglement-induced barren plateaus”. In: PRX Quantum 2.4 (2021), p. 040316. [22] Samson Wang, Enrico Fontana, Marco Cerezo, Kunal Sharma, Akira Sone, Lukasz Cincio, and Patrick J Coles. “Noise-induced barren plateaus in variational quantum algorithms”. In: Nat. Commun 12.1 (2021), p. 6961. [23] Martin Larocca, Piotr Czarnik, Kunal Sharma, Gopikrishnan Muraleedharan, Patrick J Coles, and Marco Cerezo. “Diagnosing barren plateaus with tools from quantum optimal control”. In: Quantum 6 (2022), p. 824. [24] Junyu Liu, Francesco Tacchino, Jennifer R Glick, Liang Jiang, and Antonio Mezzacapo. “Representation learning via quantum neural tangent kernels”. In: PRX Quantum 3.3 (2022), p. 030323. [25] Junyu Liu, Khadijeh Najafi, Kunal Sharma, Francesco Tacchino, Liang Jiang, and Antonio Mezzacapo. “Analytic theory for the dynamics of wide quantum neural networks”. In: Phys. Rev. Lett. 130.15 (2023), p. 150601. 127 [26] Xuchen You, Shouvanik Chakrabarti, and Xiaodi Wu. “A convergence theory for over-parameterized variational quantum eigensolvers”. In: arXiv preprint arXiv:2205.12481 (2022). 
[27] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M Chow, and Jay M Gambetta. “Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets”. In: Nature 549.7671 (2017), pp. 242–246. [28] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. “A quantum approximate optimization algorithm”. In: arXiv preprint arXiv:1411.4028 (2014). [29] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. “Quantum machine learning”. In: Nature 549.7671 (2017), pp. 195–202. [30] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2018. [31] Iris Cong, Soonwon Choi, and Mikhail D Lukin. “Quantum convolutional neural networks”. In: Nat. Phys. 15.12 (2019), pp. 1273–1278. [32] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. “Quantum support vector machine for big data classification”. In: Phys. Rev. Lett. 113.13 (2014), p. 130503. [33] Seth Lloyd. “Enhanced sensitivity of photodetection via quantum illumination”. In: Science 321.5895 (2008), pp. 1463–1465. [34] Si-Hui Tan, Baris I Erkmen, Vittorio Giovannetti, Saikat Guha, Seth Lloyd, Lorenzo Maccone, Stefano Pirandola, and Jeffrey H Shapiro. “Quantum illumination with Gaussian states”. In: Phys. Rev. Lett. 101.25 (2008), p. 253601. [35] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Generative adversarial nets”. In: Advances in neural information processing systems 27 (2014). [36] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. “Score-Based Generative Modeling through Stochastic Differential Equations”. In: The Ninth International Conference on Learning Representations. 2021. url: https://openreview.net/forum?id=PxTIG12RRHS. [37] Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models”. 
In: Advances in neural information processing systems 33 (2020), pp. 6840–6851. [38] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. “Improving language understanding by generative pre-training”. In: (2018). [39] Seth Lloyd and Christian Weedbrook. “Quantum generative adversarial learning”. In: Phys. Rev. Lett. 121.4 (2018), p. 040502. [40] Bingzhi Zhang, Akira Sone, and Quntao Zhuang. “Quantum computational phase transition in combinatorial problems”. In: npj Quantum Inf. 8.1 (2022), p. 87. 128 [41] Bingzhi Zhang, Junyu Liu, Xiao-Chuan Wu, Liang Jiang, and Quntao Zhuang. “Dynamical phase transition in quantum neural networks with large depth”. In: arXiv preprint arXiv:2311.18144 (2023). [42] Bingzhi Zhang and Quntao Zhuang. “Fast decay of classification error in variational quantum circuits”. In: Quantum Sci. Technol. 7.3 (2022), p. 035017. [43] Bingzhi Zhang, Peng Xu, Xiaohui Chen, and Quntao Zhuang. “Generative quantum machine learning via denoising diffusion probabilistic models”. In: Phys. Rev. Lett. 132.10 (2024), p. 100602. [44] Bingzhi Zhang and Quntao Zhuang. “Energy-dependent barren plateau in bosonic variational quantum circuits”. In: arXiv preprint arXiv:2305.01799 (2023). [45] Bingzhi Zhang, Jing Wu, Linran Fan, and Quntao Zhuang. “Hybrid entanglement distribution between remote microwave quantum computers empowered by machine learning”. In: Phys. Rev. Appli. 18.6 (2022), p. 064016. [46] John Preskill. “Quantum computing in the NISQ era and beyond”. In: Quantum 2 (2018), p. 79. [47] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. “Quantum supremacy using a programmable superconducting processor”. In: Nature 574.7779 (2019), pp. 505–510. [48] Yulin Wu, Wan-Su Bao, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Daojin Fan, et al. 
“Strong quantum computational advantage using a superconducting quantum processor”. In: Phys. Rev. Lett. 127.18 (2021), p. 180501. [49] Matthew P Harrigan, Kevin J Sung, Matthew Neeley, Kevin J Satzinger, Frank Arute, Kunal Arya, Juan Atalaya, Joseph C Bardin, Rami Barends, Sergio Boixo, et al. “Quantum approximate optimization of non-planar graph problems on a planar superconducting processor”. In: Nat. Phys. 17.3 (2021), pp. 332–336. [50] Peter C Cheeseman, Bob Kanefsky, William M Taylor, et al. “Where the really hard problems are.” In: IJCAI. Vol. 91. 1991, pp. 331–337. [51] David Mitchell, Bart Selman, Hector Levesque, et al. “Hard and easy distributions of SAT problems”. In: AAAI. Vol. 92. Citeseer. 1992, pp. 459–465. [52] Dimitris Achlioptas, Arthur Chtcherba, Gabriel Istrate, and Cristopher Moore. “The phase transition in 1-in-k SAT and NAE 3-SAT”. In: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms. 2001, pp. 721–722. [53] Kevin Leyton-Brown, Holger H Hoos, Frank Hutter, and Lin Xu. “Understanding the empirical hardness of NP-complete problems”. In: Commun. ACM 57.5 (2014), pp. 98–107. [54] Vamsi Kalapala and Cris Moore. “The phase transition in exact cover”. In: arXiv:cs/0508037 (2005). 129 [55] Johan Håstad. “Some optimal inapproximability results”. In: J. ACM (JACM) 48.4 (2001), pp. 798–859. [56] Vishwanathan Akshay, Hariphan Philathong, Mauro ES Morales, and Jacob D Biamonte. “Reachability deficits in quantum approximate optimization”. In: Phys. Rev. Lett. 124.9 (2020), p. 090504. [57] Domenico d’Alessandro. Introduction to quantum control and dynamics. Chapman and hall/CRC, 2021. [58] Xiaoting Wang, Daniel Burgarth, and S. Schirmer. “Subspace controllability of spin- 1 2 chains with symmetries”. In: Phys. Rev. A 94 (2016), p. 052319. [59] Domenico D’Alessandro. “Constructive decomposition of the controllability Lie algebra for quantum systems”. In: IEEE transactions on automatic control 55.6 (2010), pp. 1416–1421. 
[60] Christoph Dankert, Richard Cleve, Joseph Emerson, and Etera Livine. “Exact and approximate unitary 2-designs and their application to fidelity estimation”. In: Phys. Rev. A 80.1 (2009), p. 012304. [61] Daniel A Roberts and Beni Yoshida. “Chaos and complexity by design”. In: J. High Energy Phys. 2017.4 (2017), p. 121. [62] Adam Nahum, Jonathan Ruhman, Sagar Vijay, and Jeongwan Haah. “Quantum entanglement growth under random unitary dynamics”. In: Phys. Rev. X 7.3 (2017), p. 031016. [63] Quntao Zhuang, Thomas Schuster, Beni Yoshida, and Norman Y Yao. “Scrambling and complexity in phase space”. In: Phys. Rev. A 99.6 (2019), p. 062334. [64] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda. “A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem”. In: Science 292.5516 (2001), pp. 472–475. [65] AP Young, S Knysh, and VN Smelyanskiy. “First-order phase transition in the quantum adiabatic algorithm”. In: Phys. Rev. Lett. 104.2 (2010), p. 020502. [66] Quntao Zhuang. “Increase of degeneracy improves the performance of the quantum adiabatic algorithm”. In: Phys. Rev. A 90.5 (2014), p. 052317. [67] Andreas Goerdt. “A threshold for unsatisfiability”. In: International Symposium on Mathematical Foundations of Computer Science. Springer. 1992, pp. 264–274. [68] D. E. Knuth. “Dancing links”. In: arXiv:cs/0011047 (2000). [69] Michael R Garey, David S Johnson, and Larry Stockmeyer. “Some simplified NP-complete problems”. In: Proceedings of the sixth annual ACM symposium on Theory of computing. 1974, pp. 47–63. 130 [70] Andreas Bengtsson, Pontus Vikstål, Christopher Warren, Marika Svensson, Xiu Gu, Anton Frisk Kockum, Philip Krantz, Christian Križan, Daryoush Shiri, Ida-Maria Svensson, et al. “Improved success probability with greater circuit depth for the quantum approximate optimization algorithm”. In: Phys. Rev. Applied 14.3 (2020), p. 034010. 
[71] Martin Larocca, Nathan Ju, Diego García-Martín, Patrick J Coles, and Marco Cerezo. “Theory of overparametrization in quantum neural networks”. In: Nat. Comput. Sci. 3.6 (2023), pp. 542–551. [72] Adam Nahum, Sagar Vijay, and Jeongwan Haah. “Operator spreading in random unitary circuits”. In: Phys. Rev. X 8.2 (2018), p. 021014. [73] Vicky Choi. “Adiabatic quantum algorithms for the NP-complete Maximum-Weight Independent set, Exact Cover and 3SAT problems”. In: arXiv:1004.2226 (2010). [74] Andrew Lucas. “Ising formulations of many NP problems”. In: Front. Phys. 2 (2014), p. 5. [75] Shuichi Sakai, Mitsunori Togasaki, and Koichi Yamazaki. “A note on greedy algorithms for the maximum weighted independent set problem”. In: Discret. Appl. Math. 126.2-3 (2003), pp. 313–322. [76] Akihisa Kako, Takao Ono, Tomio Hirata, and Magnús M Halldórsson. “Approximation algorithms for the weighted independent set problem”. In: International Workshop on Graph-Theoretic Concepts in Computer Science. Springer. 2005, pp. 341–350. [77] Wenceslas Fernandez de la Vega and Marek Karpinski. “1.0957-approximation algorithm for random MAX-3SAT”. In: RAIRO-Operations Research 41.1 (2007), pp. 95–103. [78] J Robert Johansson, Paul D Nation, and Franco Nori. “QuTiP: An open-source Python framework for the dynamics of open quantum systems”. In: Comput. Phys. Commun. 183.8 (2012), pp. 1760–1772. [79] Donny Cheung, Peter Høyer, and Nathan Wiebe. “Improved error bounds for the adiabatic approximation”. In: J. Phys. A 44.41 (2011), p. 415302. [80] Tameem Albash and Daniel A Lidar. “Adiabatic quantum computation”. In: Rev. Mod. Phys 90.1 (2018), p. 015002. [81] Francesca Albertini and Domenico D’Alessandro. “Controllability of symmetric spin networks”. In: J. Math. Phys. 59 (2008), p. 052102. [82] Sam McArdle, Suguru Endo, Alán Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan. “Quantum computational chemistry”. In: Rev. Mod. Phys. 92.1 (2020), p. 015003. 
[83] Sepehr Ebadi, Alexander Keesling, Madelyn Cain, Tout T Wang, Harry Levine, Dolev Bluvstein, Giulia Semeghini, Ahmed Omran, J-G Liu, Rhine Samajdar, et al. “Quantum optimization of maximum independent set using Rydberg atom arrays”. In: Science 376.6598 (2022), pp. 1209–1215. [84] Xiao Yuan, Suguru Endo, Qi Zhao, Ying Li, and Simon C Benjamin. “Theory of variational quantum simulation”. In: Quantum 3 (2019), p. 191. 131 [85] Yong-Xin Yao, Niladri Gomes, Feng Zhang, Cai-Zhuang Wang, Kai-Ming Ho, Thomas Iadecola, and Peter P Orth. “Adaptive variational quantum dynamics simulations”. In: PRX Quantum 2.3 (2021), p. 030307. [86] Junaid ur Rehman, Seongjin Hong, Yong-Su Kim, and Hyundong Shin. “Variational estimation of capacity bounds for quantum channels”. In: Phys. Rev. A 105.3 (2022), p. 032616. [87] Quntao Zhuang and Zheshen Zhang. “Physical-layer supervised learning assisted by an entangled sensor network”. In: Phys. Rev. X 9.4 (2019), p. 041023. [88] Yi Xia, Wei Li, Quntao Zhuang, and Zheshen Zhang. “Quantum-enhanced data classification with a variational entangled sensor network”. In: Phys. Rev. X 11.2 (2021), p. 021047. [89] Peter Wittek. Quantum machine learning: what quantum computing means to data mining. Academic Press, 2014. [90] Edward Farhi and Hartmut Neven. “Classification with quantum neural networks on near term processors”. In: arXiv:1802.06002 (2018). [91] Vedran Dunjko and Hans J Briegel. “Machine learning & artificial intelligence in the quantum domain: a review of recent progress”. In: Rep. Prog. Phys. 81.7 (2018), p. 074001. [92] Maria Schuld and Nathan Killoran. “Quantum machine learning in feature Hilbert spaces”. In: Phys. Rev. Lett. 122.4 (2019), p. 040504. [93] Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. “Supervised learning with quantum-enhanced feature spaces”. In: Nature 567.7747 (2019), pp. 209–212. 
[94] Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, and Stefan Woerner. “The power of quantum neural networks”. In: Nat. Computat. Sci. 1.6 (2021), pp. 403–409. [95] Zhongchu Ni, Sai Li, Xiaowei Deng, Yanyan Cai, Libo Zhang, Weiting Wang, Zhen-Biao Yang, Haifeng Yu, Fei Yan, Song Liu, et al. “Beating the break-even point with a discrete-variable-encoded logical qubit”. In: Nature 616.7955 (2023), pp. 56–60. [96] VV Sivak, Alec Eickbusch, Baptiste Royer, Shraddha Singh, Ioannis Tsioutsios, Suhas Ganjam, Alessandro Miano, BL Brock, AZ Ding, Luigi Frunzio, et al. “Real-time quantum error correction beyond break-even”. In: Nature 616.7955 (2023), pp. 50–55. [97] Huitao Shen, Pengfei Zhang, Yi-Zhuang You, and Hui Zhai. “Information scrambling in quantum neural networks”. In: Phys. Rev. Lett. 124.20 (2020), p. 200504. [98] Roy J Garcia, Kaifeng Bu, and Arthur Jaffe. “Quantifying scrambling in quantum neural networks”. In: J. High Energy Phys. 2022.3 (2022), pp. 1–40. [99] Junyu Liu, Zexi Lin, and Liang Jiang. “Laziness, barren plateau, and noise in machine learning”. In: arXiv:2206.09313 (2022). 132 [100] Xinbiao Wang, Junyu Liu, Tongliang Liu, Yong Luo, Yuxuan Du, and Dacheng Tao. “Symmetric pruning in quantum neural networks”. In: arXiv:2208.14057 (2022). [101] Li-Wei Yu, Weikang Li, Qi Ye, Zhide Lu, Zizhao Han, and Dong-Ling Deng. “Expressibility-induced Concentration of Quantum Neural Tangent Kernels”. In: arXiv:2311.04965 (2023). [102] Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. “Deep neural networks as gaussian processes”. In: arXiv:1711.00165 (2017). [103] Arthur Jacot, Franck Gabriel, and Clément Hongler. “Neural tangent kernel: Convergence and generalization in neural networks”. In: Advances in neural information processing systems 31 (2018). [104] Jaehoon Lee, Lechao Xiao, Samuel Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. 
“Wide neural networks of any depth evolve as linear models under gradient descent”. In: Adv. Neural Inf. Process. Syst. 32 (2019), pp. 8572–8583. [105] Jascha Sohl-Dickstein, Roman Novak, Samuel S Schoenholz, and Jaehoon Lee. “On the infinite width limit of neural networks with a standard parameterization”. In: arXiv:2001.07301 (2020). [106] Greg Yang and Edward J Hu. “Feature learning in infinite-width neural networks”. In: arXiv:2011.14522 (2020). [107] Sho Yaida. “Non-Gaussian processes and neural networks at finite widths”. In: Mathematical and Scientific Machine Learning. PMLR. 2020, pp. 165–192. [108] Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, Russ R Salakhutdinov, and Ruosong Wang. “On exact computation with an infinitely wide neural net”. In: Advances in neural information processing systems 32 (2019). [109] Ethan Dyer and Guy Gur-Ari. “Asymptotics of wide networks from feynman diagrams”. In: arXiv:1909.11304 (2019). [110] James Halverson, Anindita Maiti, and Keegan Stoner. “Neural networks and quantum field theory”. In: Mach. Learn.: Sci. Technol. 2.3 (2021), p. 035002. [111] Daniel A Roberts. “Why is AI hard and Physics simple?” In: arXiv:2104.00008 (2021). [112] Daniel A Roberts, Sho Yaida, and Boris Hanin. The principles of deep learning theory. Vol. 46. Cambridge University Press Cambridge, MA, USA, 2022. [113] Jordan Cotler, Nicholas Hunter-Jones, Junyu Liu, and Beni Yoshida. “Chaos, complexity, and random matrices”. In: J. High Energy Phys. 2017.11 (2017), pp. 1–60. [114] Junyu Liu. “Spectral form factors and late time quantum chaos”. In: Phys. Rev. D 98.8 (2018), p. 086026. 133 [115] Junyu Liu. “Scrambling and decoding the charged quantum information”. In: Phys. Rev. Res. 2.4 (2020), p. 043164. [116] Xuchen You and Xiaodi Wu. “Exponentially many local minima in quantum neural networks”. In: International Conference on Machine Learning. PMLR. 2021, pp. 12144–12155. [117] Eric R Anschuetz. “Critical points in quantum generative models”. 
In: arXiv:2109.06957 (2021). [118] Josef Hofbauer and Karl Sigmund. Evolutionary games and population dynamics. Cambridge university press, 1998. [119] Yavuz Nutku. “Hamiltonian structure of the Lotka-Volterra equations”. In: Phys. Lett. A 145.1 (1990), pp. 27–28. [120] Qiskit contributors. Qiskit: An Open-source Framework for Quantum Computing. 2023. doi: 10.5281/zenodo.2573505. [121] Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, and Guy Gur-Ari. “The large learning rate phase of deep learning: the catapult mechanism”. In: arXiv:2003.02218 (2020). [122] David Meltzer and Junyu Liu. “Catapult dynamics and phase transitions in quadratic nets”. In: arXiv preprint arXiv:2301.07737 (2023). [123] Henry W Lin, Max Tegmark, and David Rolnick. “Why does deep and cheap learning work so well?” In: J. Stat. Phys. 168 (2017), pp. 1223–1247. [124] Joshua Batson, C Grace Haaf, Yonatan Kahn, and Daniel A Roberts. “Topological obstructions to autoencoding”. In: J. High Energy Phys. 2021.4 (2021), pp. 1–43. [125] Dave Wecker, Matthew B Hastings, and Matthias Troyer. “Progress towards practical quantum variational algorithms”. In: Phys. Rev. A 92.4 (2015), p. 042303. [126] Ming-Cheng Chen, Ming Gong, Xiaosi Xu, Xiao Yuan, Jian-Wen Wang, Can Wang, Chong Ying, Jin Lin, Yu Xu, Yulin Wu, et al. “Demonstration of adiabatic variational quantum computing with a superconducting quantum coprocessor”. In: Phys. Rev. Lett. 125.18 (2020), p. 180501. [127] Jonathan Romero, Jonathan P Olson, and Alan Aspuru-Guzik. “Quantum autoencoders for efficient compression of quantum data”. In: Quantum Sci. Technol. 2.4 (2017), p. 045001. [128] Michael J Gullans, Stefan Krastanov, David A Huse, Liang Jiang, and Steven T Flammia. “Quantum coding with low-depth random circuits”. In: Phys. Rev. X 11.3 (2021), p. 031066. [129] Peter JJ O’Malley, Ryan Babbush, Ian D Kivlichan, Jonathan Romero, Jarrod R McClean, Rami Barends, Julian Kelly, Pedram Roushan, Andrew Tranter, Nan Ding, et al. 
“Scalable quantum simulation of molecular energies”. In: Phys. Rev. X 6.3 (2016), p. 031007. [130] James I Colless, Vinay V Ramasesh, Dar Dahlen, Machiel S Blok, ME Kimchi-Schwartz, JR McClean, J Carter, WA De Jong, and I Siddiqi. “Computation of molecular spectra on a quantum processor with an error-resilient algorithm”. In: Phys. Rev. X 8.1 (2018), p. 011021. 134 [131] Carlos Bravo-Prieto, Josep Lumbreras-Zarapico, Luca Tagliacozzo, and José I Latorre. “Scaling of variational quantum circuit depth for condensed matter systems”. In: Quantum 4 (2020), p. 272. [132] Roeland Wiersema, Cunlu Zhou, Yvette de Sereville, Juan Felipe Carrasquilla, Yong Baek Kim, and Henry Yuen. “Exploring entanglement and optimization within the Hamiltonian Variational Ansatz”. In: PRX Quantum 1.2 (2020), p. 020319. [133] Jacques Carolan, Masoud Mohseni, Jonathan P Olson, Mihika Prabhu, Changchen Chen, Darius Bunandar, Murphy Yuezhen Niu, Nicholas C Harris, Franco NC Wong, Michael Hochberg, et al. “Variational quantum unsampling on a quantum photonic processor”. In: Nat. Phys. 16.3 (2020), pp. 322–327. [134] Marcello Benedetti, Edward Grant, Leonard Wossnig, and Simone Severini. “Adversarial quantum circuit learning for pure state approximation”. In: New J. Phys. 21.4 (2019), p. 043023. [135] Andrew Patterson, Hongxiang Chen, Leonard Wossnig, Simone Severini, Dan Browne, and Ivan Rungger. “Quantum state discrimination using noisy quantum neural networks”. In: Phys. Rev. Res. 3.1 (2021), p. 013063. [136] Hongxiang Chen, Leonard Wossnig, Simone Severini, Hartmut Neven, and Masoud Mohseni. “Universal discriminative quantum neural networks”. In: Quantum Mach. Intell. 3.1 (2021), pp. 1–11. [137] Ian MacCormack, Conor Delaney, Alexey Galda, Nidhi Aggarwal, and Prineha Narang. “Branching quantum convolutional neural networks”. In: Phys. Rev. Res. 4.1 (2022), p. 013117. 
[138] Yong Liu, Dongyang Wang, Shichuan Xue, Anqi Huang, Xiang Fu, Xiaogang Qiang, Ping Xu, He-Liang Huang, Mingtang Deng, Chu Guo, et al. “Variational quantum circuits for quantum state tomography”. In: Phys. Rev. A 101.5 (2020), p. 052316. [139] Michael Lubasch, Jaewoo Joo, Pierre Moinier, Martin Kiffner, and Dieter Jaksch. “Variational quantum algorithms for nonlinear problems”. In: Phys. Rev. A 101.1 (2020), p. 010301. [140] Ying Li and Simon C Benjamin. “Efficient variational quantum simulator incorporating active error minimization”. In: Phys. Rev. X 7.2 (2017), p. 021050. [141] Eugene F Dumitrescu, Alex J McCaskey, Gaute Hagen, Gustav R Jansen, Titus D Morris, T Papenbrock, Raphael C Pooser, David Jarvis Dean, and Pavel Lougovski. “Cloud quantum computing of an atomic nucleus”. In: Phys. Rev. Lett. 120.21 (2018), p. 210501. [142] Sam McArdle, Tyson Jones, Suguru Endo, Ying Li, Simon C Benjamin, and Xiao Yuan. “Variational ansatz-based quantum simulation of imaginary time evolution”. In: npj Quantum Inf. 5.1 (2019), pp. 1–6. [143] Patrick Rebentrost, Thomas R Bromley, Christian Weedbrook, and Seth Lloyd. “Quantum Hopfield neural network”. In: Phys. Rev. A 98.4 (2018), p. 042308. [144] Nathan Killoran, Thomas R Bromley, Juan Miguel Arrazola, Maria Schuld, Nicolás Quesada, and Seth Lloyd. “Continuous-variable quantum neural networks”. In: Phys. Rev. Res. 1.3 (2019), p. 033063. [145] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, and Dacheng Tao. “Expressive power of parametrized quantum circuits”. In: Phys. Rev. Res. 2.3 (2020), p. 033125. [146] Chengran Yang, Andrew JP Garner, Feiyang Liu, Nora Tischler, Jayne Thompson, Man-Hong Yung, Mile Gu, and Oscar Dahlsten. “Provably superior accuracy in quantum stochastic modeling”. In: Phys. Rev. A 108.2 (2023), p. 022411. [147] Ronen Eldan and Ohad Shamir. “The power of depth for feedforward neural networks”. In: Conference on Learning Theory. PMLR. 2016, pp. 907–940. [148] Matus Telgarsky.
“Benefits of depth in neural networks”. In: Conference on Learning Theory. PMLR. 2016, pp. 1517–1539. [149] David Rolnick and Max Tegmark. “The power of deeper networks for expressing natural functions”. In: arXiv:1705.05502 (2017). [150] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. “The expressive power of neural networks: A view from the width”. In: Advances in Neural Information Processing Systems 30 (2017). [151] Carl Helstrom. Quantum detection and estimation theory. Academic Press, New York, 1976. [152] Maria Schuld, Alex Bocharov, Krysta M Svore, and Nathan Wiebe. “Circuit-centric quantum classifiers”. In: Phys. Rev. A 101.3 (2020), p. 032308. [153] Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G Green, and Simone Severini. “Hierarchical quantum classifiers”. In: npj Quantum Inf. 4.1 (2018), pp. 1–8. [154] CW Helstrom. “Minimum mean-squared error of estimates in quantum statistics”. In: Phys. Lett. A 25.2 (1967), pp. 101–102. [155] Yadong Wu, Pengfei Zhang, and Hui Zhai. “Scrambling ability of quantum neural network architectures”. In: Phys. Rev. Res. 3.3 (2021), p. L032057. [156] Bingzhi Zhang and Quntao Zhuang. “Entanglement formation in continuous-variable random quantum networks”. In: npj Quantum Inf. 7.1 (2021), pp. 1–12. [157] Nilanjana Datta, Milan Mosonyi, Min-Hsiu Hsieh, and Fernando GSL Brandao. “A smooth entropy approach to quantum hypothesis testing and the classical capacity of quantum channels”. In: IEEE Transactions on Information Theory 59.12 (2013), pp. 8014–8026. [158] Ligong Wang and Renato Renner. “One-shot classical-quantum capacity and hypothesis testing”. In: Phys. Rev. Lett. 108.20 (2012), p. 200501. [159] Anurag Anshu, Vamsi Krishna Devabathini, and Rahul Jain. “Quantum communication using coherent rejection sampling”. In: Phys. Rev. Lett. 119.12 (2017), p. 120506. [160] Anurag Anshu, Rahul Jain, and Naqueeb Ahmad Warsi.
“Building blocks for communication over noisy quantum networks”. In: IEEE Transactions on Information Theory 65.2 (2018), pp. 1287–1306. [161] Masahito Hayashi and Hiroshi Nagaoka. “General formulas for capacity of classical-quantum channels”. In: IEEE Transactions on Information Theory 49.7 (2003), pp. 1753–1768. [162] Quntao Zhuang, Zheshen Zhang, and Jeffrey H Shapiro. “Distributed quantum sensing using continuous-variable multipartite entanglement”. In: Phys. Rev. A 97.3 (2018), p. 032329. [163] Wenchao Ge, Kurt Jacobs, Zachary Eldredge, Alexey V Gorshkov, and Michael Foss-Feig. “Distributed quantum metrology with linear networks and separable inputs”. In: Phys. Rev. Lett. 121.4 (2018), p. 043604. [164] Quntao Zhuang. “Quantum ranging with Gaussian entanglement”. In: Phys. Rev. Lett. 126.24 (2021), p. 240501. [165] Stefano Pirandola. “Quantum reading of a classical digital memory”. In: Phys. Rev. Lett. 106.9 (2011), p. 090504. [166] Haowei Shi, Zheshen Zhang, Stefano Pirandola, and Quntao Zhuang. “Entanglement-assisted absorption spectroscopy”. In: Phys. Rev. Lett. 125.18 (2020), p. 180502. [167] Jens Eisert. “Entangling power and quantum circuit complexity”. In: Phys. Rev. Lett. 127.2 (2021), p. 020501. [168] Jonas Haferkamp, Philippe Faist, Naga BT Kothakonda, Jens Eisert, and Nicole Yunger Halpern. “Linear growth of quantum circuit complexity”. In: Nat. Phys. 18.5 (2022), pp. 528–532. [169] David Gross, Koenraad Audenaert, and Jens Eisert. “Evenly distributed unitaries: On the structure of unitary designs”. In: J. Math. Phys. 48.5 (2007), p. 052104. [170] Andris Ambainis and Joseph Emerson. “Quantum t-designs: t-wise independence in the quantum world”. In: Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07). IEEE. 2007, pp. 129–140. [171] Fernando GSL Brandao, Aram W Harrow, and Michał Horodecki. “Local random quantum circuits are approximate polynomial-designs”. In: Commun. Math. Phys. 346.2 (2016), pp. 397–434. [172] Don N Page. 
“Average entropy of a subsystem”. In: Phys. Rev. Lett. 71.9 (1993), p. 1291. [173] Christoph Holzhey, Finn Larsen, and Frank Wilczek. “Geometric and renormalized entropy in conformal field theory”. In: Nucl. Phys. B 424.3 (1994), pp. 443–467. [174] José Ignacio Latorre, Enrique Rico, and Guifré Vidal. “Ground state entanglement in quantum spin chains”. In: arXiv:quant-ph/0304098 (2003). [175] Pasquale Calabrese and John Cardy. “Entanglement entropy and quantum field theory”. In: J. Stat. Mech.: Theory Exp. 2004.06 (2004), P06002. [176] Robert S Kennedy. “On the optimum receiver for the M-ary linearly independent pure state problem”. In: Quarterly Progress Report 110 (1973), pp. 142–146. [177] Yonina C Eldar, Alexandre Megretski, and George C Verghese. “Designing optimal quantum detectors via semidefinite programming”. In: IEEE Transactions on Information Theory 49.4 (2003), pp. 1007–1012. [178] Andrew Arrasmith, Marco Cerezo, Piotr Czarnik, Lukasz Cincio, and Patrick J Coles. “Effect of barren plateaus on gradient-free optimization”. In: Quantum 5 (2021), p. 558. [179] M Cerezo and Patrick J Coles. “Higher order derivatives of quantum neural networks with barren plateaus”. In: Quantum Sci. Technol. 6.3 (2021), p. 035006. [180] Robert S Kennedy. In: Quarterly Progress Report 108 (1973), pp. 219–225. [181] Sukin Sim, Peter D Johnson, and Alán Aspuru-Guzik. “Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms”. In: Adv. Quantum Technol. 2.12 (2019), p. 1900070. [182] Tyler Volkoff and Patrick J Coles. “Large gradients via correlation in random parameterized quantum circuits”. In: Quantum Sci. Technol. 6.2 (2021), p. 025008. [183] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. “Quantum principal component analysis”. In: Nat. Phys. 10.9 (2014), pp. 631–633. [184] Xun Gao, Eric R Anschuetz, Sheng-Tao Wang, J Ignacio Cirac, and Mikhail D Lukin.
“Enhancing generative models via quantum correlations”. In: Phys. Rev. X 12.2 (2022), p. 021037. [185] Amir Khoshaman, Walter Vinci, Brandon Denis, Evgeny Andriyash, Hossein Sadeghi, and Mohammad H Amin. “Quantum variational autoencoder”. In: Quantum Sci. Technol. 4.1 (2018), p. 014001. [186] Mohammad H Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko. “Quantum Boltzmann machine”. In: Phys. Rev. X 8.2 (2018), p. 021050. [187] Xun Gao, Z-Y Zhang, and L-M Duan. “A quantum machine learning algorithm based on generative models”. In: Sci. Adv. 4.12 (2018), eaat9004. [188] Pierre-Luc Dallaire-Demers and Nathan Killoran. “Quantum generative adversarial networks”. In: Phys. Rev. A 98.1 (2018), p. 012324. [189] Ling Hu, Shu-Hao Wu, Weizhou Cai, Yuwei Ma, Xianghao Mu, Yuan Xu, Haiyan Wang, Yipu Song, Dong-Ling Deng, Chang-Ling Zou, et al. “Quantum generative adversarial learning in a superconducting quantum circuit”. In: Sci. Adv. 5.1 (2019), eaav2761. [190] He-Liang Huang, Yuxuan Du, Ming Gong, Youwei Zhao, Yulin Wu, Chaoyue Wang, Shaowei Li, Futian Liang, Jin Lin, Yu Xu, et al. “Experimental quantum generative adversarial networks for image generation”. In: Phys. Rev. Appl. 16.2 (2021), p. 024051. [191] Elton Yechao Zhu, Sonika Johri, Dave Bacon, Mert Esencan, Jungsang Kim, Mark Muir, Nikhil Murgai, Jason Nguyen, Neal Pisenti, Adam Schouela, et al. “Generative quantum learning of joint probability distribution functions”. In: Phys. Rev. Res. 4.4 (2022), p. 043092. [192] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. “Deep unsupervised learning using nonequilibrium thermodynamics”. In: International Conference on Machine Learning. PMLR. 2015, pp. 2256–2265. [193] Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, et al. “Structure-based drug design with equivariant diffusion models”. In: arXiv:2210.13695 (2022).
[194] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. “On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint”. In: Journal of Optimization Theory and Applications 169 (2016), pp. 671–691. [195] Prafulla Dhariwal and Alexander Nichol. “Diffusion models beat GANs on image synthesis”. In: Adv. Neural Inf. Process. 34 (2021), pp. 8780–8794. [196] Gustav Müller-Franzes, Jan Moritz Niehues, Firas Khader, Soroosh Tayebi Arasteh, Christoph Haarburger, Christiane Kuhl, Tianci Wang, Tianyu Han, Sven Nebelung, Jakob Nikolas Kather, et al. “Diffusion probabilistic models beat GANs on medical images”. In: arXiv:2212.07501 (2022). [197] Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon Tamir. “Robust Compressed Sensing MRI with Deep Generative Priors”. In: Adv. Neural Inf. Process. Ed. by M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan. Vol. 34. Curran Associates, Inc., 2021, pp. 14938–14954. url: https://proceedings.neurips.cc/paper_files/paper/2021/file/7d6044e95a16761171b130dcb476a43e-Paper.pdf. [198] Yang Song and Stefano Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution”. In: Adv. Neural Inf. Process. Ed. by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett. Vol. 32. Curran Associates, Inc., 2019. url: https://proceedings.neurips.cc/paper_files/paper/2019/file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf. [199] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. “A kernel two-sample test”. In: The Journal of Machine Learning Research 13.1 (2012), pp. 723–773. [200] C. Villani. Topics in Optimal Transportation. Graduate studies in mathematics. American Mathematical Society, 2003. isbn: 9781470418045. [201] Ognyan Oreshkov and John Calsamiglia. “Distinguishability measures between ensembles of quantum states”. In: Phys. Rev. A 79.3 (2009), p. 032336.
[202] Jin-Guo Liu and Lei Wang. “Differentiable learning of quantum circuit Born machines”. In: Phys. Rev. A 98.6 (2018), p. 062324. [203] Marcello Benedetti, Delfina Garcia-Pintos, Oscar Perdomo, Vicente Leyton-Ortega, Yunseong Nam, and Alejandro Perdomo-Ortiz. “A generative modeling approach for benchmarking and training shallow quantum circuits”. In: npj Quantum Inf. 5.1 (2019), p. 45. [204] Brian Coyle, Daniel Mills, Vincent Danos, and Elham Kashefi. “The Born supremacy: quantum advantage and training of an Ising Born machine”. In: npj Quantum Inf. 6.1 (2020), p. 60. [205] Kaitlin Gili, Mohamed Hibat-Allah, Marta Mauri, Chris Ballance, and Alejandro Perdomo-Ortiz. “Do quantum circuit Born machines generalize?” In: Quantum Sci. Technol. 8.3 (2023), p. 035021. [206] Christa Zoufal, Aurélien Lucchi, and Stefan Woerner. “Quantum generative adversarial networks for learning and loading random distributions”. In: npj Quantum Inf. 5.1 (2019), p. 103. [207] Aram W Harrow and Saeed Mehraban. “Approximate unitary t-designs by short random quantum circuits using nearest-neighbor and long-range gates”. In: Commun. Math. Phys. 401 (2023). [208] Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R Zhang. “Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions”. In: arXiv:2209.11215 (2022). [209] Matthias C Caro, Hsin-Yuan Huang, Nicholas Ezzell, Joe Gibbs, Andrew T Sornborger, Lukasz Cincio, Patrick J Coles, and Zoë Holmes. “Out-of-distribution generalization for learning quantum dynamics”. In: Nat. Commun. 14.1 (2023), p. 3751. [210] Leonardo Banchi, Jason Pereira, and Stefano Pirandola. “Generalization in quantum machine learning: A quantum information standpoint”. In: PRX Quantum 2.4 (2021), p. 040321. [211] Nathan Srebro, Karthik Sridharan, and Ambuj Tewari. “Smoothness, low noise and fast rates”. In: Adv. Neural Inf. Process. Syst. 23 (2010). [212] Rentian Yao, Xiaohui Chen, and Yun Yang.
“Mean-field nonparametric estimation of interacting particle systems”. In: Conference on Learning Theory. PMLR. 2022, pp. 2242–2275. [213] Ron Belyansky, Przemyslaw Bienias, Yaroslav A Kharkov, Alexey V Gorshkov, and Brian Swingle. “Minimal Model for Fast Scrambling”. In: Phys. Rev. Lett. 125.13 (2020), p. 130601. [214] Robin Harper, Steven T Flammia, and Joel J Wallman. “Efficient learning of quantum noise”. In: Nat. Phys. 16.12 (2020), pp. 1184–1188. [215] Senrui Chen, Yunchao Liu, Matthew Otten, Alireza Seif, Bill Fefferman, and Liang Jiang. “The learnability of Pauli noise”. In: Nat. Commun. 14.1 (2023), p. 52. [216] Maria Schuld. “Supervised quantum machine learning models are kernel methods”. In: arXiv:2101.11020 (2021). [217] Guangxi Li, Ruilin Ye, Xuanqiang Zhao, and Xin Wang. “Concentration of data encoding in parameterized quantum circuits”. In: Adv. Neural Inf. Process. 35 (2022), pp. 19456–19469. [218] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. “Solving Inverse Problems in Medical Imaging with Score-Based Generative Models”. In: International Conference on Learning Representations. 2021. [219] QuantGenMdl. https://github.com/Francis-Hsu/QuantGenMdl. Accessed: 2024-02-01. [220] Brian Skinner, Jonathan Ruhman, and Adam Nahum. “Measurement-induced phase transitions in the dynamics of entanglement”. In: Phys. Rev. X 9.3 (2019), p. 031009. [221] Joaquin F Rodriguez-Nieva and Mathias S Scheurer. “Identifying topological order through unsupervised machine learning”. In: Nat. Phys. 15.8 (2019), pp. 790–795. [222] Xiaohui Chen and Yun Yang. “Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations”. In: Applied and Computational Harmonic Analysis 52 (2021), pp. 303–347. [223] Ronald R Coifman and Stéphane Lafon. “Diffusion maps”. In: Applied and Computational Harmonic Analysis 21.1 (2006), pp. 5–30. [224] Apimuk Sornsaeng, Ninnat Dangniam, Pantita Palittapongarnpim, and Thiparat Chotibut.
“Quantum diffusion map for nonlinear dimensionality reduction”. In: Phys. Rev. A 104.5 (2021), p. 052410. [225] Andrea Skolik, Jarrod R McClean, Masoud Mohseni, Patrick van der Smagt, and Martin Leib. “Layerwise learning for quantum neural networks”. In: Quantum Mach. Intell. 3 (2021), pp. 1–11. [226] Ernesto Campos, Daniil Rabinovich, Vishwanathan Akshay, and J Biamonte. “Training saturation in layerwise quantum approximate optimization”. In: Phys. Rev. A 104.3 (2021), p. L030401. [227] Marco Parigi, Stefano Martina, and Filippo Caruso. “Quantum-Noise-driven Generative Diffusion Models”. In: arXiv:2308.12013 (2023). [228] Saad Yalouz, Bruno Senjean, Filippo Miatto, and Vedran Dunjko. “Encoding strongly-correlated many-boson wavefunctions on a photonic quantum computer: application to the attractive Bose-Hubbard model”. In: Quantum 5 (2021), p. 572. [229] Nicolas Gisin and Rob Thew. “Quantum communication”. In: Nat. Photonics 1.3 (2007), p. 165. [230] H Jeff Kimble. “The quantum internet”. In: Nature 453.7198 (2008), pp. 1023–1030. [231] Jacob Biamonte, Mauro Faccin, and Manlio De Domenico. “Complex networks from classical to quantum”. In: Commun. Phys. 2.53 (2019). [232] Stephanie Wehner, David Elkouss, and Ronald Hanson. “Quantum internet: A vision for the road ahead”. In: Science 362.6412 (2018). [233] Wojciech Kozlowski and Stephanie Wehner. “Towards large-scale quantum networks”. In: Proceedings of the Sixth Annual ACM International Conference on Nanoscale Computing and Communication. 2019, pp. 1–7. [234] Ben Q Baragiola, Giacomo Pantaleoni, Rafael N Alexander, Angela Karanjai, and Nicolas C Menicucci. “All-Gaussian universality and fault tolerance with the Gottesman-Kitaev-Preskill code”. In: Phys. Rev. Lett. 123.20 (2019), p. 200502. [235] Mikkel V Larsen, Christopher Chamberland, Kyungjoo Noh, Jonas S Neergaard-Nielsen, and Ulrik L Andersen. “Fault-tolerant continuous-variable measurement-based quantum computation architecture”. In: PRX Quantum 2.3 (2021), p.
030325. [236] Daniel Gottesman, Alexei Kitaev, and John Preskill. “Encoding a qubit in an oscillator”. In: Phys. Rev. A 64.1 (2001), p. 012310. [237] Nissim Ofek, Andrei Petrenko, Reinier Heeres, Philip Reinhold, Zaki Leghtas, Brian Vlastakis, Yehan Liu, Luigi Frunzio, Steven M Girvin, Liang Jiang, et al. “Extending the lifetime of a quantum bit with error correction in superconducting circuits”. In: Nature 536.7617 (2016), pp. 441–445. [238] Reinier W Heeres, Brian Vlastakis, Eric Holland, Stefan Krastanov, Victor V Albert, Luigi Frunzio, Liang Jiang, and Robert J Schoelkopf. “Cavity state manipulation using photon-number selective phase gates”. In: Phys. Rev. Lett. 115.13 (2015), p. 137002. [239] Stefan Krastanov, Victor V Albert, Chao Shen, Chang-Ling Zou, Reinier W Heeres, Brian Vlastakis, Robert J Schoelkopf, and Liang Jiang. “Universal control of an oscillator with dispersive coupling to a qubit”. In: Phys. Rev. A 92.4 (2015), p. 040303. [240] Philippe Campagne-Ibarcq, Alec Eickbusch, Steven Touzard, Evan Zalys-Geller, Nicholas E Frattini, Volodymyr V Sivak, Philip Reinhold, Shruti Puri, Shyam Shankar, Robert J Schoelkopf, et al. “Quantum error correction of a qubit encoded in grid states of an oscillator”. In: Nature 584.7821 (2020), pp. 368–372. [241] Alec Eickbusch, Volodymyr Sivak, Andy Z Ding, Salvatore S Elder, Shantanu R Jha, Jayameenakshi Venkatraman, Baptiste Royer, SM Girvin, Robert J Schoelkopf, and Michel H Devoret. “Fast universal control of an oscillator with weak dispersive coupling to a qubit”. In: Nat. Phys. 18 (2022), pp. 1464–1469. [242] Zheshen Zhang and Quntao Zhuang. “Distributed quantum sensing”. In: Quantum Sci. Technol. 6.4 (2021), p. 043001. [243] Anthony J. Brady, Christina Gao, Roni Harnik, Zhen Liu, Zheshen Zhang, and Quntao Zhuang. “Entangled Sensor-Networks for Dark-Matter Searches”. In: PRX Quantum 3.3 (2022), p. 030333.
[244] Yi Xia, Aman R Agrawal, Christian M Pluchar, Anthony J Brady, Zhen Liu, Quntao Zhuang, Dalziel J Wilson, and Zheshen Zhang. “Entanglement-enhanced optomechanical sensing”. In: Nat. Photonics (2023), pp. 1–8. [245] Robin Blume-Kohout and Peter S Turner. “The curious nonexistence of Gaussian 2-designs”. In: Commun. Math. Phys. 326.3 (2014), pp. 755–771. [246] Joseph T Iosue, Kunal Sharma, Michael J Gullans, and Victor V Albert. “Continuous-variable quantum state designs: theory and applications”. In: Phys. Rev. X 14.1 (2024), p. 011013. [247] Yuxuan Du, Zhuozhuo Tu, Xiao Yuan, and Dacheng Tao. “Efficient measure for the expressivity of variational quantum algorithms”. In: Phys. Rev. Lett. 128.8 (2022), p. 080506. [248] Christa Flühmann, Thanh Long Nguyen, Matteo Marinelli, Vlad Negnevitsky, Karan Mehta, and JP Home. “Encoding a qubit in a trapped-ion mechanical oscillator”. In: Nature 566.7745 (2019), pp. 513–517. [249] Christian Weedbrook, Stefano Pirandola, Raúl García-Patrón, Nicolas J Cerf, Timothy C Ralph, Jeffrey H Shapiro, and Seth Lloyd. “Gaussian quantum information”. In: Rev. Mod. Phys. 84.2 (2012), p. 621. [250] Mile Gu, Christian Weedbrook, Nicolas C Menicucci, Timothy C Ralph, and Peter van Loock. “Quantum computing with continuous-variable clusters”. In: Phys. Rev. A 79.6 (2009), p. 062318. [251] Asaf A Diringer, Eliya Blumenthal, Avishay Grinberg, Liang Jiang, and Shay Hacohen-Gourgy. “Conditional-not Displacement: Fast Multioscillator Control with a Single Qubit”. In: Phys. Rev. X 14.1 (2024), p. 011055. [252] Kosuke Mitarai, Makoto Negoro, Masahiro Kitagawa, and Keisuke Fujii. “Quantum circuit learning”. In: Phys. Rev. A 98.3 (2018), p. 032309. [253] M Tse, Haocun Yu, N Kijbunchoo, A Fernandez-Galiana, P Dupej, L Barsotti, CD Blair, DD Brown, SE Dwyer, A Effler, et al. “Quantum-enhanced advanced LIGO detectors in the era of gravitational-wave astronomy”. In: Phys. Rev. Lett. 123.23 (2019), p. 231107.
[254] Junaid Aasi, J Abadie, BP Abbott, Richard Abbott, TD Abbott, MR Abernathy, Carl Adams, Thomas Adams, Paolo Addesso, RX Adhikari, et al. “Enhanced sensitivity of the LIGO gravitational wave detector by using squeezed states of light”. In: Nat. Photonics 7.8 (2013), pp. 613–619. [255] J Abadie, Benjamin P Abbott, R Abbott, Thomas D Abbott, M Abernathy, Carl Adams, R Adhikari, Christoph Affeldt, B Allen, GS Allen, et al. “A gravitational wave observatory operating beyond the quantum shot-noise limit”. In: Nat. Phys. 7.12 (2011), p. 962. [256] KM Backes, DA Palken, S Al Kenany, BM Brubaker, SB Cahn, A Droster, Gene C Hilton, Sumita Ghosh, H Jackson, SK Lamoreaux, et al. “A quantum enhanced search for dark matter axions”. In: Nature 590.7845 (2021), pp. 238–242. [257] Joseph T Iosue, Adam Ehrenberg, Dominik Hangleiter, Abhinav Deshpande, and Alexey V Gorshkov. “Page curves and typical entanglement in linear optics”. In: Quantum 7 (2023), p. 1017. [258] Edward Grant, Leonard Wossnig, Mateusz Ostaszewski, and Marcello Benedetti. “An initialization strategy for addressing barren plateaus in parametrized quantum circuits”. In: Quantum 3 (2019), p. 214. [259] Bobak Toussi Kiani, Giacomo De Palma, Milad Marvian, Zi-Wen Liu, and Seth Lloyd. “Learning quantum data with the quantum earth mover’s distance”. In: Quantum Sci. Technol. 7.4 (2022), p. 045002. [260] Stefan H Sack, Raimel A Medina, Alexios A Michailidis, Richard Kueng, and Maksym Serbyn. “Avoiding barren plateaus using classical shadows”. In: PRX Quantum 3.2 (2022), p. 020365. [261] Simon Cichy, Paul K Faehrmann, Sumeet Khatri, and Jens Eisert. “A perturbative gadget for delaying the onset of barren plateaus in variational quantum algorithms”. In: arXiv:2210.03099 (2022). [262] Xia Liu, Geng Liu, Hao-Kai Zhang, Jiaxin Huang, and Xin Wang. “Mitigating barren plateaus of variational quantum eigensolvers”. In: IEEE Transactions on Quantum Engineering (2024). [263] CL Mehta.
“Diagonal coherent-state representation of quantum operators”. In: Phys. Rev. Lett. 18.18 (1967), p. 752. [264] A Vourdas. “Analytic representations in quantum mechanics”. In: J. Phys. A Math. Gen. 39.7 (2006), R65. [265] Eric R Anschuetz and Bobak T Kiani. “Quantum variational algorithms are swamped with traps”. In: Nat. Commun. 13.1 (2022), p. 7760. [266] Bobak Toussi Kiani, Seth Lloyd, and Reevu Maity. “Learning unitaries by gradient descent”. In: arXiv:2001.11897 (2020). [267] Joonho Kim, Jaedeok Kim, and Dario Rosa. “Universal effectiveness of high-depth circuits in variational eigenproblems”. In: Phys. Rev. Res. 3.2 (2021), p. 023203. [268] Tyler J Volkoff. “Efficient trainability of linear optical modules in quantum optical neural networks”. In: Journal of Russian Laser Research 42.3 (2021), pp. 250–260. [269] Paulina Marian and Tudor A Marian. “Uhlmann fidelity between two-mode Gaussian states”. In: Phys. Rev. A 86.2 (2012), p. 022340. [270] Gaetana Spedalieri, Christian Weedbrook, and Stefano Pirandola. “A limit formula for the quantum fidelity”. In: J. Phys. A Math. 46.2 (2012), p. 025304. [271] Leonardo Banchi, Samuel L Braunstein, and Stefano Pirandola. “Quantum fidelity for arbitrary Gaussian states”. In: Phys. Rev. Lett. 115.26 (2015), p. 260501. [272] Samuel L Braunstein and Peter Van Loock. “Quantum information with continuous variables”. In: Rev. Mod. Phys. 77.2 (2005), p. 513. [273] Immanuel Bloch, Jean Dalibard, and Sylvain Nascimbene. “Quantum simulations with ultracold quantum gases”. In: Nat. Phys. 8.4 (2012), pp. 267–276. [274] Paul Magnard, Simon Storz, Philipp Kurpiers, Josua Schär, Fabian Marxer, Janis Lütolf, Theo Walter, J-C Besse, Mihai Gabureac, Kevin Reuer, et al. “Microwave quantum link between superconducting circuits housed in spatially separated cryogenic systems”. In: Phys. Rev. Lett. 125.26 (2020), p. 260502.
[275] Youpeng Zhong, Hung-Shen Chang, Audrey Bienfait, Étienne Dumur, Ming-Han Chou, Christopher R Conner, Joel Grebel, Rhys G Povey, Haoxiong Yan, David I Schuster, et al. “Deterministic multi-qubit entanglement in a quantum network”. In: Nature 590.7847 (2021), pp. 571–575. [276] Xu Han, Wei Fu, Chang-Ling Zou, Liang Jiang, and Hong X Tang. “Microwave-optical quantum frequency conversion”. In: Optica 8.8 (2021), pp. 1050–1064. [277] Reed W Andrews, Robert W Peterson, Tom P Purdy, Katarina Cicak, Raymond W Simmonds, Cindy A Regal, and Konrad W Lehnert. “Bidirectional and efficient conversion between microwave and optical light”. In: Nat. Phys. 10.4 (2014), pp. 321–326. [278] Joerg Bochmann, Amit Vainsencher, David D Awschalom, and Andrew N Cleland. “Nanomechanical coupling between microwave and optical photons”. In: Nat. Phys. 9.11 (2013), pp. 712–716. [279] Linran Fan, Chang-Ling Zou, Risheng Cheng, Xiang Guo, Xu Han, Zheng Gong, Sihao Wang, and Hong X Tang. “Superconducting cavity electro-optics: a platform for coherent photon conversion between superconducting and photonic circuits”. In: Sci. Adv. 4.8 (2018), eaar4994. [280] Yuntao Xu, Ayed Al Sayem, Linran Fan, Changling Zou, and Hong X Tang. “Bidirectional electro-optic conversion reaching 1% efficiency with thin film lithium niobate”. In: CLEO: Science and Innovations. Optica Publishing Group. 2021, SM4L–4. [281] Ryusuke Hisatomi, Alto Osada, Yutaka Tabuchi, Toyofumi Ishikawa, Atsushi Noguchi, Rekishu Yamazaki, Koji Usami, and Yasunobu Nakamura. “Bidirectional conversion between microwave and light via ferromagnetic magnons”. In: Phys. Rev. B 93.17 (2016), p. 174427. [282] Lewis A Williamson, Yu-Hui Chen, and Jevon J Longdell. “Magneto-optic modulator with unit quantum efficiency”. In: Phys. Rev. Lett. 113.20 (2014), p. 203601. [283] John G Bartholomew, Jake Rochman, Tian Xie, Jonathan M Kindem, Andrei Ruskuc, Ioana Craiciu, Mi Lei, and Andrei Faraon. 
“On-chip coherent microwave-to-optical transduction mediated by ytterbium in YVO4”. In: Nat. Commun. 11.1 (2020), p. 3266. [284] Stefan Krastanov, Hamza Raniwala, Jeffrey Holzgrafe, Kurt Jacobs, Marko Lončar, Matthew J Reagor, and Dirk R Englund. “Optically heralded entanglement of superconducting systems in quantum networks”. In: Phys. Rev. Lett. 127.4 (2021), p. 040503. [285] Jing Wu, Chaohan Cui, Linran Fan, and Quntao Zhuang. “Deterministic microwave-optical transduction based on quantum teleportation”. In: Phys. Rev. Appl. 16.6 (2021), p. 064044. [286] Charles H Bennett, Herbert J Bernstein, Sandu Popescu, and Benjamin Schumacher. “Concentrating partial entanglement by local operations”. In: Phys. Rev. A 53.4 (1996), p. 2046. [287] Charles H Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A Smolin, and William K Wootters. “Purification of noisy entanglement and faithful teleportation via noisy channels”. In: Phys. Rev. Lett. 76.5 (1996), p. 722. [288] David Deutsch, Artur Ekert, Richard Jozsa, Chiara Macchiavello, Sandu Popescu, and Anna Sanpera. “Quantum privacy amplification and the security of quantum cryptography over noisy channels”. In: Phys. Rev. Lett. 77.13 (1996), p. 2818. [289] Filip Rozpędek, Thomas Schiet, David Elkouss, Andrew C Doherty, Stephanie Wehner, et al. “Optimizing practical entanglement distillation”. In: Phys. Rev. A 97.6 (2018), p. 062333. [290] Xuanqiang Zhao, Benchi Zhao, Zihe Wang, Zhixin Song, and Xin Wang. “Practical distributed quantum information processing with LOCCNet”. In: npj Quantum Inf. 7.1 (2021), p. 159. [291] Jian-Wei Pan, Christoph Simon, Časlav Brukner, and Anton Zeilinger. “Entanglement purification for quantum communication”. In: Nature 410.6832 (2001), pp. 1067–1070. [292] Paul G Kwiat, Salvador Barraza-Lopez, Andre Stefanov, and Nicolas Gisin. “Experimental entanglement distillation and ‘hidden’ non-locality”. In: Nature 409.6823 (2001), pp. 1014–1017.
[293] Takashi Yamamoto, Masato Koashi, Şahin Kaya Özdemir, and Nobuyuki Imoto. “Experimental extraction of an entangled photon pair from two identically decohered pairs”. In: Nature 421.6921 (2003), pp. 343–346. [294] Rainer Reichle, Dietrich Leibfried, Emanuel Knill, Joseph Britton, R Bradford Blakestad, John D Jost, Christopher Langer, Roee Ozeri, Signe Seidelin, and David J Wineland. “Experimental purification of two-atom entanglement”. In: Nature 443.7113 (2006), pp. 838–841. [295] Timothy C Ralph and AP Lund. “Nondeterministic noiseless linear amplification of quantum systems”. In: AIP Conference Proceedings 1110 (2009), pp. 155–160. [296] David T Pegg, Lee S Phillips, and Stephen M Barnett. “Optical state truncation by projection synthesis”. In: Phys. Rev. Lett. 81.8 (1998), p. 1604. [297] Hiroki Takahashi, Jonas S Neergaard-Nielsen, Makoto Takeuchi, Masahiro Takeoka, Kazuhiro Hayasaka, Akira Furusawa, and Masahide Sasaki. “Entanglement distillation from Gaussian input states”. In: Nat. Photonics 4.3 (2010), pp. 178–181. [298] ShengLi Zhang and Peter van Loock. “Local Gaussian operations can enhance continuous-variable entanglement distillation”. In: Phys. Rev. A 84.6 (2011), p. 062309. [299] Animesh Datta, Lijian Zhang, Joshua Nunn, Nathan K Langford, Alvaro Feito, Martin B Plenio, and Ian A Walmsley. “Compact continuous-variable entanglement distillation”. In: Phys. Rev. Lett. 108.6 (2012), p. 060502. [300] Ondřej Černotík and Jaromír Fiurášek. “Displacement-enhanced continuous-variable entanglement concentration”. In: Phys. Rev. A 86.5 (2012), p. 052339. [301] Mingjian He, Robert Malaney, and Benjamin A Burnett. “Noiseless linear amplifiers for multimode states”. In: Phys. Rev. A 103.1 (2021), p. 012414. [302] Liyun Hu, Zeyang Liao, and M Suhail Zubairy. “Continuous-variable entanglement via multiphoton catalysis”. In: Phys. Rev. A 95.1 (2017), p. 012310. [303] Yasamin Mardani, Ali Shafiei, Milad Ghadimi, and Mehdi Abdi.
“Continuous-variable entanglement distillation by cascaded photon replacement”. In: Phys. Rev. A 102.1 (2020), p. 012407. [304] Earl T Campbell, Marco G Genoni, and Jens Eisert. “Continuous-variable entanglement distillation and noncommutative central limit theorems”. In: Phys. Rev. A 87.4 (2013), p. 042330. [305] Alexander E Ulanov, Ilya A Fedorov, Anastasia A Pushkina, Yury V Kurochkin, Timothy C Ralph, and AI Lvovsky. “Undoing the effect of loss on quantum entanglement”. In: Nat. Photonics 9.11 (2015), pp. 764–768. [306] Mankei Tsang. “Cavity quantum electro-optics”. In: Phys. Rev. A 81.6 (2010), p. 063837. [307] Mankei Tsang. “Cavity quantum electro-optics. II. Input-output relations between traveling optical and microwave fields”. In: Phys. Rev. A 84.4 (2011), p. 043845. [308] Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki. “Quantum entanglement”. In: Rev. Mod. Phys. 81.2 (2009), p. 865. [309] Raúl García-Patrón, Stefano Pirandola, Seth Lloyd, and Jeffrey H Shapiro. “Reverse coherent information”. In: Phys. Rev. Lett. 102.21 (2009), p. 210501. [310] Changchun Zhong, Zhixin Wang, Changling Zou, Mengzhen Zhang, Xu Han, Wei Fu, Mingrui Xu, Shyam Shankar, Michel H Devoret, Hong X Tang, et al. “Proposal for heralded generation and detection of entangled microwave–optical-photon pairs”. In: Phys. Rev. Lett. 124.1 (2020), p. 010511. [311] Changchun Zhong, Xu Han, and Liang Jiang. “Quantum transduction with microwave and optical entanglement”. In: arXiv:2202.04601 (2022). [312] Joan Agustí, Yuri Minoguchi, Johannes M Fink, and Peter Rabl. “Long-distance distribution of qubit-qubit entanglement using Gaussian-correlated photonic beams”. In: Phys. Rev. A 105.6 (2022), p. 062454. [313] Jacob Hastrup, Kimin Park, Jonatan Bohr Brask, Radim Filip, and Ulrik Lund Andersen. “Universal unitary transfer of continuous-variable quantum states into a few qubits”. In: Phys. Rev. Lett. 128.11 (2022), p. 110503.
147 [314] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Leo Zhou. “The quantum approximate optimization algorithm and the sherrington-kirkpatrick model at infinite size”. In: arXiv:1910.08187 (2019). [315] Yasunari Suzuki, Yoshiaki Kawase, Yuya Masumura, Yuria Hiraga, Masahiro Nakadai, Jiabao Chen, Ken M Nakanishi, Kosuke Mitarai, Ryosuke Imai, Shiro Tamiya, et al. “Qulacs: a fast and versatile quantum circuit simulator for research purpose”. In: arXiv:2011.13524 (2020). [316] Charles George Broyden. “The convergence of a class of double-rank minimization algorithms 1. general considerations”. In: IMA J. Appl. Math. 6.1 (1970), pp. 76–90. [317] Roger Fletcher. “A new approach to variable metric algorithms”. In: Comput. J. 13.3 (1970), pp. 317–322. [318] Donald Goldfarb. “A family of variable-metric methods derived by variational means”. In: Math. Comput. 24.109 (1970), pp. 23–26. [319] David F Shanno. “Conditioning of quasi-Newton methods for function minimization”. In: Math. Comput. 24.111 (1970), pp. 647–656. [320] Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. “SciPy 1.0: fundamental algorithms for scientific computing in Python”. In: Nat. Methods 17.3 (2020), pp. 261–272. [321] Gabor Csardi, Tamas Nepusz, et al. “The igraph software package for complex network research”. In: Int. J. complex Syst. 1695.5 (2006), pp. 1–9. [322] Yannet Interian. “Approximation algorithm for random MAX-kSAT”. In: International Conference on Theory and Applications of Satisfiability Testing. Springer. 2004, pp. 173–182. [323] Motohisa Fukuda, Robert König, and Ion Nechita. “RTNI—A symbolic integrator for Haar-random tensor networks”. In: J. Phys. A: Math. Theor. 52.42 (2019), p. 425303. [324] Quntao Zhuang, Biao Wu, et al. “Equilibration of quantum chaotic systems”. In: Physical Review E 88.6 (2013), p. 062147. [325] Farrokh Vatan and Colin Williams. 
“Optimal quantum circuits for general two-qubit gates”. In: Phys. Rev. A 69.3 (2004), p. 032315. [326] Brian DO Anderson. “Reverse-time diffusion equation models”. In: Stochastic Processes and their Applications 12.3 (1982), pp. 313–326. [327] Tailen Hsing and Randall Eubank. Theoretical foundations of functional data analysis, with an introduction to linear operators. Vol. 997. John Wiley & Sons, 2015. [328] Gabriel Peyré, Marco Cuturi, et al. “Computational optimal transport: With applications to data science”. In: Foundations and Trends® in Machine Learning 11.5-6 (2019), pp. 355–607. 148 [329] Shouvanik Chakrabarti, Huang Yiming, Tongyang Li, Soheil Feizi, and Xiaodi Wu. “Quantum Wasserstein generative adversarial networks”. In: Adv. Neural Inf. Process. 32 (2019). [330] Giacomo De Palma, Milad Marvian, Dario Trevisan, and Seth Lloyd. “The quantum Wasserstein distance of order 1”. In: IEEE Trans. Inf. Theory 67.10 (2021), pp. 6627–6643. [331] John A Smolin and David P DiVincenzo. “Five two-bit quantum gates are sufficient to implement the quantum Fredkin gate”. In: Phys. Rev. A 53.4 (1996), p. 2855. [332] Shi-Xin Zhang, Jonathan Allcock, Zhou-Quan Wan, Shuo Liu, Jiace Sun, Hao Yu, Xing-Han Yang, Jiezhong Qiu, Zhaofeng Ye, Yu-Qin Chen, et al. “Tensorcircuit: a quantum software framework for the nisq era”. In: Quantum 7 (2023), p. 912. [333] Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong, and Titouan Vayer. “POT: Python Optimal Transport”. In: J. Mach. Learn. Res. 22.78 (2021), pp. 1–8. [334] Kaixuan Huang, Zheng-An Wang, Chao Song, Kai Xu, Hekang Li, Zhen Wang, Qiujiang Guo, Zixuan Song, Zhi-Bo Liu, Dongning Zheng, et al. 
“Quantum generative adversarial networks with multiple superconducting qubits”. In: npj Quantum Inf. 7.1 (2021), p. 165. [335] Murphy Yuezhen Niu, Alexander Zlokapa, Michael Broughton, Sergio Boixo, Masoud Mohseni, Vadim Smelyanskyi, and Hartmut Neven. “Entangling quantum generative adversarial networks”. In: Phys. Rev. Lett. 128.22 (2022), p. 220505. [336] Vlatko Vedral and Martin B Plenio. “Entanglement measures and purification procedures”. In: Phys. Rev. A 57.3 (1998), p. 1619. [337] Fernando GSL Brandao and Martin B Plenio. “Entanglement theory and the second law of thermodynamics”. In: Nat. Phys. 4.11 (2008), pp. 873–877. [338] Spyros Tserkis, Jayne Thompson, Austin P Lund, Timothy C Ralph, Ping Koy Lam, Mile Gu, and Syed M Assad. “Maximum entanglement of formation for a two-mode Gaussian state over passive operations”. In: Phys. Rev. A 102.5 (2020), p. 052418. [339] William K Wootters. “Entanglement of formation of an arbitrary state of two qubits”. In: Phys. Rev. Lett. 80.10 (1998), p. 2245.

Appendix A

Supplemental Material for Chapter 2

A.1 Distribution of Hamiltonian coefficients

In this section, we analytically derive the distribution of the coefficients in the problem Hamiltonian $H_{C,k+}$ for both $k = 3$ and $k = 2$ (see Eqs. (2.8)), as well as their mean and variance.

For 1-3-SAT+, the probability that $A_{ia}A_{ja} = 1$ for two arbitrary different $i$ and $j$ is $p_3 = 3/\binom{n}{2}$. According to the definition $J_{ij} = \sum_{a=1}^m A_{ia}A_{ja}$, the probability that $J_{ij} = J$ is

\[ P_3(J_{ij} = J) = \binom{m}{J} p_3^J (1 - p_3)^{m-J}, \quad (A.1) \]

from which we directly see that the distribution of $J$ depends on the number of variables $n$ and the number of clauses $m$. With the distribution of $J_{ij}$, the mean and variance are

\[ \mathbb{E}_3(J_{ij}) = \frac{6m}{n(n-1)}, \quad (A.2a) \]
\[ \mathrm{Var}_3(J_{ij}) = \frac{6m\,(n^2 - n - 6)}{n^2 (n-1)^2}. \quad (A.2b) \]

Similarly, for 1-2-SAT+, the probability that $A_{ia}A_{ja} = 1$ for two arbitrary different $i$ and $j$ is $p_2 = 1/\binom{n}{2}$, and the probability that $J_{ij} = J$ is

\[ P_2(J_{ij} = J) = \binom{m}{J} p_2^J (1 - p_2)^{m-J}, \quad (A.3) \]

which is also size-dependent. The mean and variance are

\[ \mathbb{E}_2(J_{ij}) = \frac{2m}{n(n-1)}, \quad (A.4a) \]
\[ \mathrm{Var}_2(J_{ij}) = \frac{2m\,(n^2 - n - 2)}{n^2 (n-1)^2}. \quad (A.4b) \]

As the ratio of the standard deviation to the mean of $J_{ij}$ decreases with the clause-to-variable ratio $m/n$ at fixed $n$, we expect that in the limit of large $m/n$ the coefficients $J_{ij}$ approach a uniform value for all $i, j$. The same applies to the $h_i$'s of $H_{C,3+}$.

To close the discussion, we point out the difference between 1-k-SAT+ and the well-known Sherrington–Kirkpatrick (SK) model of spin glasses, with Hamiltonian $H_{\rm SK} = \sum_{i<j} J_{ij}\sigma^z_i\sigma^z_j$, where each $J_{ij}$ is independently sampled from the standard normal distribution $\mathcal{N}(0,1)$ [314].

A.2 Gate-based implementation of QAOA

To implement the Hamiltonian dynamics of QAOA on a quantum circuit, one can decompose the unitary evolution into parallel Pauli-X and Pauli-Z gates as follows:

\[ e^{-i\gamma_k H_{C,3}} = \exp\!\Big(i\frac{\gamma_k}{8}\sum_i h_i\sigma^z_i\Big)\exp\!\Big(-i\frac{\gamma_k}{8}\sum_{i<j}J_{ij}\sigma^z_i\sigma^z_j\Big)\exp\!\Big(-i\frac{\gamma_k}{8}\sum_{i<j<\ell}K_{ij\ell}\sigma^z_i\sigma^z_j\sigma^z_\ell\Big) = \prod_i e^{i\gamma_k h_i\sigma^z_i/8}\prod_{i<j}e^{-i\gamma_k J_{ij}\sigma^z_i\sigma^z_j/8}\prod_{i<j<\ell}e^{-i\gamma_k K_{ij\ell}\sigma^z_i\sigma^z_j\sigma^z_\ell/8} \quad (A.5) \]

for 3-SAT; the case of $H_{C,2}$ is similar, with $K_{ij\ell} = 0$ and a factor of 2 in the denominator of the exponents. For 1-3-SAT+, the problem-Hamiltonian layer is

\[ e^{-i\gamma_k H_{C,3+}} = \exp\!\Big(-i\frac{\gamma_k}{2}\sum_i h_i\sigma^z_i\Big)\exp\!\Big(-i\frac{\gamma_k}{2}\sum_{i<j}J_{ij}\sigma^z_i\sigma^z_j\Big) = \prod_i e^{-i\gamma_k h_i\sigma^z_i/2}\prod_{i<j}e^{-i\gamma_k J_{ij}\sigma^z_i\sigma^z_j/2}. \quad (A.6) \]

The case of $H_{C,2+}$ is similar, with all $h_i$'s equal to zero. The first and second products in Eqs. (A.5) and (A.6) correspond to parallel Pauli-Z rotation (RZ) gates and ZZ-interaction rotation (RZZ) gates; the third product, unique to Eq. (A.5), corresponds to rotations with ZZZ interaction (RZZZ gates). Similarly, $e^{-i\beta_k H_B}$ is implemented by Pauli-X rotation (RX) gates.
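The binomial moments of Eqs. (A.1)–(A.4b) are quick to sanity-check numerically. A minimal sketch in plain Python (the instance size n = 20, m = 100 is an arbitrary illustrative choice):

```python
from math import comb

def coeff_moments(n, m, k):
    """Mean and variance of J_ij for 1-k-SAT+ (k = 2 or 3).

    A_ia * A_ja = 1 with probability p_k = C(k,2)/C(n,2) per clause,
    so J_ij = sum_a A_ia A_ja is Binomial(m, p_k)."""
    p = comb(k, 2) / comb(n, 2)
    mean = m * p            # binomial mean
    var = m * p * (1 - p)   # binomial variance
    return mean, var

n, m = 20, 100  # illustrative instance size
mean3, var3 = coeff_moments(n, m, 3)
# Closed forms of Eqs. (A.2a)-(A.2b)
assert abs(mean3 - 6 * m / (n * (n - 1))) < 1e-12
assert abs(var3 - 6 * m * (n**2 - n - 6) / (n**2 * (n - 1)**2)) < 1e-12
```

The same function with k = 2 reproduces Eqs. (A.4a)–(A.4b), since $p_2 = 1/\binom{n}{2}$.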
Numerically, we implement the QAOA with Qulacs [315], a high-performance quantum computing platform for both Python and C++. We employ the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [316, 317, 318, 319], a gradient-based quasi-Newton method implemented in SciPy [320], to find the optimal parameters $\vec{\gamma}^*, \vec{\beta}^*$. The classical optimization stops when either the change of the cost function between steps or the gradient norm falls below $10^{-6}$. Our numerical simulations are performed on the Puma HPC cluster of the University of Arizona with 50 AMD Zen2 CPU cores and 250 GB of RAM.

A.3 Details of the classical approximate algorithms

Variants of greedy algorithms have been proposed for approximate MWIS problems [75, 76]. Before applying these algorithms for benchmarking, we introduce some notation to avoid confusion. Given a weighted graph $G(Q, E, w)$, where $Q$, $E$, $w$ denote the set of vertices, the set of edges, and the vertex weights, we use $w(q_i)$ to denote the weight of vertex $q_i$ and $w(S)$ to denote the total weight of a vertex set $S$. $N(q_i)$ denotes the set of vertices adjacent to $q_i$, and $N^+(q_i) = N(q_i) \cup \{q_i\}$. We denote the degree of vertex $q$ in graph $G_i$ as $d_{G_i}(q)$. We briefly summarize the four greedy algorithms used for benchmarking in this paper: GWMIN, GWMAX, GWMIN2 [75], and WG [76]. We also list their corresponding guaranteed lower bounds on the maximum-weight estimate.

Algorithm 11 GWMIN
Begin $S = \emptyset$, $i = 0$, $G_i = G$
while $Q(G_i) \neq \emptyset$ do
Choose a vertex $q$ s.t. $q = \mathrm{argmax}_{u \in Q(G_i)}\, w(u)/(d_{G_i}(u) + 1)$
$S = S \cup \{q\}$; remove $N^+_{G_i}(q)$ from $G_i$; $i = i + 1$
end while
Output $S$

Algorithm 12 GWMAX
Begin $S = \emptyset$, $i = 0$, $G_i = G$
while $E(G_i) \neq \emptyset$ do
Choose a vertex $q$ s.t. $q = \mathrm{argmin}_{u \in Q(G_i)}\, w(u)/\big(d_{G_i}(u)(d_{G_i}(u) + 1)\big)$
remove $q$ from $G_i$; $i = i + 1$
end while
Output $S = Q(G_i)$

Algorithm 13 GWMIN2
Begin $S = \emptyset$, $i = 0$, $G_i = G$
while $Q(G_i) \neq \emptyset$ do
Choose a vertex $q$ s.t. $q = \mathrm{argmax}_{u \in Q(G_i)}\, w(u)/\sum_{v \in N^+_{G_i}(u)} w(v)$
$S = S \cup \{q\}$; remove $N^+_{G_i}(q)$ from $G_i$; $i = i + 1$
end while
Output $S$

Algorithm 14 WG
Begin $S = \emptyset$, $i = 0$, $G_i = G$
while $Q(G_i) \neq \emptyset$ do
Choose a vertex $q$ s.t. $q = \mathrm{argmin}_{u \in Q(G_i)}\, \sum_{v \in N_{G_i}(u)} w(v)/w(u)$
$S = S \cup \{q\}$; remove $N^+_{G_i}(q)$ from $G_i$; $i = i + 1$
end while
Output $S$

It has also been shown that the lower bounds on the approximation for these algorithms are

\[ r_{\rm GWMIN} \geq \sum_{q \in Q(G)} \frac{w(q)}{d_G(q) + 1}, \quad (A.7a) \]
\[ r_{\rm GWMAX} \geq \sum_{q \in Q(G)} \frac{w(q)}{d_G(q) + 1}, \quad (A.7b) \]
\[ r_{\rm GWMIN2} \geq \sum_{q \in Q(G)} \frac{w(q)^2}{\sum_{u \in N^+_G(q)} w(u)}, \quad (A.7c) \]
\[ r_{\rm WG} \geq \frac{W(G)}{\sum_{q \in Q(G)} w(N_G(q))/w(G) + 1}. \quad (A.7d) \]

To obtain benchmarks, we reduce the 1-k-SAT+ instances to MWIS instances with igraph [321]. As all four algorithms are variants of greedy algorithms, their performance is similar. The benchmarks presented in Chapter 2 are obtained from the best approximation ratio among the four algorithms, for each fixed clause-to-variable ratio $m/n$ separately.

A.3.1 On approximation ratio

Here we provide a brief summary of known facts about the approximation ratio of problems related to 1-k-SAT+, as not many results are known for 1-k-SAT+ itself. Note that these results do not carry over to 1-k-SAT+, as we explain below. One can reduce a 1-k-SAT+ instance to a k-SAT instance. A simple polynomial-time algorithm provides a $(1 - 1/2^k)$ approximation ratio for Max-k-SAT (in this paper we always mean exactly $k$ variables in each clause); this also equals the expected approximation ratio of a random assignment. Ref. [322] shows a lower bound of $r \geq 0.95$ on the approximation ratio achievable by a polynomial-time algorithm for Max-3-SAT. Ref. [55] also shows that it is NP-hard to approximate Max-2-SAT with any approximation ratio above $21/22 \simeq 0.955$. However, it is important to note that these are worst-case results and do not directly apply to 1-k-SAT+ due to the reduction.
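As a concrete reference point, GWMIN can be sketched in a few lines of plain Python. The adjacency-dict representation and variable names below are ours, not from Ref. [75]; the tiny path graph illustrates why greedy output only matches the lower bound of Eq. (A.7a), not the optimum:

```python
def gwmin(adj, w):
    """Greedy MWIS in the spirit of GWMIN [75]: repeatedly pick the vertex
    maximizing w(u)/(deg(u)+1), add it to S, and delete its closed
    neighborhood N+(u) from the remaining graph."""
    adj = {u: set(nb) for u, nb in adj.items()}  # local mutable copy
    S = set()
    while adj:
        q = max(adj, key=lambda u: w[u] / (len(adj[u]) + 1))
        S.add(q)
        dead = adj[q] | {q}        # closed neighborhood N+(q)
        for u in dead:
            adj.pop(u, None)
        for u in adj:
            adj[u] -= dead
    return S

# Path a-b-c: greedy takes b (weight 5), while the optimum {a, c} has
# weight 6 -- consistent with GWMIN only guaranteeing a lower bound.
adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}
w = {'a': 3, 'b': 5, 'c': 3}
assert gwmin(adj, w) == {'b'}
```

The other three variants differ only in the selection rule inside the loop (and, for GWMAX, in deleting single vertices until no edges remain).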
There are also results on approximate algorithms guaranteeing certain approximation ratios for random instances of Max-k-SAT. However, as the instances generated by reducing random instances of 1-k-SAT+ to k-SAT are by no means random, these results also do not apply to random instances of Max-1-k-SAT+.

A.4 Initialization strategy of QAOA

For a $p$-layer QAOA, the pre-optimization strategy initializes the first $p'$ layers ($1 \leq p' < p$) from an optimized $p'$-layer QAOA,* and samples the remaining parameters $\{\gamma_k\}_{k=p'+1}^{p}$, $\{\beta_k\}_{k=p'+1}^{p}$ uniformly at random in $[0, \epsilon]$. Starting from this initialization, further training gives the optimal parameters. Here we choose $\epsilon = 0.1$ to take advantage of the $p'$-layer QAOA results without being trapped in local minima.

*In practice, the number of layers $p'$ is chosen to be comparable to $p$ to obtain better performance.

Appendix B

Supplemental Material for Chapter 3

In this chapter, we omit the hat on operators whenever no confusion arises.

B.1 Derivation of Eq. (3.8) of Chapter 3

In this section, we provide details on the derivation of Eq. (3.8) in Chapter 3. The time difference of the QNTK is

\[ \delta K \equiv K(t+1) - K(t) = \sum_\ell \delta\!\left[\frac{\partial\epsilon}{\partial\theta_\ell}\frac{\partial\epsilon}{\partial\theta_\ell}\right] \quad (B.1) \]
\[ = \sum_\ell \left[\frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\frac{\partial\epsilon}{\partial\theta_\ell}(t+1) - \frac{\partial\epsilon}{\partial\theta_\ell}(t)\frac{\partial\epsilon}{\partial\theta_\ell}(t)\right] \quad (B.2) \]
\[ = \sum_\ell \left[\frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\frac{\partial\epsilon}{\partial\theta_\ell}(t+1) - \frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\frac{\partial\epsilon}{\partial\theta_\ell}(t) + \frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\frac{\partial\epsilon}{\partial\theta_\ell}(t) - \frac{\partial\epsilon}{\partial\theta_\ell}(t)\frac{\partial\epsilon}{\partial\theta_\ell}(t)\right] \quad (B.3) \]
\[ = \sum_\ell \left[\frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\,\delta\frac{\partial\epsilon}{\partial\theta_\ell}(t) + \delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\frac{\partial\epsilon}{\partial\theta_\ell}(t)\right] \quad (B.4) \]
\[ = \sum_\ell \left[2\,\delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\frac{\partial\epsilon}{\partial\theta_\ell}(t) + \frac{\partial\epsilon}{\partial\theta_\ell}(t+1)\,\delta\frac{\partial\epsilon}{\partial\theta_\ell}(t) - \delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\frac{\partial\epsilon}{\partial\theta_\ell}(t)\right] \quad (B.5) \]
\[ = \sum_\ell \left[2\,\delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\frac{\partial\epsilon}{\partial\theta_\ell}(t) + \delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\right]. \quad (B.6) \]

The second term in the last line contains two factors of $\delta$, so it is of higher order in $\eta$, and we focus on the first term. We use the leading-order Taylor expansion of $\delta\partial\epsilon/\partial\theta_\ell$,

\[ \delta\frac{\partial\epsilon}{\partial\theta_\ell}(t) = \sum_{\ell_1}\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_\ell}\,\delta\theta_{\ell_1} + O(\eta^2) = -\eta\sum_{\ell_1}\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_\ell}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\,\epsilon + O(\eta^2). \quad (B.7) \]

So we have

\[ \sum_\ell \delta\frac{\partial\epsilon}{\partial\theta_\ell}(t)\,\frac{\partial\epsilon}{\partial\theta_\ell}(t) = -\eta\sum_{\ell,\ell_1}\frac{\partial^2\epsilon}{\partial\theta_\ell\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_\ell}\,\epsilon + O(\eta^2) = -\eta\mu\epsilon + O(\eta^2), \quad (B.8) \]

which leads to the gradient-descent dynamical equation of $K$,

\[ \delta K = -2\eta\epsilon\mu + O(\eta^2), \quad (B.9) \]

recovering Eq. (3.8) of Chapter 3.

B.2 Details of autocorrelators

In this section, we use a mean-field approach to provide insight into the scaling of the autocorrelators. For any time-dependent quantity $F(t)$ that has ensemble fluctuations, we define the late-time autocorrelator as

\[ A_F(\tau) \equiv \mathbb{E}\left[(F(t) - F(\infty))(F(t+\tau) - F(\infty))\right], \quad (B.10) \]

where the average is over the ensemble of trajectories and we consider the region $t \gg 1$. Here $F(\infty) = \lim_{T\to\infty}\frac{1}{T}\int_T^{2T} dt\, F(t)$ is the smoothed late-time value of the function. For $\epsilon(t)$, the definition in Eq. (B.10) reduces to $A_\epsilon(\tau) \equiv \mathbb{E}[\epsilon(t)\epsilon(t+\tau)]$, which is the one adopted in Chapter 3.

When $O_0 > O_{\min}$ with $C > 0$, utilizing the solution Eq. (3.15) of Chapter 3, the mean-field approximation of the autocorrelator in Eq. (B.10) becomes

\[ A_\epsilon(\tau) = \int d\Gamma(\lambda)\,d\Gamma(B_1)\,\frac{C^2/\lambda^2}{(B_1 e^{\eta C t} - 2)(B_1 e^{\eta C (t+\tau)} - 2)} \quad (B.11) \]
\[ \simeq \int d\Gamma(\lambda)\,d\Gamma(B_1)\,\frac{C^2/\lambda^2}{B_1^2 e^{2\eta C t} e^{\eta C \tau}} \quad (B.12) \]
\[ \sim \frac{C^2/\lambda^2}{B_1^2}\,e^{-2\eta C t}e^{-\eta C \tau} \sim e^{-\eta C \tau}, \quad (B.13) \]

where $\Gamma(\lambda)$, $\Gamma(B_1)$ are the distributions of the conserved quantity and of the fitting parameter over different initializations. Similarly, for $O_0 < O_{\min}$ with $C < 0$, we have

\[ A_\epsilon(\tau) = \int d\Gamma(\lambda)\,d\Gamma(B_1)\left(\frac{C/\lambda}{B_1 e^{\eta C t} - 2} - R\right)\left(\frac{C/\lambda}{B_1 e^{\eta C (t+\tau)} - 2} - R\right) \quad (B.14) \]
\[ = \int d\Gamma(\lambda)\,d\Gamma(B_1)\left(\frac{-2R}{B_1 e^{\eta C t} - 2} - R\right)\left(\frac{-2R}{B_1 e^{\eta C (t+\tau)} - 2} - R\right) \quad (B.15) \]
\[ = R^2 \int d\Gamma(\lambda)\,d\Gamma(B_1)\,\frac{1}{1 - 2B_1^{-1}e^{-\eta C t}}\,\frac{1}{1 - 2B_1^{-1}e^{-\eta C (t+\tau)}} \quad (B.16) \]
\[ \sim \frac{R^2 B_1^2}{4}\,e^{2\eta C t}e^{\eta C \tau} \sim e^{\eta C \tau}. \quad (B.17) \]

We numerically show the decay of the autocorrelators with different $O_0$ in Fig. B.1(a), (d). In both cases, we see the exponential decay of the autocorrelators, and the correlation length defined by $A_F(\tau) \sim \exp(-\tau/\xi)$ is

\[ \xi \sim \frac{1}{\eta|C|} \sim |O_0 - O_{\min}|^{-\nu_2}. \quad (B.18) \]

For $O_0 < O_{\min}$, we directly have $|C| \propto |O_0 - O_{\min}|$, which leads to $\nu_2 = 1$. This is verified in Fig. B.1(c), with a fitted exponent $\nu_2 = 1.006$.
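Stepping back to the gradient-descent relation $\delta K = -2\eta\epsilon\mu + O(\eta^2)$ of Eq. (B.9): it is easy to check on a toy single-parameter model where $\epsilon$, $K = (\partial_\theta\epsilon)^2$, and $\mu = \partial^2_\theta\epsilon\,(\partial_\theta\epsilon)^2$ are available in closed form. The cost function below is our illustrative choice, not the QNN of Chapter 3:

```python
import math

eta = 1e-3
theta = 0.7
# toy residual error: eps(theta) = sin(theta) - 0.2
eps = lambda t: math.sin(t) - 0.2
d1  = lambda t: math.cos(t)    # d eps / d theta
d2  = lambda t: -math.sin(t)   # d^2 eps / d theta^2

K  = d1(theta) ** 2                 # QNTK for a single parameter
mu = d2(theta) * d1(theta) ** 2     # dQNTK for a single parameter

# one gradient-descent step on the loss eps^2 / 2
theta_new = theta - eta * eps(theta) * d1(theta)
dK = d1(theta_new) ** 2 - K

# Eq. (B.9): delta K = -2 * eta * eps * mu, up to O(eta^2)
assert abs(dK - (-2 * eta * eps(theta) * mu)) < 10 * eta ** 2
```

Shrinking `eta` makes the residual shrink quadratically, consistent with the $O(\eta^2)$ correction in Eq. (B.9).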
For $O_0 > O_{\min}$, we have $|C| = |K| = G_M$, with $G_M$ being the spectral gap of the Hessian at late time, which indicates that $\nu_2 = \nu_1$. This is again verified in Fig. B.1(f), with a fitted exponent $\nu_2 = 1.038$.

Figure B.1: Decay of autocorrelators and the corresponding correlation length for $O_0$ away from the critical point, with $O_0 < O_{\min}$ (top) and $O_0 > O_{\min}$ (bottom). The first two columns plot the autocorrelators. In (c), (f), the overlapping red and green dots represent the correlation lengths $\xi$, $\xi_K$ fitted from Eq. (B.18), and dashed lines of the same color show the fitting results. Black dashed lines represent the scaling $1/|O_0 - O_{\min}|$. The observable is the Hamiltonian of the XXZ model with $J = 2$, and the circuit ansatz is an $n = 2$ qubit RPA with $L = 64$ layers.

From the duality between $\epsilon$ and $K$, the autocorrelator of $K$ also decays exponentially when the system is not at the critical point, as shown in Fig. B.1(b), (e). The corresponding correlation-length exponents are $\nu_2 = 1.006,\ 1.034$ for $O_0 \lessgtr O_{\min}$. In general, we have

\[ \nu_2 = 1 \quad (B.19) \]

within our numerical precision.

Figure B.2: Decay of the autocorrelators of $\epsilon(t)$, $K(t)$, $\mu(t)$ at the critical point $O_0 = -6$ (solid curves). Dashed lines of the corresponding color show the fitting results with $\Delta[\epsilon] = 0.494$, $\Delta[K] = 0.499$ and $\Delta[\mu] = 0.506$. The black dashed line represents the scaling $1/\tau$.

On the other hand, at the critical point $O_0 = O_{\min}$ we have

\[ A_\epsilon(\tau) = \int d\Gamma(\lambda)\,d\Gamma(B_2)\,\frac{1/\lambda^2}{\left(B_2^{-1} + 2\eta t\right)\left(B_2^{-1} + 2\eta(t+\tau)\right)} \quad (B.20) \]
\[ \simeq \int d\Gamma(\lambda)\,d\Gamma(B_2)\,\frac{1/\lambda^2}{4\eta^2 t(t+\tau)} \quad (B.21) \]
\[ \sim \frac{1/\lambda^2}{4\eta^2 t(t+\tau)} \sim \frac{1}{\tau}, \quad (B.22) \]

which decays polynomially with $\tau$ (see Fig. B.2). Note that the scaling $A_\epsilon(\tau) \sim 1/\tau$ holds only if $\tau \gg t$; otherwise it is nearly a constant.
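The $1/\tau$ tail of Eq. (B.22) can be checked directly on the mean-field solution $\epsilon(t) \propto 1/(B_2^{-1} + 2\eta t)$: for $\tau \gg t$, doubling $\tau$ halves the product $\epsilon(t)\epsilon(t+\tau)$. The parameter values below are arbitrary illustrative choices:

```python
eta, b_inv, t = 1e-3, 0.5, 10.0  # arbitrary mean-field parameters (B2^{-1} = 0.5)

def eps_mf(s):
    """Mean-field residual error at the critical point: eps ~ 1/(B2^-1 + 2*eta*s)."""
    return 1.0 / (b_inv + 2 * eta * s)

def A(tau):
    """Mean-field autocorrelator A_eps(tau) = eps(t) * eps(t + tau)."""
    return eps_mf(t) * eps_mf(t + tau)

# doubling tau should halve A(tau) once tau >> t  (1/tau scaling)
tau = 1e6
ratio = A(tau) / A(2 * tau)
assert abs(ratio - 2.0) < 0.01
```

For $\tau \lesssim t$ the same code gives a ratio near 1, reflecting the near-constant regime noted above.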
As $\lambda = \mu/K$ approaches a constant, we have $\mu \sim K \sim 1/t$, and thus $A_\mu(\tau) \sim 1/\tau$. From the definition of the scaling dimension, $A_F(\tau) = 1/\tau^{2\Delta[F]}$, one finds

\[ \Delta[\epsilon] = \Delta[K] = \Delta[\mu] = 1/2, \quad (B.23) \]

which is verified in Fig. B.2, while $\Delta[\lambda] = 0$.

B.3 Observable trace properties

In this work, we focus on traceless observables, a typical example being a spin Hamiltonian. In general, an $n$-qubit observable can always be written as a linear combination of nontrivial Paulis, $O = \sum_{i=1}^N c_i P_i$ with $P_i \in \{I, \sigma^x, \sigma^y, \sigma^z\}^{\otimes n} \setminus \{I^{\otimes n}\}$, where $1 \leq N \leq 4^n - 1$ is the number of unique Paulis in the observable. We discuss the scaling of the trace of its powers, up to the fourth, with respect to the Hilbert-space dimension $d$ and the number of terms $N$. To begin with,

\[ \operatorname{tr}(O) = 0, \quad (B.24) \]
\[ \operatorname{tr}\!\left(O^2\right) = \sum_{i_1,i_2=1}^{N} c_{i_1}c_{i_2}\operatorname{tr}(P_{i_1}P_{i_2}) = \sum_i c_i^2\operatorname{tr}\!\left(P_i^2\right) + \sum_{i_1\neq i_2} c_{i_1}c_{i_2}\operatorname{tr}(P_{i_1}P_{i_2}) \sim N d. \quad (B.25) \]

For higher orders, we focus on some typical classes of observables to provide insight into the scaling.

B.3.1 One-body observable

For the simplest case, a linear combination of 1-local Paulis $O = \sum_i c_i P_i$, where each $P_i$ is supported nontrivially on only one qubit and $c_i \in \mathbb{R}$, the traces of the third and fourth powers are

\[ \operatorname{tr}\!\left(O^3\right)_{1\text{-local}} = \sum_{i_1,i_2,i_3} c_{i_1}c_{i_2}c_{i_3}\operatorname{tr}(P_{i_1}P_{i_2}P_{i_3}) = 0, \quad (B.26) \]
\[ \operatorname{tr}\!\left(O^4\right)_{1\text{-local}} = \sum_{i_1,i_2,i_3,i_4} c_{i_1}c_{i_2}c_{i_3}c_{i_4}\operatorname{tr}(P_{i_1}P_{i_2}P_{i_3}P_{i_4}) \quad (B.27) \]
\[ = \sum_{i_1,i_2} c_{i_1}^2 c_{i_2}^2\operatorname{tr}\!\left(P_{i_1}^2 P_{i_2}^2\right) + 2\sum_{i_1,\,i_2\neq i_3} c_{i_1}^2 c_{i_2}c_{i_3}\operatorname{tr}\!\left(P_{i_1}^2 P_{i_2}P_{i_3}\right) + \sum_{\substack{i_1\neq i_2\\ i_3\neq i_4}} c_{i_1}c_{i_2}c_{i_3}c_{i_4}\operatorname{tr}(P_{i_1}P_{i_2}P_{i_3}P_{i_4}) \quad (B.28) \]
\[ = \sum_{i_1,i_2} c_{i_1}^2 c_{i_2}^2\operatorname{tr}(I) + 0 + 2\sum_{i_1\neq i_2} c_{i_1}^2 c_{i_2}^2\operatorname{tr}\!\left(P_{i_1}^2 P_{i_2}^2\right) \quad (B.29) \]
\[ \sim N^2 d + 2N(N-1)d \sim 3N^2 d, \quad (B.30) \]

where the contribution from Paulis supported nontrivially on the same qubit is overestimated. In the special case where $O$ incorporates all possible 1-local Paulis with equal weights, the rigorous result is $\operatorname{tr}(O^4) = (3N^2 - 6N)d \sim 3N^2 d$, a sub-leading correction to the estimate in Eq. (B.30).
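The 1-local counting above, including the exact equal-weight result $\operatorname{tr}(O^4) = (3N^2 - 6N)d$, is easy to verify by brute force at small $n$ (NumPy sketch; $n = 3$ is an arbitrary small example):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
paulis = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]])]

def one_local(n, qubit, p):
    """Pauli p acting on `qubit` of an n-qubit register."""
    ops = [I2] * n
    ops[qubit] = p
    return reduce(np.kron, ops)

n = 3
d = 2 ** n
terms = [one_local(n, q, p) for q in range(n) for p in paulis]
N = len(terms)                 # N = 3n unique 1-local Paulis
O = sum(terms)                 # equal weights c_i = 1

assert abs(np.trace(O)) < 1e-10                      # Eq. (B.24)
assert abs(np.trace(O @ O) - N * d) < 1e-10          # Eq. (B.25)
assert abs(np.trace(O @ O @ O)) < 1e-10              # Eq. (B.26)
tr4 = np.trace(O @ O @ O @ O)
assert abs(tr4 - (3 * N**2 - 6 * N) * d) < 1e-8      # exact equal-weight result
```

For $n = 3$ this gives $N = 9$, $d = 8$, so $\operatorname{tr}(O^2) = 72$ and $\operatorname{tr}(O^4) = 1512$.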
B.3.2 Two-body observable: XXZ model

In the 2-local Pauli case, we consider Hamiltonians consisting of Paulis supported nontrivially on at most two qubits, and specifically on nearest neighbors. Here we take the Heisenberg model as an example,

\[ O_{\rm HM} = -\sum_{i=1}^{n-1}\left(J_x\sigma^x_i\sigma^x_{i+1} + J_y\sigma^y_i\sigma^y_{i+1} + J_z\sigma^z_i\sigma^z_{i+1}\right) - h\sum_{i=1}^{n}\sigma^z_i. \quad (B.31) \]

Specifically, when $J_x = J_y$ and $J_z = h$ but $J_z \neq J_x$, the general Heisenberg model reduces to the XXZ model,

\[ O_{\rm XXZ} = -\sum_{i=1}^{n-1}\left(\sigma^x_i\sigma^x_{i+1} + \sigma^y_i\sigma^y_{i+1} + J\sigma^z_i\sigma^z_{i+1}\right) - J\sum_{i=1}^{n}\sigma^z_i, \quad (B.32) \]

which is the model studied in Chapter 3. The traces of its second to fourth powers can be solved exactly. For the second power,

\[ \operatorname{tr}\!\left(O_{\rm XXZ}^2\right) = \sum_{i,j=1}^{n-1}\left[\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^x_j\sigma^x_{j+1}\right) + \operatorname{tr}\!\left(\sigma^y_i\sigma^y_{i+1}\sigma^y_j\sigma^y_{j+1}\right) + J^2\operatorname{tr}\!\left(\sigma^z_i\sigma^z_{i+1}\sigma^z_j\sigma^z_{j+1}\right)\right] + \sum_{i,j=1}^{n} J^2\operatorname{tr}\!\left(\sigma^z_i\sigma^z_j\right) \quad (B.33) \]
\[ = \sum_{i=1}^{n-1}\left[2\operatorname{tr}(I) + J^2\operatorname{tr}(I)\right] + \sum_{i=1}^{n} J^2\operatorname{tr}(I) \quad (B.34) \]
\[ = \left[(J^2+2)(n-1) + J^2 n\right]d \quad (B.35) \]
\[ \simeq 2(J^2+1)\,n\,d, \quad (B.36) \]

where in the second line we keep only the nonzero terms and omit the vanishing contributions. One more step gives the trace of the third power,

\[ \operatorname{tr}\!\left(O_{\rm XXZ}^3\right) = -6J\sum_{i,j,k=1}^{n-1}\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^y_j\sigma^y_{j+1}\sigma^z_k\sigma^z_{k+1}\right) - 3J^3\sum_{i,j,k=1}^{n-1}\operatorname{tr}\!\left(\sigma^z_i\sigma^z_{i+1}\sigma^z_j\sigma^z_k\right) \quad (B.37) \]
\[ = 6(n-1)Jd - 6J^3(n-1)d \quad (B.38) \]
\[ = 6J(1-J^2)(n-1)d \quad (B.39) \]
\[ \simeq 6J(1-J^2)\,n\,d, \quad (B.40) \]

where again the first equation is an effective expression keeping all nonzero contributions. When $J \lessgtr 1$, we have $\operatorname{tr}(O_{\rm XXZ}^3) \sim \mp Nd$, and at the critical point $J = 1$ we have $\operatorname{tr}(O_{\rm XXZ}^3) = 0$.
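The closed forms (B.35) and (B.39) can be verified numerically at small system size (NumPy; $J = 2$ matches the value used in the figures, while $n = 3$ is an arbitrary small example):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]]); Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

def op(n, sites):
    """Tensor product placing the given single-qubit operators on `sites`."""
    ops = [I2] * n
    for q, p in sites:
        ops[q] = p
    return reduce(np.kron, ops)

def xxz(n, J):
    """O_XXZ of Eq. (B.32)."""
    O = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n - 1):
        O -= op(n, [(i, X), (i + 1, X)]) + op(n, [(i, Y), (i + 1, Y)]) \
             + J * op(n, [(i, Z), (i + 1, Z)])
    for i in range(n):
        O -= J * op(n, [(i, Z)])
    return O

n, J = 3, 2
d = 2 ** n
O = xxz(n, J)
assert abs(np.trace(O @ O) - ((J**2 + 2) * (n - 1) + J**2 * n) * d) < 1e-8   # (B.35)
assert abs(np.trace(O @ O @ O) - 6 * J * (1 - J**2) * (n - 1) * d) < 1e-6    # (B.39)
```

For $n = 3$, $J = 2$ this gives $\operatorname{tr}(O^2) = 192$ and $\operatorname{tr}(O^3) = -576$; setting $J = 1$ makes the cubic trace vanish, as stated above.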
The trace of the fourth power is

\[ \operatorname{tr}\!\left(O_{\rm XXZ}^4\right) = \sum_{i,j,k,l=1}^{n-1}\Big[2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^x_j\sigma^x_{j+1}\sigma^x_k\sigma^x_{k+1}\sigma^x_l\sigma^x_{l+1}\right) + 4\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^x_j\sigma^x_{j+1}\sigma^y_k\sigma^y_{k+1}\sigma^y_l\sigma^y_{l+1}\right) + 8J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^x_j\sigma^x_{j+1}\sigma^z_k\sigma^z_{k+1}\sigma^z_l\sigma^z_{l+1}\right) + 2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^y_j\sigma^y_{j+1}\sigma^x_k\sigma^x_{k+1}\sigma^y_l\sigma^y_{l+1}\right) + 4J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^z_j\sigma^z_{j+1}\sigma^x_k\sigma^x_{k+1}\sigma^z_l\sigma^z_{l+1}\right) + J^4\operatorname{tr}\!\left(\sigma^z_i\sigma^z_{i+1}\sigma^z_j\sigma^z_{j+1}\sigma^z_k\sigma^z_{k+1}\sigma^z_l\sigma^z_{l+1}\right)\Big] + \sum_{i,j=1}^{n-1}\sum_{k,l=1}^{n}\Big[8J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^x_j\sigma^x_{j+1}\sigma^z_k\sigma^z_l\right) + 8J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^y_j\sigma^y_{j+1}\sigma^z_k\sigma^z_l\right) + 6J^4\operatorname{tr}\!\left(\sigma^z_i\sigma^z_{i+1}\sigma^z_j\sigma^z_{j+1}\sigma^z_k\sigma^z_l\right)\Big] + \sum_{i,k=1}^{n-1}\sum_{j,l=1}^{n}\Big[4J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^z_j\sigma^x_k\sigma^x_{k+1}\sigma^z_l\right) + 4J^2\operatorname{tr}\!\left(\sigma^x_i\sigma^x_{i+1}\sigma^z_j\sigma^y_k\sigma^y_{k+1}\sigma^z_l\right)\Big] + \sum_{i,j,k,l=1}^{n} J^4\operatorname{tr}\!\left(\sigma^z_i\sigma^z_j\sigma^z_k\sigma^z_l\right) \quad (B.41) \]
\[ = 2(n-1)(3n-5)d + 4(n-1)^2 d + 8J^2(n-1)^2 d + 2(n-3)^2 d + 4J^2(n-3)^2 d + J^4(n-1)(3n-5)d + 8J^2 n(n-1)d + 8J^2\left(-2(n-1)d\right) + 6J^4\left(n^2+3n-8\right)d + 4J^2(n-1)(n-4)d + 4J^2\left(2(n-1)d\right) + J^4 n(3n-2)d \quad (B.42) \]
\[ = \left[12(J^2+1)^2 n^2 + 4\left(2J^4 - 19J^2 - 9\right)n - \left(43J^4 + 68J^2 + 32\right)\right]d \quad (B.43) \]
\[ \simeq 12(J^2+1)^2 n^2 d \sim N^2 d, \quad (B.44) \]

where in the first equation we only show the unique nonzero contributions, and the coefficient in front of each term counts its repetitions. We leave observables with higher-body interactions for future work, as they do not change the main conclusions and scalings of this work.

B.4 Method in ensemble average calculation

To assist the following discussion, we express the first-order gradient of the residual error through commutators as

\[ \frac{\partial\epsilon}{\partial\theta_\ell} = \partial_{\theta_\ell}\langle\psi_0|U^\dagger(\boldsymbol{\theta})\,O\,U(\boldsymbol{\theta})|\psi_0\rangle = \frac{i}{2}\langle\psi_0|U^\dagger_{\ell^-}\left[X_\ell,\,U^\dagger_{\ell^+}OU_{\ell^+}\right]U_{\ell^-}|\psi_0\rangle = \frac{i}{2}\langle\psi_0|U^\dagger_{\ell^-}\left[X_\ell,\,O_{\ell^+}\right]U_{\ell^-}|\psi_0\rangle, \quad (B.45) \]

where $|\psi_0\rangle$ is the initial pure state of the system. Here we define the unitaries

\[ U_{\ell^-} = \prod_{k=1}^{\ell-1} W_k V_k(\theta_k), \qquad U_{\ell^+} = \prod_{k=\ell}^{L} W_k V_k(\theta_k), \quad (B.46) \]

and $O_{\ell^+} = U^\dagger_{\ell^+} O U_{\ell^+}$ for simplicity.
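Eq. (B.45) can be sanity-checked against a finite difference on a minimal one-qubit, one-layer example ($L = 1$, so $U_{\ell^-} = I$ and $O_{\ell^+} = V^\dagger(\theta)W^\dagger O W V(\theta)$). The $i/2$ prefactor presumes the generator convention $V(\theta) = e^{-i\theta X/2}$; the fixed gate $W$ and the observable $O = \sigma^z$ are our arbitrary choices:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

W = np.cos(0.3) * I2 - 1j * np.sin(0.3) * Y   # fixed gate W = exp(-i 0.3 Y)
O = Z                                          # observable
psi0 = np.array([1, 0], dtype=complex)

def V(th):
    """Rotation V(theta) = exp(-i theta X / 2)."""
    return np.cos(th / 2) * I2 - 1j * np.sin(th / 2) * X

def eps(th):
    psi = W @ V(th) @ psi0
    return (psi.conj() @ O @ psi).real

th = 0.9
O_plus = V(th).conj().T @ W.conj().T @ O @ W @ V(th)   # O_{l+} = U_{l+}^dag O U_{l+}
comm = X @ O_plus - O_plus @ X
grad_analytic = (1j / 2 * (psi0.conj() @ comm @ psi0)).real   # Eq. (B.45) with U_{l-} = I

h = 1e-6
grad_fd = (eps(th + h) - eps(th - h)) / (2 * h)
assert abs(grad_analytic - grad_fd) < 1e-6
```

Since $X$ commutes with $V(\theta)$, the commutator form is equivalent to $\tfrac{i}{2}\langle\psi_0|V^\dagger[X, W^\dagger O W]V|\psi_0\rangle$, which is what the finite difference probes.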
The second-order gradients, assuming $\ell_1 < \ell_2$ or $\ell_1 = \ell_2 = \ell$, can be written in a similar way as

\[ \frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}} = -\frac{1}{4}\langle\psi_0|U^\dagger_{\ell_1^-}\left[X_{\ell_1},\,U^\dagger_{\ell_1\ell_2}\left[X_{\ell_2},\,U^\dagger_{\ell_2^+}OU_{\ell_2^+}\right]U_{\ell_1\ell_2}\right]U_{\ell_1^-}|\psi_0\rangle = -\frac{1}{4}\langle\psi_0|U^\dagger_{\ell_1^-}\left[X_{\ell_1},\,U^\dagger_{\ell_1\ell_2}\left[X_{\ell_2},\,O_{\ell_2^+}\right]U_{\ell_1\ell_2}\right]U_{\ell_1^-}|\psi_0\rangle, \quad (B.47) \]
\[ \frac{\partial^2\epsilon}{\partial\theta_\ell^2} = -\frac{1}{4}\langle\psi_0|U^\dagger_{\ell^-}\left[X_\ell,\left[X_\ell,\,O_{\ell^+}\right]\right]U_{\ell^-}|\psi_0\rangle, \quad (B.48) \]

where

\[ U_{\ell_1\ell_2} = \prod_{k=\ell_1}^{\ell_2-1} W_k V_k(\theta_k). \quad (B.49) \]

B.5 Frame potential with restricted Haar ensemble

B.5.1 Frame potential applied to QNN

To quantify the randomness of an ensemble of unitaries, we evaluate the $k$th frame potential $F^{(k)}$. For an arbitrary unitary ensemble $\mathcal{E}$,

\[ F^{(k)}_{\mathcal{E}} = \frac{1}{|\mathcal{E}|^2}\sum_{U,U'\in\mathcal{E}}\left|\operatorname{tr}\!\left(U^\dagger U'\right)\right|^{2k} \geq F^{(k)}_{\rm Haar} = k!, \quad (B.50) \]

where the minimum is achieved by the Haar ensemble [61]. To provide insight into the ensemble in the case of $O_0 \leq O_{\min}$, we evaluate the frame potential of the restricted Haar ensemble,

\[ F^{(k)}_{\rm RH} = \sum_{\substack{k_1,k_2=0\\ k_1+2k_2\leq k}}^{k}\frac{k!}{k_1!(k_2!)^2(k-k_1-2k_2)!}\,F^{(k_1+k_2)}_{\rm Haar} \quad (B.51) \]
\[ \geq F^{(k+1)}_{\rm Haar} = (k+1)!. \quad (B.52) \]

We verify Eq. (B.51) and the lower bound $(k+1)!$ in Fig. B.3(a).

To verify that the ensemble distribution of QNN unitaries satisfies the restricted Haar ensemble of Eq. (3.29), ideally we would consider the different unitaries from random initializations that lead to the same converged state, so that the ensemble-averaged values provide insight into a specific training trajectory with a specific converged state. However, this is in general challenging, as random initialization generically leads to convergence to different local optima, except in the case $O_0 \leq O_{\min}$, where the converged state is the ground state, up to a finite degeneracy. In this case, we can directly evaluate the frame potential over late-time unitaries from different initializations.

Figure B.3: (a) Frame potential $F^{(k)}$ for the restricted Haar ensemble with dimension $2^n = 4$. The red dashed line is the exact theory prediction of Eq. (B.59), and the magenta dashed line is its lower bound $(k+1)!$. (b) Evolution of the 2nd-order frame potential $F^{(2)}$ for the ensemble of RPA circuit unitaries with different targets $O_0$. The observable is $O_{\rm XXZ}$ with $J = 2$, for which $O_{\min} = -6$. The black dashed line is the exact value of $F^{(k)}$ in Eq. (B.51).

We numerically evaluate the dynamics of the 2nd-order frame potential $F^{(2)}$ in Fig. B.3(b) for different $O_0$. For $k = 2$, the frame potential over the restricted Haar unitary ensemble is $F^{(2)} = 7$ according to Eq. (B.51), while $F^{(2)} = 2$ for the Haar random ensemble. Indeed, we see that for $O_0 \leq O_{\min}$, the ensemble frame potential approaches the restricted-Haar prediction. When $O_0 > O_{\min}$, $F^{(2)}$ stays far from it: the converged state is not unique for $O_0 > O_{\min}$, and different random initializations fail to provide an ensemble of unitaries with a fixed converged state. When one considers different random initializations, each training trajectory converges to a different state, so the overall unitary ensemble does not capture the restriction and in fact approaches Haar random; within each single trajectory, however, the convergence still places a restriction on the typical unitary that maps the initial state to the final state.

B.5.2 Details of formula

For simplicity, we assume $V$ is a Haar random unitary in Eq. (B.223). The $k$th frame potential of the unitary ensemble is thus

\[ F^{(k)}_{\rm RH} = \frac{1}{|\mathcal{E}|^2}\sum_{U,U'\in\mathcal{E}}\left|\operatorname{tr}\!\left(U^\dagger U'\right)\right|^{2k} \quad (B.53) \]
\[ = \frac{1}{|\mathcal{E}|^2}\sum_{U,U'\in\mathcal{E}}\left|1 + \operatorname{tr}\!\left(V^\dagger V'\right)\right|^{2k} \quad (B.54) \]
\[ = \int_{\rm Haar} dV\,dV'\left[1 + \operatorname{tr}\!\left(V^\dagger V'\right) + \operatorname{tr}\!\left(V^\dagger V'\right)^* + \left|\operatorname{tr}\!\left(V^\dagger V'\right)\right|^2\right]^k. \quad (B.55) \]

For simplicity, we denote $\operatorname{tr}(V^\dagger V') \equiv z$ and then have

\[ F^{(k)}_{\rm RH} = \int_{\rm Haar} dV\,dV'\left(1 + z + z^* + |z|^2\right)^k \quad (B.56) \]
\[ = \sum_{\substack{k_1,k_2,k_3=0\\ k_1+k_2+k_3\leq k}}^{k}\binom{k}{k_1,k_2,k_3}\int_{\rm Haar} dV\,dV'\,|z|^{2k_1}z^{*k_2}z^{k_3} \quad (B.57) \]
\[ = \sum_{\substack{k_1,k_2=0\\ k_1+2k_2\leq k}}^{k}\frac{k!}{k_1!(k_2!)^2(k-k_1-2k_2)!}\int_{\rm Haar} dV\,dV'\,|z|^{2k_1}|z|^{2k_2} \quad (B.58) \]
\[ = \sum_{\substack{k_1,k_2=0\\ k_1+2k_2\leq k}}^{k}\frac{k!}{k_1!(k_2!)^2(k-k_1-2k_2)!}\,F^{(k_1+k_2)}_{\rm Haar} \quad (B.59) \]
\[ \geq F^{(k)}_{\rm Haar} + k^2 F^{(k-1)}_{\rm Haar} = (k+1)F^{(k)}_{\rm Haar} = F^{(k+1)}_{\rm Haar}. \quad (B.60) \]

B.6 Results with Haar random ensemble

In this section, we present results evaluated on the Haar random unitary ensemble, which characterize QNN dynamics at early times. The rest of the contents concern ensemble averaging over the Haar (App. B.6) or restricted Haar ensemble (App. B.7), where we have utilized symbolic tools from Ref. [323].

B.6.1 Average QNTK under Haar random ensemble

With random initialization, the circuit forms a Haar random ensemble, and the ensemble average of the QNTK is $\overline{K}_0 = \sum_\ell \mathbb{E}\left[\left(\partial\epsilon/\partial\theta_\ell\right)^2\right]$, where the ensemble average inside the summation is

\[ \mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] = -\frac{1}{4}\int dU_{\ell^-}dU_{\ell^+}\,\langle\psi_0|U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}|\psi_0\rangle\,\langle\psi_0|U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}|\psi_0\rangle \quad (B.61) \]
\[ = -\frac{1}{4}\int dU_{\ell^-}dU_{\ell^+}\operatorname{tr}\!\left(\rho_0 U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}\,\rho_0 U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}\right) \quad (B.62) \]
\[ = -\int dU_{\ell^+}\frac{\operatorname{tr}\!\left([X_\ell,O_{\ell^+}]^2\right)}{4(d^2+d)}, \quad (B.63) \]

where $\rho_0 = |\psi_0\rangle\langle\psi_0|$. Using the trace identity $\operatorname{tr}\!\left([A,B]^2\right) = 2\operatorname{tr}(ABAB) - 2\operatorname{tr}(ABBA)$, we then have

\[ \mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] = -\frac{2}{4(d^2+d)}\int dU_{\ell^+}\left[\operatorname{tr}\!\left(X_\ell O_{\ell^+}X_\ell O_{\ell^+}\right) - \operatorname{tr}\!\left(O^2\right)\right] \quad (B.64) \]
\[ = \frac{d\operatorname{tr}\!\left(O^2\right) - \operatorname{tr}(O)^2}{2(d-1)(d+1)^2}. \quad (B.65) \]

Note that here we assume both $U_{\ell^-}$ and $U_{\ell^+}$ separately form Haar random ensembles (2-designs). Therefore, the ensemble average of the QNTK is

\[ \overline{K}_0 = \sum_\ell \mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] = L\,\frac{d\operatorname{tr}\!\left(O^2\right) - \operatorname{tr}(O)^2}{2(d-1)(d+1)^2} \simeq \frac{L}{2d^3}\left[d\operatorname{tr}\!\left(O^2\right) - \operatorname{tr}(O)^2\right], \quad (B.66) \]

where the last expression holds for $d \gg 1$. Specifically, for the traceless operators considered in Chapter 3, we have

\[ \overline{K}_0 \simeq \frac{L}{2d^2}\operatorname{tr}\!\left(O^2\right). \quad (B.67) \]

B.6.2 Average relative dQNTK under Haar random ensemble

We define the average $\overline{\lambda}$ as

\[ \overline{\lambda} = \overline{\mu}/\overline{K}, \quad (B.68) \]

where the dQNTK $\mu$ is defined as

\[ \mu = \sum_{\ell_1,\ell_2}\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}. \quad (B.69) \]

In the following, we calculate the Haar ensemble average of the dQNTK,

\[ \overline{\mu}_0 = 2\sum_{\ell_1<\ell_2}\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] + \sum_\ell\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] \quad (B.70) \]
\[ = L(L-1)\,\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] + L\,\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right]. \]
(B.71)

Calculation of $\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right]$ with the Haar random ensemble

We first evaluate

\[ \mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] = \frac{1}{16}\int dU_{\ell^-}dU_{\ell^+}\,\langle\psi_0|U^\dagger_{\ell^-}[X_\ell,[X_\ell,O_{\ell^+}]]U_{\ell^-}|\psi_0\rangle\left(\langle\psi_0|U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}|\psi_0\rangle\right)^2 \quad (B.72) \]
\[ = \frac{1}{16}\int dU_{\ell^-}dU_{\ell^+}\operatorname{tr}\!\left(\rho_0 U^\dagger_{\ell^-}[X_\ell,[X_\ell,O_{\ell^+}]]U_{\ell^-}\,\rho_0 U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}\,\rho_0 U^\dagger_{\ell^-}[X_\ell,O_{\ell^+}]U_{\ell^-}\right) \quad (B.73) \]
\[ = \int dU_{\ell^+}\frac{\operatorname{tr}\!\left([X_\ell,[X_\ell,O_{\ell^+}]]\,[X_\ell,O_{\ell^+}]^2\right)}{8d(2+3d+d^2)} \quad (B.74) \]
\[ = 0, \quad (B.75) \]

where the last equality follows from the cyclic property of the trace.

Calculation of $\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right]$ with the Haar random ensemble

We next consider, assuming $\ell_1 < \ell_2$,

\[ \mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] = \frac{1}{16}\int dU_{\ell_1^-}dU_{\ell_1\ell_2}dU_{\ell_2^+}\,\langle\psi_0|U^\dagger_{\ell_1^-}\left[X_{\ell_1},U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right]U_{\ell_1^-}|\psi_0\rangle\,\langle\psi_0|U^\dagger_{\ell_1^-}[X_{\ell_1},O_{\ell_1^+}]U_{\ell_1^-}|\psi_0\rangle\,\langle\psi_0|U^\dagger_{\ell_1^-}U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}U_{\ell_1^-}|\psi_0\rangle \quad (B.76) \]
\[ = \frac{1}{16}\int dU_{\ell_1^-}dU_{\ell_1\ell_2}dU_{\ell_2^+}\operatorname{tr}\!\left(\rho_0 U^\dagger_{\ell_1^-}\left[X_{\ell_1},U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right]U_{\ell_1^-}\,\rho_0 U^\dagger_{\ell_1^-}[X_{\ell_1},O_{\ell_1^+}]U_{\ell_1^-}\,\rho_0 U^\dagger_{\ell_1^-}U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}U_{\ell_1^-}\right) \quad (B.77)\text{–}(B.78) \]
\[ = \frac{1}{16(d^3+3d^2+2d)}\int dU_{\ell_1\ell_2}dU_{\ell_2^+}\Big[\operatorname{tr}\!\left(\left[X_{\ell_1},U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right][X_{\ell_1},O_{\ell_1^+}]\,U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right) + \operatorname{tr}\!\left([X_{\ell_1},O_{\ell_1^+}]\left[X_{\ell_1},U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right]U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right)\Big], \quad (B.79) \]

where in the last equality we perform the Haar integration (3-design) over $U_{\ell_1^-}$. Next we evaluate the Haar integrals over $U_{\ell_1\ell_2}$ and $U_{\ell_2^+}$ separately. Expanding the commutators, the first term is

\[ \int dU_{\ell_1\ell_2}dU_{\ell_2^+}\Big[\operatorname{tr}\!\left(X_{\ell_1}U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}X_{\ell_1}U^\dagger_{\ell_1\ell_2}O_{\ell_2^+}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right) + \operatorname{tr}\!\left([X_{\ell_2},O_{\ell_2^+}]^2 U_{\ell_1\ell_2}X_{\ell_1}U^\dagger_{\ell_1\ell_2}O_{\ell_2^+}U_{\ell_1\ell_2}X_{\ell_1}U^\dagger_{\ell_1\ell_2}\right) - \operatorname{tr}\!\left(X_{\ell_1}U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]O_{\ell_2^+}U_{\ell_1\ell_2}X_{\ell_1}U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right) - \operatorname{tr}\!\left([X_{\ell_2},O_{\ell_2^+}]^2 O_{\ell_2^+}\right)\Big] \quad (B.80) \]
\[ = \int dU_{\ell_2^+}\frac{d\operatorname{tr}\!\left([X_{\ell_2},O_{\ell_2^+}]^2\right)\operatorname{tr}(O) - d\operatorname{tr}\!\left(O_{\ell_2^+}[X_{\ell_2},O_{\ell_2^+}]^2\right)}{d^2-1} \quad (B.81)\text{–}(B.82) \]
\[ = \frac{d}{d^2-1}\left[\operatorname{tr}(O)\,\frac{2d\left[\operatorname{tr}(O)^2 - d\operatorname{tr}\!\left(O^2\right)\right]}{d^2-1} - \frac{d^2\left[\operatorname{tr}(O)\operatorname{tr}\!\left(O^2\right) - d\operatorname{tr}\!\left(O^3\right)\right]}{d^2-1}\right] \quad (B.83) \]
\[ = \frac{d^2\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{(d^2-1)^2}. \quad (B.84) \]

The second term expands analogously,

\[ \int dU_{\ell_1\ell_2}dU_{\ell_2^+}\operatorname{tr}\!\left([X_{\ell_1},O_{\ell_1^+}]\left[X_{\ell_1},U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right]U^\dagger_{\ell_1\ell_2}[X_{\ell_2},O_{\ell_2^+}]U_{\ell_1\ell_2}\right), \quad (B.85) \]

and after the same Haar integrations it reduces to the same intermediate form,

\[ = \int dU_{\ell_2^+}\frac{d\operatorname{tr}\!\left([X_{\ell_2},O_{\ell_2^+}]^2\right)\operatorname{tr}(O) - d\operatorname{tr}\!\left(O_{\ell_2^+}[X_{\ell_2},O_{\ell_2^+}]^2\right)}{d^2-1} \quad (B.86)\text{–}(B.87) \]
\[ = \frac{d^2\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{(d^2-1)^2}. \quad (B.88) \]

Combining Eqs. (B.84) and (B.88) with the prefactor of Eq. (B.79), we then have

\[ \mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] = \frac{1}{16(d^3+3d^2+2d)}\cdot\frac{2d^2\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{(d^2-1)^2} \quad (B.89)\text{–}(B.90) \]
\[ = \frac{d\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{8(d-1)^2(d+1)^3(d+2)}. \quad (B.91) \]

Summary of the average relative dQNTK $\overline{\lambda}_0$ under the Haar random ensemble

Summarizing Eqs. (B.75) and (B.91), the mean of the dQNTK $\overline{\mu}_0$ is

\[ \overline{\mu}_0 = L(L-1)\,\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] + L\,\mathbb{E}\left[\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] \quad (B.92) \]
\[ = L(L-1)\,\frac{d\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{8(d-1)^2(d+1)^3(d+2)} \quad (B.93) \]
\[ \simeq \frac{L^2}{8d^5}\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right], \quad (B.94) \]

where the last line is the wide-QNN limit $d \gg 1$. Since $\operatorname{tr}(O^3)$ can be nonzero depending on the specific choice of $O$, our result characterizes more of $\overline{\mu}_0$'s scaling than the result $\overline{\mu}_0 = 0$ of Ref. [25], where $U_{\ell_1^-}, U_{\ell_1^+}, U_{\ell_2^-}, U_{\ell_2^+}$ are treated as independent Haar random unitaries. From the definition $\overline{\lambda}_0 = \overline{\mu}_0/\overline{K}_0$, we have

\[ \overline{\lambda}_0 = (L-1)\,\frac{d\left[d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3\right]}{4(d-1)(d+1)(d+2)\left[d\operatorname{tr}\!\left(O^2\right) - \operatorname{tr}(O)^2\right]} \quad (B.95) \]
\[ \simeq \frac{L}{4d^2}\,\frac{d^2\operatorname{tr}\!\left(O^3\right) - 3d\operatorname{tr}\!\left(O^2\right)\operatorname{tr}(O) + 2\operatorname{tr}(O)^3}{d\operatorname{tr}\!\left(O^2\right) - \operatorname{tr}(O)^2}. \quad (B.96) \]

Specifically, for a traceless observable it further reduces to

\[ \overline{\lambda}_0 \simeq \frac{L}{4d}\,\frac{\operatorname{tr}\!\left(O^3\right)}{\operatorname{tr}\!\left(O^2\right)}. \quad (B.97) \]

B.6.3 Average dynamical index with Haar random ensemble

We define the average $\overline{\zeta}$ as

\[ \overline{\zeta} = \overline{\epsilon\mu}/\overline{K}^2. \quad (B.98) \]

The Haar ensemble average of $\epsilon_0\mu_0$ becomes

\[ \overline{\epsilon_0\mu_0} = \sum_{\ell_1,\ell_2}\mathbb{E}\left[\epsilon_0\,\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] \quad (B.99) \]
\[ = \sum_{\ell_1,\ell_2}\mathbb{E}\left[\langle O\rangle\,\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] - O_0\overline{\mu}_0 \quad (B.100) \]
\[ = 2\sum_{\ell_1<\ell_2}\mathbb{E}\left[\langle O\rangle\,\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] + \sum_\ell\mathbb{E}\left[\langle O\rangle\,\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] - O_0\overline{\mu}_0 \quad (B.101) \]
\[ = L(L-1)\,\mathbb{E}\left[\langle O\rangle\,\frac{\partial^2\epsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right] + L\,\mathbb{E}\left[\langle O\rangle\,\frac{\partial^2\epsilon}{\partial\theta_\ell^2}\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^2\right] - O_0\overline{\mu}_0. \quad (B.102) \]

As $\overline{\mu}_0$ is already obtained above, we only need to evaluate the first two parts.
Calculation of ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 with Haar random ensemble The ensemble average of ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 is E " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = 1 16 Z dUℓ− dUℓ+ ⟨ψ0|U † ℓ− Oℓ+ Uℓ− |ψ0⟩ ⟨ψ0|U † ℓ− [Xℓ, [Xℓ, Oℓ+ ]]Uℓ− |ψ0⟩ ⟨ψ0|U † ℓ− [Xℓ, Oℓ+ ]Uℓ− |ψ0⟩ 2 (B.103) = 1 16 Z dUℓ− dUℓ+ tr ρ0U † ℓ− Oℓ+ Uℓ− ρ0U † ℓ− [Xℓ, [Xℓ, Oℓ+ ]]Uℓ− ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− (B.104) = Z dUℓ+ 1 d(d + 1)(d + 2)(d + 3) tr [Xℓ, Oℓ+ ] 2 tr([Xℓ, [Xℓ, Oℓ+ ]]Oℓ+ ) + 2 tr [Xℓ, Oℓ+ ] 2 [Xℓ, [Xℓ, Oℓ+ ]]Oℓ+ +2 tr([Xℓ, Oℓ+ ][Xℓ, [Xℓ, Oℓ+ ]][Xℓ, Oℓ+ ]Oℓ+ ) + 2 tr [Xℓ, [Xℓ, Oℓ+ ]][Xℓ, Oℓ+ ] 2Oℓ+ , (B.105) where the last equation we do the Haar integral (4-design) over Uℓ− . The integration over Uℓ+ of the first term tr [Xℓ , Oℓ+ ] 2 tr([Xℓ , [Xℓ , Oℓ+ ]]Oℓ+ ) is Z dUℓ+ tr [Xℓ, Oℓ+ ] 2 tr([Xℓ, [Xℓ, Oℓ+ ]]Oℓ+ ) = Z dUℓ+ h 8 tr(XℓOℓ+ XℓOℓ+ ) tr O 2 − 4 tr(XℓOℓ+ XℓOℓ+ ) 2 − 4 tr O 2 2 i = 8 tr O2 h d tr(O) 2 − tr O2 i d 2 − 1 − 4 tr O 2 2 − 4 (d 2 − 1) (d 2 − 4) (d 2 − 9) h d 4 − d 3 − 9d 2 + 4d + 20 tr(O) 4 + 2d −3d 2 + 5d + 7 tr O 2 tr(O) 2 + d 5 + d 4 − 12d 3 − 5d 2 + 12d + 24 tr O 2 2 − 2d d 3 + 2d 2 − 14d + 2 tr O 4 i (B.106) = 4 (d 2 − 9) (d 2 − 4) (d 2 − 1) h − d 4 − d 3 − 9d 2 + 4d + 20 tr(O) 4 + 2d d 4 − 10d 2 − 5d + 29 tr O 2 tr(O) 2 − 8 3d 2 − 5d − 7 tr O 3 tr(O) − d 6 + d 5 − 11d 4 − 12d 3 + 18d 2 + 12d + 60 tr O 2 2 +2d d 3 + 2d 2 − 14d + 2 tr O 4 . 
(B.107) The integration of the second term tr [Xℓ , Oℓ+ ] 2 [Xℓ , [Xℓ , Oℓ+ ]]Oℓ+ becomes Z dUℓ+ tr [Xℓ , Oℓ+ ] 2 [Xℓ , [Xℓ , Oℓ+ ]]Oℓ+ = Z dUℓ+ 8 tr XℓOℓ+ XℓO 3 ℓ+ − 4 tr XℓO 2 ℓ+ XℓO 2 ℓ+ − 2 tr(XℓOℓ+ XℓOℓ+ XℓOℓ+ XℓOℓ+ ) − 2 tr O 4 = 2d h 4 d 2 − 7 tr O3 tr(O) + 21 − 2d 2 tr O2 2 − d d 2 − 7 tr O4 − 2d tr O2 tr(O) 2 + tr(O) 4 i ( The third term tr([Xℓ , Oℓ+ ][Xℓ , [Xℓ , Oℓ+ ]][Xℓ , Oℓ+ ]Oℓ+ ) becomes Z dUℓ+ tr([Xℓ , Oℓ+ ][Xℓ , [Xℓ , Oℓ+ ]][Xℓ , Oℓ+ ]Oℓ+ ) = Z dUℓ+ 4 tr XℓO 2 ℓ+ XℓO 2 ℓ+ + 2 tr(XℓOℓ+ XℓOℓ+ XℓOℓ+ XℓOℓ+ ) + 2 tr O 4 − 8 tr XℓOℓ+ XℓO 3 ℓ+ (B.109) = − 2d h 4 d 2 − 7 tr O3 tr(O) + 21 − 2d 2 tr O2 2 − d d 2 − 7 tr O4 − 2d tr O2 tr(O) 2 + tr(O) 4 i (d 2 − 1)(d 2 − 9) . (B.110) The last term tr [Xℓ , [Xℓ , Oℓ+ ]][Xℓ , Oℓ+ ] 2Oℓ+ becomes Z dUℓ+ tr [Xℓ , [Xℓ , Oℓ+ ]][Xℓ , Oℓ+ ] 2Oℓ+ = Z dUℓ+ 8 tr XℓOℓ+ XℓO 3 ℓ+ − 4 tr XℓO 2 ℓ+ XℓO 2 ℓ+ − 2 tr(XℓOℓ+ XℓOℓ+ XℓOℓ+ XℓOℓ+ ) − 2 tr O 4 (B.111) = 2d h 4 d 2 − 7 tr O3 tr(O) + 21 − 2d 2 tr O2 2 − d d 2 − 7 tr O4 − 2d tr O2 tr(O) 2 + tr(O) 4 i (d 2 − 1)(d 2 − 9) . (B.112) Therefore, we have the ensemble average over ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 as E " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = − 1 4d(d − 1)(d + 1)2(d − 2)(d + 2)2(d − 3)(d + 3)2 h d 4 − 2d 3 − 9d 2 + 8d + 20 tr(O) 4 + 2d −d 4 + d 3 + 10d 2 + d − 29 tr O 2 tr(O) 2 − 4 d 5 − 11d 3 − 6d 2 + 38d + 14 tr O 3 tr(O) + d 6 + 3d 5 − 11d 4 − 41d 3 + 18d 2 + 96d + 60 tr O 2 2 + d d 5 − 13d 3 − 4d 2 + 56d − 4 tr O 4 i . 
(B.113) Calculation ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 The ensemble average of ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 assuming ℓ1 < ℓ2 can be written as E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 h ⟨ψ0|U † ℓ − 1 U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Uℓ − 1 |ψ0⟩ ⟨ψ0|U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 |ψ0⟩ × ⟨ψ0|U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 |ψ0⟩ ⟨ψ0|U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 |ψ0⟩ i (B.114) = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 ρ0 ·U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 . (B.115) The integration over Uℓ − 1 becomes Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 ρ0 ·U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 = Z dUℓ1ℓ2 dUℓ + 2 1 d 4 + 6d 3 + 11d 2 + 6d h tr(O) tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i + tr(O) tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i + tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 + tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i + tr hXℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 + tr hXℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 + tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 + tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 + tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 + tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 
2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 + tr hXℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i trhXℓ2 , Oℓ + 2 i Oℓ + 2 i . (B.116) 178 We will evaluate the integration over Uℓ1ℓ2 , Uℓ + 2 on each item in the following. The first two are already solved before (see Eqs. (B.84) and (B.88)). Z dUℓ1ℓ2 dUℓ + 2 tr(O) tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i = tr(O) d 2 h d 2 tr O3 − 3d tr O2 tr(O) + 2 tr(O) 3 i (d 2 − 1)2 (B.117) Z dUℓ1ℓ2 dUℓ + 2 tr(O) tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i = tr(O) d 2 h d 2 tr O3 − 3d tr O2 tr(O) + 2 tr(O) 3 i (d 2 − 1)2 . (B.118) The integral of the third one becomes Z dUℓ1ℓ2 dUℓ + 2 tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 = X i1,i2 Z dUℓ1ℓ2 dUℓ + 2 h 2 tr Pi2,i1Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Pi1,i2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 − tr Pi2,i1Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Pi1,i2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 − tr Pi2,i1Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Pi1,i2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 i (B.119) = Z dUℓ + 2 2 d tr O2 ℓ + 2 h Xℓ2 , Oℓ + 2 i2 − tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 d 2 − 1 + tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 − d tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i d 2 − 1 (B.120) = Z dUℓ + 2 2d tr O2 ℓ + 2 h Xℓ2 , Oℓ + 2 i2 − tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i d 2 − 1 (B.121) = 2d d 2 − 1 d d tr O4 + tr O2 2 − 2 tr(O) tr O3 1 − d 2 − 2d tr O2 2 − tr(O) tr O3 d 2 − 1 (B.122) = 2d 2 h 4 tr(O) tr O3 − d tr O4 − 3 tr O2 2 i (d 2 − 1)2 . 
(B.123) The fourth one is simply Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i = 0 (B.124) by the cyclic property of trace from expansi The fifth term through integration is Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Oℓ + 2 Uℓ1ℓ2 − tr h Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 + tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i O 2 ℓ + 2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 i (B.125) = Z dUℓ + 2 d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 + trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 − trh Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 (B.126) = Z dUℓ + 2 d d 2 − 1 tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − d trh Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 (B.127) = Z dUℓ + 2 d d 2 − 1 tr(O) h tr Xℓ2Oℓ + 2 Xℓ2O 2 ℓ + 2 − tr O 3 i − d 2 d 2 − 1 h 2 tr Xℓ2O 3 ℓ + 2 Xℓ2Oℓ + 2 − tr Xℓ2O 2 ℓ + 2 Xℓ2O 2 ℓ + 2 − tr O 4 i (B.128) = d d 2 − 1 tr(O) d tr(O) tr O2 − d tr O3 d 2 − 1 + d 2 d 2 − 1 d d 2 − 1 h tr O 2 2 − 2 tr(O) tr O 3 + d tr O 4 i (B.129) = d 2 (d 2 − 1)2 h tr(O) 2 tr O 2 − 3d tr(O) tr O 3 + d tr O 2 2 + d 2 tr O 4 i . 
( The sixth item is Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , Oℓ + 1 i hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 h tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 − tr hXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 + tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i (B.131) = Z dUℓ + 2 d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 d 2 − 1 + trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 1 − d 2 − trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 1 − d 2 − trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 (B.132) = Z dUℓ + 2 d d 2 − 1 tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − d trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 (B.133) = Z dUℓ + 2 d d 2 − 1 tr(O) h tr Xℓ2Oℓ + 2 Xℓ2O 2 ℓ + 2 − tr O 3 i − 2d 2 d 2 − 1 h tr Xℓ2O 2 ℓ + 2 Xℓ2O 2 ℓ + 2 − tr Xℓ2O 3 ℓ + 2 Xℓ2Oℓ + 2 i (B.134) = d d 2 − 1 tr(O) d tr(O) tr O2 − d tr O3 d 2 − 1 − d 2 d 2 − 1 2d d 2 − 1 h tr O 2 2 − tr(O) tr O 3 i (B.135) = d 2 (d 2 − 1)2 h tr(O) 2 tr O 2 − 2d tr O 2 2 + d tr(O) tr O 3 i . (B.1 The seventh item is Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 h tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 − tr hXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 + tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 i (B.137) = d 2 (d 2 − 1)2 h tr(O) 2 tr O 2 − 2d tr O 2 2 + d tr(O) tr O 3 i . 
(B.138) The eighth item is Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 O 2 ℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 − tr h Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 + tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i (B.139) = Z dUℓ + 2 trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 + d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 − trh Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 (B.140) = Z dUℓ + 2 d d 2 − 1 tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − d trh Xℓ2 , Oℓ + 2 i2 O 2 ℓ + 2 (B.141) = d 2 (d 2 − 1)2 h tr(O) 2 tr O 2 − 3d tr(O) tr O 3 + d tr O 2 2 + d 2 tr O 4 i . (B.14 The ninth item is Z dUℓ1ℓ2 dUℓ + 2 tr hXℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 [Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 ]U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 h tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 + tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 O 2 ℓ + 2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i O 2 ℓ + 2 Uℓ1ℓ2 i (B.143) = Z dUℓ + 2 trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 + d tr O2 trh Xℓ2 , Oℓ + 2 i2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 (B.144) = Z dUℓ + 2 d d 2 − 1 tr O 2 trh Xℓ2 , Oℓ + 2 i2 − tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 (B.145) = d d 2 − 1 tr O 2 2d[tr(O) 2 − d tr O2 ] d 2 − 1 − d d 2 − 1 tr(O) d[tr(O) tr O2 − d tr O3 ] d 2 − 1 (B.146) = d 2 (d 2 − 1)2 h tr O 2 tr(O) 2 − 2d tr O 2 2 + d tr(O) tr O 3 i . (B. 
The tenth item is Z dUℓ1ℓ2 dUℓ + 2 tr [Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 ]U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 O 2 ℓ + 2 Uℓ1ℓ2 + tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2 − tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 O 2 ℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i (B.148) = Z dUℓ + 2 d tr O2 trh Xℓ2 , Oℓ + 2 i2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 + trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 − d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 1 − d 2 (B.149) = d 2 (d 2 − 1)2 h tr O 2 tr(O) 2 − 2d tr O 2 2 + d tr(O) tr O 3 i . (B.150) The eleventh (last) item is Z dUℓ1ℓ2 dUℓ + 2 trhXℓ1 , Oℓ + 1 i [Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 ] trhXℓ2 , Oℓ + 2 i Oℓ + 2 = 0 (B.151) by the cyclic property of trace. Combing the above eleven terms, we have ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = d 8(d − 1)2(d + 1)3(d + 2)(d + 3) h 2 tr(O) 4 − 3(d − 1) tr(O) 2 tr O 2 − 3(d + 1) tr O 2 2 +(d 2 − d + 4) tr(O) tr O 3 + (d 2 − d) tr O 4 . (B.152) Summary of average dynamical index ζ0 under Haar random ensemble Summarizing from Eq. (B.113) and (B.152), combined with the result from Eq. 
(B.93) we thus have ϵ0µ0 as ϵ0µ0 = L(L − 1)E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 + LE " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # − O0µ0 (B.153) = − L 4d(d − 1)(d + 1)2(d − 2)(d + 2)2(d − 3)(d + 3)2 h d 4 − 2d 3 − 9d 2 + 8d + 20 tr(O) 4 + 2d −d 4 + d 3 + 10d 2 + d − 29 tr O 2 tr(O) 2 − 4 d 5 − 11d 3 − 6d 2 + 38d + 14 tr O 3 tr(O) + d 6 + 3d 5 − 11d 4 − 41d 3 + 18d 2 + 96d + 60 tr O 2 2 + d d 5 − 13d 3 − 4d 2 + 56d − 4 tr O 4 i + L(L − 1)d 8(d − 1)2(d + 1)3(d + 2)(d + 3) h −2(d + 3)O0 tr(O) 3 + 3d(d + 3)O0 tr(O) tr O 2 +(d 2 − d + 4) tr(O) tr O 3 + d(d − 1) tr O 4 − d 2 (d + 3)O0 tr O 3 − 3(d − 1) tr O 2 tr(O) 2 −3(d + 1) tr O 2 2 + 2 tr(O) 4 i . (B.154) One can then find ζ0 = ϵ0µ0/K0 2 , (B.155) where ϵ0µ0 and K0 can be found in Eq. (B.15 In the asymptotic limit of L ≫ 1, d ≫ 1, Eq. (B.154) can be reduced to ϵ0µ0 ≃ − L 4d 6 h tr(O) 4 − 2d tr O 2 tr(O) 2 − 4d tr O 3 tr(O) + d 2 tr O 2 2 + d 2 tr O 4 i + L 2 8d 6 h −2dO0 tr(O) 3 + 3d 2O0 tr(O) tr O 2 + d 2 tr(O) tr O 3 + d 2 tr O 4 − d 3O0 tr O 3 −3d tr O 2 tr(O) 2 − 3d tr O 2 2 + 2 tr(O) 4 i . (B.156) We then have the ratio ζ0 as ζ0 = ϵ0µ0 K0 2 (B.157) ≃ − 1 L tr(O) 4 − 2d tr O2 tr(O) 2 − 4d tr O3 tr(O) + d 2 tr O2 2 + d 2 tr O4 d tr(O2) − tr(O) 2 2 + 1 2 d tr(O2) − tr(O) 2 2 h −2dO0 tr(O) 3 + 3d 2O0 tr(O) tr O 2 + d 2 tr(O) tr O 3 + d 2 tr O 4 − d 3O0 tr O 3 −3d tr O 2 tr(O) 2 − 3d tr O 2 2 + 2 tr(O) 4 i . (B.158) Specifically, for traceless observable, the ζ0 can be further simplified as ζ0 = ϵ0µ0 K0 2 (B.159) ≃ − 1 L tr O2 2 + tr O4 tr(O2) 2 + 1 2d tr(O2) 2 h d tr O 4 − d 2O0 tr O 3 − 3 tr O 2 2 i . (B.160) B.6.4 Fluctuations of error and QNTK under Haar random ensemble At the end of discussion about Haar ensemble result, we discuss the standard deviation of total error SD [ϵ0] and QNTK SD [K0]. 
To calculate the standard deviations, we first evaluate the corresponding variances.

B.6.4.1 Relative fluctuation of total error under Haar random ensemble

The variance of the total error is
\begin{align}
\mathrm{Var}[\epsilon_0] &= \overline{\epsilon_0^2}-\overline{\epsilon_0}^2 \tag{B.161}\\
&= \mathbb{E}\left[\left(\langle O\rangle-O_0\right)^2\right]-\left(\mathbb{E}[\langle O\rangle]-O_0\right)^2 \tag{B.162}\\
&= \mathbb{E}\left[\langle O\rangle^2\right]-\mathbb{E}[\langle O\rangle]^2 \tag{B.163}\\
&= \frac{\operatorname{tr}(O)^2+\operatorname{tr}(O^2)}{d^2+d}-\frac{\operatorname{tr}(O)^2}{d^2} \tag{B.164}\\
&= \frac{d\operatorname{tr}(O^2)-\operatorname{tr}(O)^2}{d^2(d+1)}. \tag{B.165}
\end{align}
Then the standard deviation of the error is
\[
\mathrm{SD}[\epsilon_0]=\sqrt{\mathrm{Var}[\epsilon_0]}=\sqrt{\frac{d\operatorname{tr}(O^2)-\operatorname{tr}(O)^2}{d^2(d+1)}}, \tag{B.166}
\]
and the relative fluctuation can be obtained directly as
\[
\frac{\mathrm{SD}[\epsilon_0]}{\overline{\epsilon_0}}=\frac{1}{\operatorname{tr}(O)/d-O_0}\sqrt{\frac{d\operatorname{tr}(O^2)-\operatorname{tr}(O)^2}{d^2(d+1)}}. \tag{B.167}
\]
Specifically, for a traceless observable we have $\mathrm{SD}[\epsilon_0]=\sqrt{\operatorname{tr}(O^2)/d(d+1)}$ and $\mathrm{SD}[\epsilon_0]/\overline{\epsilon_0}=-\sqrt{\operatorname{tr}(O^2)/d(d+1)}/O_0$.

B.6.4.2 Relative fluctuation of QNTK under Haar random ensemble

The variance of $K$ is $\mathrm{Var}[K_0]=\overline{K^2}-\overline{K}^2$, which can be written as
\begin{align}
\mathrm{Var}[K_0] &= \sum_{\ell_1,\ell_2}\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\right)^2\left(\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right)^2\right]-\overline{K}^2 \tag{B.168}\\
&= 2\sum_{\ell_1<\ell_2}\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\right)^2\left(\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right)^2\right]+\sum_{\ell}\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell}}\right)^4\right]-\overline{K}^2 \tag{B.169}\\
&= L(L-1)\,\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\right)^2\left(\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right)^2\right]+L\,\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell}}\right)^4\right]-\overline{K}^2. \tag{B.170}
\end{align}
Under the random initialization of circuit parameters, we can calculate the variance via Haar integrals as follows.

Calculation of $\mathbb{E}\big[(\partial\epsilon/\partial\theta_\ell)^4\big]$ with Haar random ensemble

We first evaluate $\mathbb{E}\big[(\partial\epsilon/\partial\theta_\ell)^4\big]$, which can be expanded as
\begin{align}
\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_\ell}\right)^4\right] &= \frac{1}{16}\,\mathbb{E}\left[\langle\psi_0|U_{\ell^-}^\dagger[X_\ell,O_{\ell^+}]U_{\ell^-}|\psi_0\rangle^4\right] \tag{B.171}\\
&= \frac{1}{16}\int dU_{\ell^-}dU_{\ell^+}\operatorname{tr}\left(\rho_0U_{\ell^-}^\dagger[X_\ell,O_{\ell^+}]U_{\ell^-}\,\rho_0U_{\ell^-}^\dagger[X_\ell,O_{\ell^+}]U_{\ell^-}\,\rho_0U_{\ell^-}^\dagger[X_\ell,O_{\ell^+}]U_{\ell^-}\,\rho_0U_{\ell^-}^\dagger[X_\ell,O_{\ell^+}]U_{\ell^-}\right) \tag{B.172}\\
&= \int dU_{\ell^+}\frac{3\operatorname{tr}\big([X_\ell,O_{\ell^+}]^2\big)^2+6\operatorname{tr}\big([X_\ell,O_{\ell^+}]^4\big)}{16d(d+2)(d^2+4d+3)}, \tag{B.173}
\end{align}
where we evaluate the integral over $U_{\ell^-}$ by considering that $U_{\ell^-}$ forms a Haar random (4-design) ensemble. From the expansion of $\operatorname{tr}\big([X_\ell,O_{\ell^+}]^4\big)$, we have
\begin{align}
\int dU_{\ell^+}\operatorname{tr}\big([X_\ell,O_{\ell^+}]^4\big) = \int dU_{\ell^+}\Big[&2\operatorname{tr}\big(X_\ell O_{\ell^+}X_\ell O_{\ell^+}X_\ell O_{\ell^+}X_\ell O_{\ell^+}\big)-8\operatorname{tr}\big(X_\ell O_{\ell^+}^3X_\ell O_{\ell^+}\big)\nonumber\\
&+4\operatorname{tr}\big(X_\ell O_{\ell^+}^2X_\ell O_{\ell^+}^2\big)+2\operatorname{tr}\big(O^4\big)\Big]. \tag{B.174}
\end{align}
The expansion follows from the Supplementary Notes of [25].
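Both the error variance (B.165) and the prefactor in (B.173) rest on low-order moments of $\langle O\rangle$ over Haar random pure states: $\mathbb{E}\langle O\rangle=\operatorname{tr}(O)/d$, $\mathbb{E}\langle O\rangle^2=[\operatorname{tr}(O)^2+\operatorname{tr}(O^2)]/(d(d+1))$, and, for a traceless Hermitian $M$, $\mathbb{E}\langle M\rangle^4=[3\operatorname{tr}(M^2)^2+6\operatorname{tr}(M^4)]/(d(d+1)(d+2)(d+3))$. A numerical spot-check sketch: the first two moments exactly at $d=2$ via the six stabilizer states (a projective 2-design), the fourth moment by Monte Carlo at $d=3$ with a generous tolerance (all matrices below are illustrative random choices):

```python
import numpy as np

# (i) Exact check of the first two moments at d = 2: the six stabilizer
# states (eigenstates of the Paulis) form a projective 2-design.
s = 1 / np.sqrt(2)
states = [np.array(v, dtype=complex) for v in
          [[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]]]
rng = np.random.default_rng(7)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
O = (A + A.conj().T) / 2
m1 = np.mean([(v.conj() @ O @ v).real for v in states])
m2 = np.mean([(v.conj() @ O @ v).real ** 2 for v in states])
trO, trO2 = np.trace(O).real, np.trace(O @ O).real
assert abs(m1 - trO / 2) < 1e-12
assert abs(m2 - (trO**2 + trO2) / 6) < 1e-12          # d(d+1) = 6
# Var[eps_0] = (d tr(O^2) - tr(O)^2) / (d^2 (d+1)), cf. Eq. (B.165):
assert abs((m2 - m1**2) - (2 * trO2 - trO**2) / 12) < 1e-12

# (ii) Monte Carlo check of the fourth moment for traceless M at d = 3.
d = 3
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
M = (B + B.conj().T) / 2
M -= np.trace(M).real / d * np.eye(d)                 # make M traceless
z = rng.normal(size=(200000, d)) + 1j * rng.normal(size=(200000, d))
psi = z / np.linalg.norm(z, axis=1, keepdims=True)    # Haar random states
vals = np.einsum('ni,ij,nj->n', psi.conj(), M, psi).real
lhs4 = np.mean(vals**4)
trM2 = np.trace(M @ M).real
trM4 = np.trace(np.linalg.matrix_power(M, 4)).real
rhs4 = (3 * trM2**2 + 6 * trM4) / (d * (d + 1) * (d + 2) * (d + 3))
assert abs(lhs4 - rhs4) < 0.15 * rhs4                 # Monte Carlo tolerance
```

With $M=i[X_\ell,O_{\ell^+}]$ (and the factor $1/16$ from the two derivatives), the fourth-moment identity reproduces the structure of Eq. (B.173).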
With the help of RTNI [323], we have the integrals as Z dUℓ+ tr (XℓOℓ+ XℓOℓ+ XℓOℓ+ XℓOℓ+ ) = 2d 2 tr O2 tr(O) 2 + d 2 + 9 tr O4 − d tr(O) 4 − 8d tr O3 tr(O) − 3d tr O2 2 d 4 − 10d 2 + 9 (B.175) Z dUℓ+ tr XℓO 3 ℓ+ XℓOℓ+ = d tr(O) tr O3 − tr O4 d 2 − 1 (B.176) Z dUℓ+ tr XℓO 2 ℓ+ XℓO 2 ℓ+ = d tr O2 2 − tr O4 d 2 − 1 , (B.177) where Uℓ+ forms as least 4-design. The ensemble average of E tr [Xℓ , Oℓ+ ] 4 is thus E tr [Xℓ , Oℓ+ ] 4 = 2d h 28 − 4d 2 tr O3 tr(O) + 2d 2 − 21 tr O2 2 + d 3 − 7d tr O4 + 2d tr O2 tr(O) 2 − tr(O) 4 i d 4 − 10d 2 + 9 . (B.178) On the other hand, tr [Xℓ , Oℓ+ ] 2 2 can be expanded as Z dUℓ+ tr [Xℓ , Oℓ+ ] 2 2 = X i1,i2 Z dUℓ+ tr Pi2,i1 [Xℓ , Oℓ+ ] 2Pi1,i2 [Xℓ , Oℓ+ ] 2 (B.179) = X i1,i2 Z dUℓ+ [4 tr (Pi2,i1XℓOℓ+ XℓOℓ+ Pi1,i2XℓOℓ+ XℓOℓ+ ) −8 tr Pi2,i1XℓOℓ+ XℓOℓ+ Pi1,i2O 2 ℓ+ + 4 tr Pi2,i1O 2 ℓ+ Pi1,i2O where we introduce Pi1,i2 = |i1⟩⟨i2| such that it can be evaluated by RTNI [323]. Similar to above calculation, we then can find X i1,i2 Z dUℓ+ tr (Pi2,i1XℓOℓ+ XℓOℓ+ Pi1,i2XℓOℓ+ XℓOℓ+ ) = d 2 − 6 tr(O) 4 + 2d 2 − 9 tr O2 2 − 6d tr O2 tr(O) 2 − 6d tr O4 + 24 tr O3 tr(O) d 4 − 10d 2 + 9 (B.181) X i1,i2 Z dUℓ+ tr Pi2,i1XℓOℓ+ XℓOℓ+ Pi1,i2O 2 ℓ+ = tr O2 d tr(O) 2 − tr O2 d 2 − 1 (B.182) X i1,i2 Z dUℓ+ tr Pi2,i1O 2 ℓ+ Pi1,i2O 2 ℓ+ = tr O 2 2 . (B.183) Combining the above three results, the average over tr [Xℓ , Oℓ+ ] 2 2 is E h tr [Xℓ , Oℓ+ ] 2 2 i = 4 h d 2 − 6 tr(O) 4 − 2 d 3 − 6d tr O2 tr(O) 2 + d 4 − 6d 2 − 18 tr O2 2 − 6d tr O4 + 24 tr O3 tr(O) i d 4 − 10d 2 + 9 . (B.184) Therefore, the ensemble average of ∂ϵ ∂θl 4 is E " ∂ϵ ∂θl 4 # = Z dUℓ+ 3 tr [Xℓ , Oℓ+ ] 2 2 + 6 tr [Xℓ , Oℓ+ ] 4 16d(d + 2)(d 2 + 4d + 3) (B.185) = 3 h (d 2 + 3d + 3) tr O2 2 + d(d + 1) tr O4 + tr(O) 4 − 2d tr O2 tr(O) 2 − 4(d + 1) tr O3 tr(O) i 4(d − 1)d(d + 1)2(d + 3)2 (B.186) ≃ 3 tr O2 2 4d 4 + 3 tr O4 4d 4 + 3 tr(O) 4 4d 6 − 3 tr O2 tr(O) 2 2d 5 − 3 tr O3 tr(O) d 5 , (B.187) where we still approximate by d ≫ 1. 
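The three unitary integrals (B.175)-(B.177) can be spot-checked by Monte Carlo over Haar random unitaries, sampled via the QR decomposition of a complex Ginibre matrix. A sketch at $d=2$ with $X$ the Pauli-X and a fixed diagonal observable (both are illustrative assumptions; only (B.175) genuinely requires a 4-design, the other two are second-moment identities):

```python
import numpy as np

def haar_unitary(d, rng):
    # QR of a complex Ginibre matrix, with column phases fixed by the
    # diagonal of R, is Haar distributed.
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
O = np.diag([1.0, 2.0]).astype(complex)
t1, t2, t3, t4 = [np.trace(np.linalg.matrix_power(O, k)).real
                  for k in (1, 2, 3, 4)]

rng = np.random.default_rng(42)
acc = np.zeros(3)
N = 40000
for _ in range(N):
    U = haar_unitary(d, rng)
    Ot = U.conj().T @ O @ U                        # O_{l+} for Haar U_{l+}
    acc += [np.trace(np.linalg.matrix_power(X @ Ot, 4)).real,   # tr(XO XO XO XO)
            np.trace(X @ Ot @ Ot @ Ot @ X @ Ot).real,           # tr(X O^3 X O)
            np.trace(X @ Ot @ Ot @ X @ Ot @ Ot).real]           # tr(X O^2 X O^2)
mc = acc / N

pred = np.array([
    (2*d**2*t2*t1**2 + (d**2 + 9)*t4 - d*t1**4 - 8*d*t3*t1 - 3*d*t2**2)
        / (d**4 - 10*d**2 + 9),                    # Eq. (B.175)
    (d*t1*t3 - t4) / (d**2 - 1),                   # Eq. (B.176)
    (d*t2**2 - t4) / (d**2 - 1),                   # Eq. (B.177)
])
print(np.abs(mc - pred))  # each entry small (Monte Carlo noise)
```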
Calculation of $\mathbb{E}\big[(\partial\epsilon/\partial\theta_{\ell_1})^2(\partial\epsilon/\partial\theta_{\ell_2})^2\big]$ with Haar random ensemble

For the evaluation of $\mathbb{E}\big[(\partial\epsilon/\partial\theta_{\ell_1})^2(\partial\epsilon/\partial\theta_{\ell_2})^2\big]$, there are four different unitaries $U_{\ell_1^-},U_{\ell_2^-},U_{\ell_1^+},U_{\ell_2^+}$, assuming $\ell_1<\ell_2$. Note that $U_{\ell_2^-}=U_{\ell_1\ell_2}U_{\ell_1^-}$ and $U_{\ell_1^+}=U_{\ell_2^+}U_{\ell_1\ell_2}$ are not fully independent, so the Haar average has to be performed with respect to $U_{\ell_1^-}$, $U_{\ell_1\ell_2}$, $U_{\ell_2^+}$ individually only.
\begin{align}
&\mathbb{E}\left[\left(\frac{\partial\epsilon}{\partial\theta_{\ell_1}}\right)^2\left(\frac{\partial\epsilon}{\partial\theta_{\ell_2}}\right)^2\right]\nonumber\\
&=\frac{1}{16}\int dU_{\ell_1^-}dU_{\ell_1\ell_2}dU_{\ell_2^+}\,\langle\psi_0|U_{\ell_1^-}^\dagger\big[X_{\ell_1},O_{\ell_1^+}\big]U_{\ell_1^-}|\psi_0\rangle^2\,\langle\psi_0|U_{\ell_1^-}^\dagger U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}U_{\ell_1^-}|\psi_0\rangle^2 \tag{B.188}\\
&=\frac{1}{16}\int dU_{\ell_1^-}dU_{\ell_1\ell_2}dU_{\ell_2^+}\operatorname{tr}\Big(\rho_0U_{\ell_1^-}^\dagger\big[X_{\ell_1},O_{\ell_1^+}\big]U_{\ell_1^-}\,\rho_0U_{\ell_1^-}^\dagger\big[X_{\ell_1},O_{\ell_1^+}\big]U_{\ell_1^-}\,\rho_0U_{\ell_1^-}^\dagger U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}U_{\ell_1^-}\,\rho_0U_{\ell_1^-}^\dagger U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}U_{\ell_1^-}\Big) \tag{B.189}\\
&=\frac{1}{16d(d+2)(d^2+4d+3)}\int dU_{\ell_1\ell_2}dU_{\ell_2^+}\Big[\operatorname{tr}\big(\big[X_{\ell_1},O_{\ell_1^+}\big]^2\big)\operatorname{tr}\big(\big[X_{\ell_2},O_{\ell_2^+}\big]^2\big)+2\operatorname{tr}\big(U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}\big[X_{\ell_1},O_{\ell_1^+}\big]\big)^2\nonumber\\
&\qquad+4\operatorname{tr}\big(\big[X_{\ell_1},O_{\ell_1^+}\big]^2U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]^2U_{\ell_1\ell_2}\big)+\operatorname{tr}\big(U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}\big[X_{\ell_1},O_{\ell_1^+}\big]U_{\ell_1\ell_2}^\dagger\big[X_{\ell_2},O_{\ell_2^+}\big]U_{\ell_1\ell_2}\big[X_{\ell_1},O_{\ell_1^+}\big]\big)\Big]. \tag{B.190}
\end{align}
Next we evaluate the average over $U_{\ell_1\ell_2}$ and $U_{\ell_2^+}$ of each term separately. The average of the first term, $\operatorname{tr}\big(\big[X_{\ell_1},O_{\ell_1^+}\big]^2\big)\operatorname{tr}\big(\big[X_{\ell_2},O_{\ell_2^+}\big]^2\big)$, becomes
\begin{align}
&\int dU_{\ell_1\ell_2}dU_{\ell_2^+}\operatorname{tr}\big(\big[X_{\ell_1},O_{\ell_1^+}\big]^2\big)\operatorname{tr}\big(\big[X_{\ell_2},O_{\ell_2^+}\big]^2\big) \tag{B.191}\\
&=\int dU_{\ell_2^+}2\operatorname{tr}\big(\big[X_{\ell_2},O_{\ell_2^+}\big]^2\big)\left[\frac{d\operatorname{tr}(O)^2-\operatorname{tr}(O^2)}{d^2-1}-\operatorname{tr}(O^2)\right] \tag{B.192}\\
&=4\left[\frac{d\operatorname{tr}(O)^2-\operatorname{tr}(O^2)}{d^2-1}-\operatorname{tr}(O^2)\right]^2 \tag{B.193}\\
&=\frac{4d^2\big[\operatorname{tr}(O)^2-d\operatorname{tr}(O^2)\big]^2}{(d^2-1)^2}.
\end{align}
(B.19 The average of second term, tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i2 is Z dUℓ1ℓ2 dUℓ + 2 tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i2 = X i1,i2 Z dUℓ1ℓ2 dUℓ + 2 tr Pi2,i1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i Pi1,i2U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i (B.195) = X i1,i2 Z dUℓ1ℓ2 dUℓ + 2 tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 2 + tr hXℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 2 −2 tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 tr hXℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 i (B.196) = Z dUℓ + 2 2 d tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i − tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 d 2 − 1 + tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 − d trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 (B.197) = Z dUℓ + 2 2d tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 (B.198) = 2d d 2 − 1 Z dUℓ + 2 h 2 tr O 2 ℓ + 2 Xℓ2O 2 ℓ + 2 Xℓ2 − 2 tr Oℓ + 2 Xℓ2O 3 ℓ + 2 Xℓ2 − 2 tr Xℓ2Oℓ + 2 Xℓ2O 3 ℓ + 2 + tr Xℓ2O 2 ℓ + 2 Xℓ2O 2 ℓ + 2 + tr O 4 i (B.199) = 2d d 2 − 1 Z dUℓ + 2 h 3 tr O 2 ℓ + 2 Xℓ2O 2 ℓ + 2 Xℓ2 − 4 tr Oℓ + 2 Xℓ2O 3 ℓ + 2 Xℓ2 + tr O 4 i (B.200) = 2d 2 h d tr O4 + 3 tr O2 2 − 4 tr(O) tr O3 i (d 2 − 1)2 . 
(B.201) The average of third term, trh Xℓ1 , Oℓ + 1 i2 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2 is Z dUℓ1ℓ2 dUℓ + 2 trh Xℓ1 , Oℓ + 1 i2 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2 = Z dUℓ1ℓ2 dUℓ + 2 tr Xℓ1Oℓ + 1 Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2 + tr Xℓ1Oℓ + 1 Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Oℓ + 2 Uℓ1ℓ2 − tr Xℓ1O 2 ℓ + 1 Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i2 Uℓ1ℓ2 − tr O 2 ℓ + 2 h Xℓ2 , Oℓ + 2 i2 (B.202) = Z dUℓ + 2 2 d tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − d trh Xℓ2 , Oℓ + 2 i2 tr O2 − trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 d 2 − 1 − tr O 2 ℓ + 2 h Xℓ2 , Oℓ + 2 i2 (B.203) = Z dUℓ + 2 d −d trh Xℓ2 , Oℓ + 2 i2 O2 ℓ + 2 + 2 tr(O) trh Xℓ2 , Oℓ + 2 i2 Oℓ + 2 − trh Xℓ2 , Oℓ + 2 i2 tr O2 d 2 − 1 (B.204) = d d 2 − 1 d d h d tr O4 + tr O2 2 − 2 tr(O) tr O3 i d 2 − 1 + 2 tr(O) d[tr(O) tr O2 − d tr O3 ] d 2 − 1 − tr O 2 2d h tr(O) 2 − d tr O2 i d 2 − 1 (B.205) = d 3 h d tr O4 + 3 tr O2 2 − 4 tr(O) tr O3 i (d 2 − 1)2 The average overUℓ1ℓ2 in the last term, tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i is Z dUℓ1ℓ2 dUℓ + 2 tr U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 h Xℓ1 , Oℓ + 1 i = Z dUℓ1ℓ2 dUℓ + 2 h tr Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 Oℓ + 2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 + tr Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2 −2 tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Xℓ1U † ℓ1ℓ2 i (B.207) = Z dUℓ + 2 2 d trhXℓ2 , Oℓ + 2 i Oℓ + 2 2 − trhXℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 d 2 − 1 + tr Oℓ + 2 h Xℓ2 , Oℓ + 2 i Oℓ + 2 h Xℓ2 , Oℓ + 2 i d 2 − 1 (B.208) = Z dUℓ + 2 2d trhXℓ2 , Oℓ + 2 i Oℓ + 2 2 d 2 − 1 (B.209) = 0, (B.210) where the last line can be found by the cyclic property of trace directly. 
Therefore, we can conclude on E ∂ϵ ∂θℓ1 2 ∂ϵ ∂θℓ2 2 is E " ∂ϵ ∂θℓ1 2 ∂ϵ ∂θℓ2 2 # = 1 16d(d + 2) (d 2 + 4d + 3) 4d 2 h tr(O) 2 − d tr O2 i2 (d 2 − 1)2 + 2 2d 2 h d tr O4 + 3 tr O2 2 − 4 tr(O) tr O3 i (d 2 − 1)2 +4 d 3 h d tr O4 + 3 tr O2 2 − 4 tr(O) tr O3 i (d 2 − 1)2 + 0 (B.211) = d h (d 2 + 3d + 3) tr O2 2 + d(d + 1) tr O4 + tr(O) 4 − 2d tr O2 tr(O) 2 − 4(d + 1) tr O3 tr(O) i 4(d − 1)2(d + 1)3(d + 2)(d + 3) (B.212) ≃ tr O2 2 4d 4 + tr O4 4d 4 + tr(O) 4 4d 6 − tr O2 tr(O) 2 2d 5 − tr O3 tr(O) d 5 Summary of relative fluctuation SD[K0]/K0 under random initialization To summarize from Eq. (B.187) and (B.213), the ensemble average of Var[K0] is Var[K0] = L(L − 1)E " ∂ϵ ∂θℓ1 2 ∂ϵ ∂θℓ2 2 # + LE " ∂ϵ ∂θℓ 4 # − K0 2 (B.214) = L(L − 1) d h (d 2 + 3d + 3) tr O2 2 + d(d + 1) tr O4 + tr(O) 4 − 2d tr O2 tr(O) 2 − 4(d + 1) tr O3 tr(O) i 4(d − 1)2(d + 1)3(d + 2)(d + 3) + L 3 h (d 2 + 3d + 3) tr O2 2 + d(d + 1) tr O4 + tr(O) 4 − 2d tr O2 tr(O) 2 − 4(d + 1) tr O3 tr(O) i 4(d − 1)d(d + 1)2(d + 3)2 − L d tr O2 − tr(O) 2 2(d − 1)(d + 1)2 !2 . (B.215) The relative fluctuation is then SD[K0]/K0 = p Var[K0]/K0, (B.216) where Var[K0] and K0 can be found in Eq. (B.215) and (B.66). In the asymptotic limit of L, d ≫ 1, we have Var[K0] ≃ L 2 + 3L 4d 6 h d 2 tr O 2 2 + d 2 tr O 4 + tr(O) 4 − 2d tr O 2 tr(O) 2 − 4d tr O 3 tr(O) i − L 2 d tr O2 − tr(O) 2 2 4d 6 . (B.217) Thus we have the standard deviation of QNTK as SD[K0] ≃ 1 2d 3 (L 2 + 3L) h d 2 tr O 2 2 + d 2 tr O 4 + tr(O) 4 − 2d tr O 2 tr(O) 2 − 4d tr O 3 tr(O) i −L 2 d tr O 2 − tr(O) 2 2 1 and the relative fluctuation is SD[K0] K0 ≃ (L 2 + 3L) h d 2 tr O 2 2 + d 2 tr O 4 + tr(O) 4 − 2d tr O 2 tr(O) 2 − 4d tr O 3 tr(O) i − L 2 d tr O 2 − tr(O) 2 2 1/2 × 1 L d tr(O2) − tr(O) 2 . (B.219) For traceless observable O, the standard deviation and relative fluctuation are reduced to SD[K0] ≃ 1 2d 2 q L2 tr(O4) + 3Ltr(O2) 2 , (B.220) SD[K0] K0 ≃ 1 √ L s L tr(O4) tr(O2) 2 + 3. 
(B.221)

Remark A further simplification is considered in [25] by treating the four unitaries $U_{\ell_1^-}$, $U_{\ell_1,L}$, $U_{1,\ell_2-1}$, $U_{\ell_2^+}$ that appear in $\mathbb{E}\big[(\partial\epsilon/\partial\theta_{\ell_1})^2(\partial\epsilon/\partial\theta_{\ell_2})^2\big]$ as independently sampled Haar random unitaries, which leads to the scaling of $\mathrm{SD}[K_0]\sim\sqrt{L}$ only.

B.7 Results with restricted Haar ensemble

In this section, we calculate the averages of $K$, $\lambda$ and $\zeta$ under the restricted Haar ensemble for state preparation. To prepare a target state $O=|\Phi\rangle\langle\Phi|$, we can adopt the cost function Eq. (3.1) with a target value $O_0\ge 0$. As the fidelity is bounded by unity, we have $R=O_0-1$ for $O_0\ge 1$ and $R=0$ for $O_0\le 1$. To capture the late-time dynamics, we consider the scenario where the input state $|\psi_0\rangle$ is already close to the target state, with fidelity
\[
F_0=|\langle\Phi|\psi_0\rangle|^2=\min[1,O_0]-\kappa=O_0-R-\kappa, \tag{B.222}
\]
where $\kappa\sim o(1)$. The unitary in the restricted Haar ensemble is defined as
\[
U_{\mathrm{RH}}=\begin{pmatrix}1 & 0\\ 0 & V\end{pmatrix}, \tag{B.223}
\]
where $V$ is a unitary of dimension $d-1$. As the overall circuit unitary satisfies $U_{\mathrm{RH}}=U_{\ell^+}U_{\ell^-}=U_{\ell_2^+}U_{\ell_1\ell_2}U_{\ell_1^-}$ (see the definitions around Eqs. (B.46), (B.49)), the form of Eq. (B.223) leads to constraints on the unitaries $U_{\ell^-},U_{\ell^+},U_{\ell_1^-},U_{\ell_1\ell_2},U_{\ell_2^+}$ utilized in the evaluation. It turns out that the specific distribution of $V$ for the overall unitary ensemble $\{U_{\mathrm{RH}}\}$ does not affect the ensemble averages, and thus we do not specify the distribution of $V$. To implement the constraint, we can assume that the unitaries of the earlier part of the circuit, $U_{\ell^-},U_{\ell_1^-},U_{\ell_1\ell_2}\sim U_{\mathrm{Haar}}(d)$, follow independent Haar random distributions, while $U_{\ell^+}$, $U_{\ell_2^+}$ are directly determined by $U_{\ell^+}=U_{\mathrm{RH}}U_{\ell^-}^\dagger$ and $U_{\ell_2^+}=U_{\mathrm{RH}}U_{\ell_1^-}^\dagger U_{\ell_1\ell_2}^\dagger$ due to the constraint.

B.7.1 Average QNTK under restricted Haar ensemble

We first evaluate the average of the QNTK, $K_\infty$, under the restricted Haar ensemble.
Recall that the QNTK is defined as K = P ℓ E ∂ϵ ∂θℓ 2 , thus we have E " ∂ϵ ∂θℓ 2 # = − 1 4 Z dUℓ− dUℓ+ tr ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− (B.224) = − 1 4 Z dUℓ− dURH h tr ρ0U † ℓ− XℓUℓ− ORHρ0U † ℓ− XℓUℓ− ORH + tr ρ0ORHU † ℓ− XℓUℓ− ρ0ORHU † ℓ− XℓUℓ− − tr ρ0U † ℓ− XℓUℓ− ORHρ0ORHU † ℓ− XℓUℓ− − tr ρ0ORHU † ℓ− XℓUℓ− ρ0U † ℓ− XℓUℓ− ORHi (B.225) = − 2 4 Z dURH " d tr(ρ0ORH) 2 − tr(ρ0ORHρ0ORH) d 2 − 1 − d tr(ORHρ0ORH) − tr(ρ0ORHρ0ORH) d 2 − 1 # (B.226) = Z dURH 2d h tr(ρ0ORH) 2 − tr(ORHρ0ORH) i 4(d 2 − 1) (B.227) = dF0(1 − F0) 2(d 2 − 1) , (B.228) where ORH = U † RHOURH is defined for simplicity. In the last line, we utilize the fact that tr(ρ0ORH) 2 = F 2 0 and tr(ORHρ0ORH) = F0. Thus the QNTK is K∞ = LE " ∂ϵ ∂θℓ 2 # = LdF0(1 − F0) 2(d 2 − 1) (B.229) = Ld 2(d 2 − 1) (O0 − R − κ) (1 − O0 + R + κ) ≃ L 2d (O0 − R) (1 − O0 + R) + O(κ), (B.230) where in the last equation we utilize the relation between F0 and O0, R in Eq. (B.222). When O0 > 1 with R = O0 − 1, we directly have K∞ = 0; when O0 = 1 with R = 0, we also have K∞ = 0; however when O0 < 1 with R = 0, we have a finite nonzero QNTK as K∞ ∝ O0(1 − O0) and specifically, K∞ ∝ O0 in the limit of O0 close to unity. 199 B.7.2 Average relative dQNTK under restricted Haar ensemble In this section, we evaluate the factor λ∞ = µRH/K∞ under restricted Haar ensemble. As K∞ is already calculated above, we focus on dQNTK µRH in the following. Recall that µ = L(L−1)E h ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 i + LE ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 , we evaluate the two ensemble average separately. E ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 under restricted Haar ensemble We first consider E ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 . 
E " ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = 1 16 Z dUℓ− dUℓ+ tr ρ0U † ℓ− Oℓ+ Uℓ− ρ0U † ℓ− [Xℓ, [Xℓ, Oℓ+ ]]Uℓ− ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− ρ0U † ℓ− [Xℓ, Oℓ+ ]Uℓ− (B.231) = 2 16 Z dUℓ− dURH h tr ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ + tr ρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ − tr ρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ − tr ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ + tr ρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ + tr ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ − tr ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ − tr ρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ i . (B.232) The first two are Z dUℓ− dURH h tr ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ + tr ρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ i = Z dURH 2 d 2 − 1 [d tr(ρ0ORH) tr(ρ0ORHρ0ORH) − tr(ρ0ORHρ0ORHρ0ORH)] = 2(d − 1)F 3 0 d 2 − 1 . (B.233) 200 The third and third terms are Z dUℓ− dURH h tr ρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ + tr ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ i = Z dURH 1 d 2 − 1 h d tr(ρ0ORH) 2 − tr(ρ0ORHρ0ORHρ0ORH) + d tr(ρ0ORHρ0ORH) − tr(ρ0ORHρ0ORHρ0ORH) i (B.234) = 2(dF2 0 − F 3 0 ) d 2 − 1 . (B.235) The fifth term is Z dUℓ− dURH tr ρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ = Z dURH 1 (d 2 − 9)(d 2 − 1) d 2 − 3 tr(ρ0ORHρ0ORHρ0ORH) + 2(d − 3) tr(ρ0ORH)(− tr(ρ0ORH) + d + 2) −2 tr(ρ0ORHρ0ORH)(d tr(ρ0ORH) + 3d − 9)) (B.236) = F0 d F 2 0 + 2 + F 2 0 − 8F0 + 4 d 3 + 3d 2 − d − 3 . (B.237) The sixth term can be found to be equal to the fifth one above. The seventh term is Z dUℓ− dURH tr ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ = 1 (d 2 − 9)(d 2 − 1) h tr(ρ0ORH) 2 (3 tr(ρ0ORH) + d(2d − 5) − 6) + d tr(ρ0ORHρ0ORH)(−3 tr(ρ0ORH) + d − 4) +6(tr(ρ0ORHρ0ORH) + tr(ρ0ORHρ0ORHρ0ORH))] (B.238) = 3F 2 0 (d − F0) d 3 + 3d 2 − d − 3 . (B.239) The eighth term is also equal to the seventh one above. 
Conclude from the above calculation, we have E " ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = 1 16 Z dUℓ− dUℓ+ tr ρ0U † ℓ− Oℓ+ Uℓ− ρ0U † ℓ− [Xℓ , [Xℓ , Oℓ+ ]]Uℓ− ρ0U † ℓ− [Xℓ , Oℓ+ ]Uℓ− ρ0U † ℓ− [Xℓ , Oℓ+ ]Uℓ− (B.240) = 2 8 " (d − 1)F 3 0 d 2 − 1 − (dF2 0 − F 3 0 ) d 2 − 1 + F0 d F 2 0 + 2 + F 2 0 − 8F0 + 4 d 3 + 3d 2 − d − 3 − 3F 2 0 (d − F0) d 3 + 3d 2 − d − 3 # (B.241) = (d + 2)(F0 − 1)F0 [(d + 2)F0 − 2] 4(d − 1)(d + 1)(d + 3) . (B.242) E h ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 i under restricted Haar ensemble 20 The other part E h ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 i is E ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dURH h tr ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 
Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 i . (B.243) 203 The integral over Uℓ − 1 , Uℓ1ℓ2 of the first term is Z dUℓ − 1 dUℓ1ℓ2 dURH tr ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = Z dURH 1 (d 2 − 1)2 (d 2 + 1) tr(ρ0ORHρ0ORHρ0ORH) − 2d tr(ρ0ORH) tr(ρ0ORHρ0ORH) (B.244) = F 3 0 (d + 1)2 . (B.245) The second term is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = F 2 0 (F0 + d 2 − 2d) (d 2 − 1)2 . (B.246) The third term is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = F 2 0 (d − F0) (d + 1)2(d − 1). (B.247) The fourth one equals the third one above. The fifth one is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = F0(F0 − d) 2 (d 2 − 1)2 . (B.248) The sixth term equals the first; the seventh and eighth equals the third and fourth; the ninth equals the fifth; the tenth equals the sixth; the eleventh and twelfth equals the seventh and eighth; the thirteenth equals the second; the fourteenth equals the first; the fifteenth and sixteenth equals the third and fourth. 
204 Conclude from the above calculation, we have E ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 = 1 16 4F 3 0 (d + 1)2 + 2F 2 0 (F0 + d 2 − 2d) (d 2 − 1)2 − 8F 2 0 (d − F0) (d + 1)2(d − 1) + 2F0(F0 − d) 2 (d 2 − 1)2 (B.249) = d 2F0(F0 − 1)(2F0 − 1) 8(d 2 − 1)2 . (B.250) Summary of average relative dQNTK λ with restricted Haar ensemble Combining Eq. (B.242) and (B.250), dQNTK µ under restricted Haar ensemble is µRH = L(L − 1)E ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 + LE " ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = L(L − 1)d 2F0(F0 − 1)(2F0 − 1) 8(d 2 − 1)2 + L (d + 2)(F0 − 1)F0 [(d + 2)F0 − 2] 4(d − 1)(d + 1)(d + 3) . (B.251) Combining with the average QNTK calculated above, we have λ∞ = µRH K∞ (B.252) = (L − 1)d(1 − 2F0) 4(d 2 − 1) − (d + 2) [(d + 2)F0 − 2] 2d(d + 3) (B.253) = (L − 1)d 4(d 2 − 1)(1 − 2O0 + 2R + 2κ) − (d + 2) 2d(d + 3) [(d + 2)(O0 − R − κ) − 2] (B.254) ≃ L 4d (1 − 2O0 + 2R) − O0 − R 2 + O(κ), (B.255) which is a constant ∝ L/d regardless of O0 ⋚ 1. 205 B.7.3 Average dynamical index under restricted Haar ensemble Recall the definition of average ζ = ϵµ/K 2 , we evaluate ϵµ in the following. ϵµ = E [⟨O⟩ µ] − O0µ = L(L − 1)E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 + LE " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # − O0µ. (B.256) E ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 under restricted Haar ensemble We first consider E ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 . 
E " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = 1 16 Z dUℓ− dUℓ+ tr ρ0U † ℓ− Oℓ+ Uℓ− ρ0U † ℓ− [Xℓ , [Xℓ , Oℓ+ ]]Uℓ− ρ0U † ℓ− [Xℓ , Oℓ+ ]Uℓ− ρ0U † ℓ− [Xℓ , Oℓ+ ]Uℓ− = 2 16 Z dUℓ− dURH h tr ORHρ0ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ + tr ρ0ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ − tr ρ0ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ − tr ORHρ0ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ + tr ρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ + tr ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ − tr ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ − tr ρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ i . (B.257) 206 The first and second terms are Z dUℓ− dURH h tr ORHρ0ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ + tr ρ0ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0ORHU † ℓ+ XℓUℓ+ i = Z dURH 2 d 2 − 1 [d tr(ρ0ORH) tr(ρ0ORHρ0ORHρ0ORH) − tr(ρ0ORHρ0ORHρ0ORHρ0ORH)] = 2(d − 1)F 4 0 d 2 − 1 . (B.258) The third and fourth terms are Z dUℓ− dURH h tr ρ0ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ + tr ORHρ0ORHρ0ORHρ0ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ i = Z dURH 1 d 2 − 1 [d tr(ρ0ORH) tr(ρ0ORHρ0ORH) − tr(ρ0ORHρ0ORHρ0ORHρ0ORH) + d tr(ρ0ORHρ0ORHρ0ORH) (B.259) − tr(ρ0ORHρ0ORHρ0ORHρ0ORH)] (B.260) = 2(dF3 0 − F 4 0 ) d 2 − 1 . (B.261) The fifth term is Z dUℓ− dURH tr ρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0ORHU † ℓ+ XℓUℓ+ = Z dURH 1 (d 2 − 9)(d 2 − 1) h d 2 − 3 tr(ρ0ORHρ0ORHρ0ORHρ0ORH) + 2(d − 3)(d + 2) tr(ρ0ORH) 2 + 3 tr(ρ0ORH) 3 − tr(ρ0ORH) [(4d − 6) tr(ρ0ORHρ0ORH) + d tr(ρ0ORHρ0ORHρ0ORH)] −d tr(ρ0ORHρ0ORH) 2 + (15 − 4d) tr(ρ0ORHρ0ORHρ0ORH) i (B.262) = F 2 0 d F 2 0 + 2 + F 2 0 − 8F0 + 4 d 3 + 3d 2 − d − 3 . (B.263) 2 The sixth term can be found to be equal to the fifth one above. 
The seventh term is Z dUℓ− dURH tr ORHρ0ORHρ0U † ℓ+ XℓUℓ+ ORHU † ℓ+ XℓUℓ+ ρ0U † ℓ+ XℓUℓ+ ORHρ0U † ℓ+ XℓUℓ+ = 1 (d 2 − 9)(d 2 − 1) [tr(ρ0ORHρ0ORH) (tr(ρ0ORH) [3 tr(ρ0ORH) + d(2d − 5) − 6] − d tr(ρ0ORHρ0ORH)) +d tr(ρ0ORHρ0ORHρ0ORH)(−2 tr(ρ0ORH) + d − 4) + 6(tr(ρ0ORHρ0ORHρ0ORH) + tr(ρ0ORHρ0ORHρ0ORHρ0ORH))] (B.264) = 3F 3 0 (d − F0) d 3 + 3d 2 − d − 3 . (B.265) The eighth term is also equal to the seventh one above. Summarizing from above, we have E " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # = 2 8 " (d − 1)F 4 0 d 2 − 1 − (dF3 0 − F 4 0 ) d 2 − 1 + F 2 0 d F 2 0 + 2 + F 2 0 − 8F0 + 4 d 3 + 3d 2 − d − 3 − 3F 3 0 (d − F0) d 3 + 3d 2 − d − 3 # (B.266) = (d + 2)(F0 − 1)F 2 0 [(d + 2)F0 − 2] 4(d − 1)(d + 1)(d + 3) . (B.267) E h ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 i under restricted Haar ensemble 20 The other item E h ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 i can be expanded as E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0U † ℓ − 1 U † ℓ1ℓ2 Oℓ + 2 Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , U† ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2 i Uℓ − 1 ρ0U † ℓ − 1 h Xℓ1 , Oℓ + 1 i Uℓ − 1 ·ρ0U † ℓ − 1 U † ℓ1ℓ2 h Xℓ2 , Oℓ + 2 i Uℓ1ℓ2Uℓ − 1 (B.268) = 1 16 Z dUℓ − 1 dUℓ1ℓ2 dURH h tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † 
ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ORHρ0ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ρ0ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 − tr ρ0ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 + tr ORHρ0ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 i . (B.269) 209 The integral over Uℓ − 1 , Uℓ1ℓ2 of first term is Z dUℓ − 1 dUℓ1ℓ2 dURH tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 = Z dURH 1 (d 2 − 1)2 h (d 2 + 1) tr(ρ0ORHρ0ORHρ0ORHρ0ORH) − d tr(ρ0ORH) tr(ρ0ORHρ0ORHρ0ORH) − d tr(ρ0ORHρ0ORH) 2 i (B.270) = F 4 0 (d + 1)2 . (B.271) The second term is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = 1 (d 2 − 1)2 tr(ρ0ORHρ0ORHρ0ORHρ0ORH) + d 2 tr(ρ0ORHρ0ORHρ0ORH) − 2d tr(ρ0ORH) tr(ρ0ORHρ0ORH) (B.272) = F 3 0 (F0 + d 2 − 2d) (d 2 − 1)2 . 
(B.273) The third term is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ρ0ORHρ0U † ℓ − 1 Xℓ1U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHρ0ORHU † ℓ − 1 U † ℓ1ℓ2 Xℓ2Uℓ1ℓ2Uℓ − 1 = 1 (d 2 − 1)2 tr(ρ0ORHρ0ORHρ0ORHρ0ORH) + d 2 tr(ρ0ORHρ0ORHρ0ORH) −d tr(ρ0ORH) (tr(ρ0ORHρ0ORH) + tr(ρ0ORHρ0ORHρ0ORH))] (B.274) = F 3 0 (d − F0) (d + 1)2(d − 1). (B.275) The fourth one equals the third one above. 210 The fifth one is Z dUℓ − 1 dUℓ1ℓ2 dUℓ + 2 tr ORHρ0ORHρ0U † ℓ − 1 Xℓ1Uℓ − 1 ORHU † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 ρ0ORHU † ℓ − 1 Xℓ1Uℓ − 1 ρ0U † ℓ − 1 U † ℓ1ℓ2Xℓ2Uℓ1ℓ2Uℓ − 1 = 1 (d 2 − 1)2 tr(ρ0ORHρ0ORHρ0ORHρ0ORH) + d 2 tr(ρ0ORHρ0ORH) − 2d tr(ρ0ORH) tr(ρ0ORHρ0ORH) (B.276) = F 2 0 (F0 − d) 2 (d 2 − 1)2 . (B.277) The sixth term equals the first; the seventh and eighth equals the third and fourth; the ninth equals the fifth; the tenth equals the sixth; the eleventh and twelfth equals the seventh and eighth; the thirteenth equals the second; the fourteenth equals the first; the fifteenth and sixteenth equals the third and fourth. Concluding from the above sixteenth terms, we have E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 = 1 16 4F 4 0 (d + 1)2 + 2F 3 0 (F0 + d 2 − 2d) (d 2 − 1)2 − 8F 3 0 (d − F0) (d + 1)2(d − 1) + 2F 2 0 (F0 − d) 2 (d 2 − 1)2 (B.278) = d 2F 2 0 (F0 − 1)(2F0 − 1) 8(d 2 − 1)2 . (B.279) Summarizing from Eq. (B.267) and (B.279), E [⟨O⟩ µ] becomes E [⟨O⟩ µRH] = L(L − 1)E ⟨O⟩ ∂ 2 ϵ ∂θℓ1 ∂θℓ2 ∂ϵ ∂θℓ1 ∂ϵ ∂θℓ2 + LE " ⟨O⟩ ∂ 2 ϵ ∂θ2 ℓ ∂ϵ ∂θℓ 2 # (B.280) = L(L − 1)d 2F 2 0 (F0 − 1)(2F0 − 1) 8(d 2 − 1)2 + L (d + 2)(F0 − 1)F 2 0 [(d + 2)F0 − 2] 4(d − 1)(d + 1)(d + 3) . (B.281) Summary of average dynamical index ζ with restricted Haar ensemble 211 By subtracting O0µ solved in Eq. 
(B.251), the ensemble average of $\epsilon\mu$ under the restricted Haar ensemble is
$$\epsilon_{\rm RH}\mu_{\rm RH} = \mathbb{E}\left[\langle O\rangle \mu_{\rm RH}\right] - O_0\mu_{\rm RH} \tag{B.282}$$
$$= \frac{L(L-1)d^2F_0^2(F_0-1)(2F_0-1)}{8(d^2-1)^2} + L\frac{(d+2)(F_0-1)F_0^2\left[(d+2)F_0-2\right]}{4(d-1)(d+1)(d+3)} - O_0\left[\frac{L(L-1)d^2F_0(F_0-1)(2F_0-1)}{8(d^2-1)^2} + L\frac{(d+2)(F_0-1)F_0\left[(d+2)F_0-2\right]}{4(d-1)(d+1)(d+3)}\right] \tag{B.283}$$
$$= \frac{L(L-1)d^2F_0(F_0-1)(2F_0-1)}{8(d^2-1)^2}(F_0-O_0) + L\frac{(d+2)(F_0-1)F_0\left[(d+2)F_0-2\right]}{4(d-1)(d+1)(d+3)}(F_0-O_0) \tag{B.284}$$
$$\simeq \frac{L^2F_0(F_0-1)(2F_0-1)}{8d^2}(F_0-O_0) + L\frac{(F_0-1)F_0^2}{4d}(F_0-O_0). \tag{B.285}$$
The ratio $\zeta$ becomes
$$\zeta_\infty = \frac{\epsilon_{\rm RH}\mu_{\rm RH}}{K_\infty^2} \tag{B.286}$$
$$= \frac{(L-1)(2F_0-1)}{2LF_0(F_0-1)}(F_0-O_0) + \frac{(d+2)(d^2-1)\left[(d+2)F_0-2\right]}{Ld^2(d+3)F_0(F_0-1)}(F_0-O_0) \tag{B.287}$$
$$= \frac{(L-1)(2O_0-2R-2\kappa-1)}{2L(O_0-R-\kappa)(R+\kappa+1-O_0)}(R+\kappa) + \frac{(d+2)(d^2-1)\left[(d+2)(O_0-R-\kappa)-2\right]}{Ld^2(d+3)(O_0-R-\kappa)(R+\kappa+1-O_0)}(R+\kappa) \tag{B.288}$$
$$\simeq \frac{R+\kappa}{R+\kappa+1-O_0}\left[1 - \frac{1}{2(O_0-R)} + \frac{d}{L}\right]. \tag{B.289}$$
When $O_0 < 1$ with $R = 0$, we directly have $\lim_{\kappa\to 0}\zeta_\infty = 0$. At the critical point $O_0 = 1$ with $R = 0$, we have $\lim_{\kappa\to 0}\zeta_\infty = 1/2 + d/L$, and in the limit $L \gg d$ we have $\zeta_\infty = 1/2$. For $O_0 > 1$ with $R = O_0 - 1$, however, $\lim_{\kappa\to 0}\zeta_\infty \propto 1 + R/\kappa \to +\infty$, which diverges.

Appendix C
Supplemental Material for Section 4.1

C.1 Preliminary of state discrimination

C.1.1 General Helstrom limit

The minimum 'Helstrom' error probability [154, 151] for discrimination between $m$ states $\{\rho_i\}_{i=0}^{m-1}$ with prior probabilities $\{p_i\}_{i=0}^{m-1}$ is
$$P_H(\{\rho_i, p_i\}) = 1 - \max_{\sum_i \Pi_i = I} \sum_i p_i \,\mathrm{Tr}(\rho_i \Pi_i), \tag{C.1}$$
where the POVM element $\Pi_i$ corresponds to the hypothesis that the state is $\rho_i$. For the binary pure-state case, Eq. (C.1) reduces to Eq. (4.2).

C.1.2 MLE decision for general state discrimination

We use the notation $P_{A|B}(a|b)$ for the probability of the event $A = a$ conditioned on the event $B = b$ having happened. For example, in Sec. 4.1, $P_{\tilde{H}|H}(\tilde{h}|h)$ in Eq. (4.1) denotes the probability of the decision $\tilde{H} = \tilde{h}$ conditioned on the true label $H = h$.
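For the binary case, the maximization in Eq. (C.1) has the well-known closed form $P_H = \frac{1}{2}(1 - \|p_0\rho_0 - p_1\rho_1\|_1)$, which for equal-prior pure states reduces to Eq. (4.2). A minimal NumPy sanity check (the helper names here are ours, not from the thesis):

```python
import numpy as np

def helstrom_binary(rho0, rho1, p0=0.5):
    """Binary Helstrom limit P_H = (1 - ||p0*rho0 - p1*rho1||_1) / 2."""
    gamma = p0 * rho0 - (1.0 - p0) * rho1
    # For a Hermitian matrix the trace norm is the sum of |eigenvalues|.
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()
    return 0.5 * (1.0 - trace_norm)

def pure_state_formula(psi0, psi1):
    """Equal-prior pure-state special case: (1 - sqrt(1 - |<psi0|psi1>|^2)) / 2."""
    overlap = abs(np.vdot(psi0, psi1)) ** 2
    return 0.5 * (1.0 - np.sqrt(1.0 - overlap))

rng = np.random.default_rng(1)
v = rng.normal(size=4) + 1j * rng.normal(size=4)
psi0 = np.array([1, 0, 0, 0], complex)
psi1 = v / np.linalg.norm(v)
rho0 = np.outer(psi0, psi0.conj())
rho1 = np.outer(psi1, psi1.conj())
# The two expressions agree for equal-prior pure states.
```

For identical states the trace norm vanishes and the error probability is 1/2, as expected for an unbiased guess.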
Figure C.1: Ensemble-averaged Helstrom limit $\langle P_H\rangle$ for $n = 6$ qubit states sampled from $H(D_0)$ and $S(D_0)$, as a function of the state preparation depth $D_0$. The blue dashed line represents $\langle P_H\rangle \sim 1/2^{n+2}$ and the red dashed line is the average over the red dots.

Consider the general state discrimination described above. When the VQC implements the unitary $U_D$ on input $\rho_i$, the measurement result $M$ equals $j$ with probability
$$P_{M|H}(j|i) = \mathrm{Tr}\left(|j\rangle\langle j| U_D \rho_i U_D^\dagger\right), \tag{C.2}$$
where $\{j\}_{j=0}^{2^n-1}$ is the set of all possible measurement results. Conditioned on the measurement result $j$, the MLE strategy minimizes the average error probability by deciding on the state $\rho_{\tilde{i}}$ via
$$\tilde{i}(j) = \mathrm{argmax}_i\, p_i P_{M|H}(j|i), \tag{C.3}$$
leading to the minimum error probability for a fixed measurement choice,
$$P_E(\{\rho_i, p_i\}) = 1 - \sum_{i=0}^{m-1} p_i P_{\tilde{H}|H}(i|i). \tag{C.4}$$
Here the conditional correct probability is $P_{\tilde{H}|H}(i|i) = \sum_{\tilde{i}(j)=i} P_{M|H}(j|i)$. In this paper, we focus on the binary pure-state case with equal priors, where the MLE error probability reduces to
$$P_E = \frac{1}{2}\left[1 - \sum_{j:\, P_{M|H}(j|0)\ge P_{M|H}(j|1)} \left(P_{M|H}(j|0) - P_{M|H}(j|1)\right)\right]. \tag{C.5}$$

Figure C.2: A decomposition of a general two-qubit gate. The $R$ gate represents an arbitrary single-qubit rotation with 3 independent parameters, $R(\theta_1, \theta_2, \theta_3) = R_Z(\theta_1) R_Y(\theta_2) R_Z(\theta_3)$.

C.2 Evaluation of $\langle P_H(\psi_0, \psi_1)\rangle_{H(D_0)}$

C.2.1 $n \gg 1$ limit at finite $D_0$

When $n \gg 1$, we expect the typical state overlap $|\langle\psi_0|\psi_1\rangle|^2$ to be small [61], therefore
$$P_H(\psi_0, \psi_1) = \frac{1}{2}\left[1 - \sqrt{1 - |\langle\psi_0|\psi_1\rangle|^2}\right] \sim \frac{1}{4}|\langle\psi_0|\psi_1\rangle|^2. \tag{C.6}$$
Below we show that, regardless of $D_0$, within the state ensemble $H(D_0)$ the typical Helstrom limit between states is $\langle P_H(\psi_0, \psi_1)\rangle_{H(D_0)} \sim 1/2^{n+2}$, which follows directly from the typical overlap $\langle|\langle\psi_0|\psi_1\rangle|^2\rangle_{H(D_0)} = 1/2^n$. We also show it numerically in Fig. C.1.
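The depth-independence of the mean overlap can be checked with a small NumPy simulation; the register size $n = 3$, sample count, and ladder layout of the random two-qubit gates below are illustrative assumptions, not the circuit from Fig. 4.7:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                 # small register so the Monte Carlo check runs quickly
d = 2 ** n

def haar_unitary(m):
    """Haar-random m x m unitary via QR decomposition of a Ginibre matrix."""
    z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def apply_two_qubit(psi, gate, k):
    """Apply a 4x4 gate on qubits (k, k+1) of an n-qubit state vector."""
    psi = psi.reshape(2 ** k, 4, 2 ** (n - k - 2))
    return np.einsum('ab,xby->xay', gate, psi).reshape(d)

def random_local_state(depth):
    """|0...0> evolved by `depth` ladders of Haar-random two-qubit gates."""
    psi = np.zeros(d, complex)
    psi[0] = 1.0
    for _ in range(depth):
        for k in range(n - 1):
            psi = apply_two_qubit(psi, haar_unitary(4), k)
    return psi

N = 10000
mean_overlap = {
    D0: np.mean([abs(np.vdot(random_local_state(D0), random_local_state(D0))) ** 2
                 for _ in range(N)])
    for D0 in (1, 3)
}
# The 1-design argument predicts mean overlap 1/2^n at every depth D0.
```

Both depths give a mean overlap consistent with $1/2^n$, since a single ladder of Haar two-qubit gates already forms a 1-design on the register.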
Consider an $n$-qubit system with initial product state $|0\rangle^{\otimes n}$. For two local quantum circuits implementing unitaries $U_0, U_1$, the overlap is $|\langle\psi_0|\psi_1\rangle|^2 = \langle 0|U^\dagger|0\rangle\langle 0|U|0\rangle$ where $U = U_0^\dagger U_1$. As $U$ and $U^\dagger$ each appear only once, taking the ensemble average over a 1-design [169, 170, 61] suffices to reproduce the Haar value. In our case, we consider random local quantum circuits for $U_0$ and $U_1$, with each two-qubit unitary Haar random. Regardless of the number of layers of the circuit, the ensemble $\{U_0^\dagger U_1\}$ forms a 1-design, therefore
$$\langle|\langle\psi_0|\psi_1\rangle|^2\rangle_{H(D_0)} = \int_{\rm Haar} dU\, \langle 0|U^\dagger|0\rangle\langle 0|U|0\rangle = \frac{1}{2^n}, \tag{C.7}$$
where we utilized the Haar average identity for two elements of a $U(d)$ matrix,
$$\int_{\rm Haar} dU\, U_{\alpha a} U^*_{\beta b} = \frac{1}{d}\delta_{\alpha\beta}\delta_{ab}, \tag{C.8}$$
where $d$ is its dimension.

C.2.2 $D_0 \gg 1$ limit

The exact $\langle P_H(\psi_0, \psi_1)\rangle_{H(D_0)}$ is not easy to compute at finite $D_0$. Here we consider the case $D_0 \gg 1$, where the ensemble $H(D_0)$ is simply Haar random. In this case, the distribution of the overlap $x = |\langle\psi_0|\psi_1\rangle|^2$ can be obtained analytically [324]. Consider $d = 2^n$ complex numbers $\{\alpha_i\}_{i=1}^d$ as the complex amplitudes of the state $\psi_1$ with normalization condition $\sum_{i=1}^d |\alpha_i|^2 = 1$; the amplitudes $\{\alpha_i\}_{i=1}^d$ thus form a $(2d-1)$-sphere. Using the freedom of choosing the Haar unitary, we can take the other state to be $|\psi_0\rangle = |0\rangle$. The probability distribution of $x = |\langle\psi_0|\psi_1\rangle|^2 = |\langle 0|\psi_1\rangle|^2 = |\alpha_1|^2 = \gamma$ is
$$P(x=\gamma) = \frac{\int \prod_i d^2\alpha_i\, \delta\big(\gamma - |\alpha_1|^2\big)\, \delta\big(1 - \sum_i |\alpha_i|^2\big)}{\int \prod_i d^2\alpha_i\, \delta\big(1 - \sum_i |\alpha_i|^2\big)} = \frac{\int \prod_{i>1} d^2\alpha_i\, \frac{S_1(\sqrt{\gamma})}{2\sqrt{\gamma}}\, \delta\big(1 - \gamma - \sum_{i>1}|\alpha_i|^2\big)}{S_{2d-1}(1)/2} = \frac{S_1(\sqrt{\gamma})\, S_{2d-3}(\sqrt{1-\gamma})}{2\sqrt{\gamma}\sqrt{1-\gamma}\, S_{2d-1}(1)} = (d-1)(1-\gamma)^{d-2}. \tag{C.9}$$
Here we utilized the surface area of the $N$-dimensional sphere,
$$S_N(R) \equiv \int_{-\infty}^{\infty} \prod_{l=1}^{N+1} dx_l\, \delta\Bigg(R - \sqrt{\sum_{l=1}^{N+1} x_l^2}\Bigg) = 2R \int_{-\infty}^{\infty} \prod_{l=1}^{N+1} dx_l\, \delta\Bigg(R^2 - \sum_{l=1}^{N+1} x_l^2\Bigg) = \frac{2\pi^{\frac{N+1}{2}}}{\Gamma\big(\frac{N+1}{2}\big)} R^N, \tag{C.10}$$
where we used the expansion $2a\,\delta(x^2 - a^2) = \delta(x-a) + \delta(x+a)$. One can also check that the distribution is normalized, $\int_0^1 d\gamma\, P(x=\gamma) = 1$.
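The density in Eq. (C.9) integrates to the survival function $P(x > \gamma) = (1-\gamma)^{d-1}$, which a quick Monte Carlo over Haar-random states reproduces (the choice $n = 3$ and the sample count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
d = 2 ** n
N = 40000

def haar_state(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

# By Haar invariance it suffices to fix |psi_0> = |0> and read off |alpha_1|^2.
gammas = np.array([abs(haar_state(d)[0]) ** 2 for _ in range(N)])

# Survival function implied by Eq. (C.9): P(x > gamma) = (1 - gamma)^(d-1).
gamma0 = 0.1
empirical_tail = (gammas > gamma0).mean()
predicted_tail = (1.0 - gamma0) ** (d - 1)
mean_gamma = gammas.mean()      # first moment of the overlap: 1/d
```

The empirical tail matches $(1-\gamma_0)^{d-1}$ and the empirical mean matches $1/d$, consistent with Eqs. (C.7) and (C.9).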
From this probability distribution, the Haar-average Helstrom limit follows as
$$\langle P_H\rangle_{\rm Haar} = \int_0^1 d\gamma\, \frac{1}{2}\left(1 - \sqrt{1-\gamma}\right) P(x=\gamma) = \frac{1}{2(2d-1)} = \frac{1}{2(2^{n+1}-1)}. \tag{C.11}$$
When $n \gg 1$, the Haar-average Helstrom limit is $\langle P_H\rangle_{\rm Haar} \sim 1/2^{n+2}$.

C.3 Local random gates construction

As shown in Fig. 4.7, the various VQC architectures are constructed from local two-qubit gates. In general, a two-qubit gate, in the form of a $4\times 4$ unitary, has 15 independent parameters (up to a global phase). Such a gate can be decomposed into single-qubit rotations and at most 3 CNOT gates [325], as we show in Fig. C.2.

Appendix D
Supplemental Material for Section 4.2

D.1 Review of Classical DDPM

In classical DDPM, the forward diffusion process first gradually converts the observed data into simple random noise, based on ideas from non-equilibrium thermodynamics, and an associated reverse-time process is then learned to generate samples with the target distribution from the noise [192, 37, 36, 193]. Classical DDPM can be viewed as a latent variational autoencoder (VAE) model with stochastic hidden layers of the same dimension as the input data. The forward diffusion process simply adds a sequence of small Gaussian perturbations to the data sample $x_0$ over $T$ steps to produce the noisy samples $x_1, \ldots, x_T$ according to a linear Markov chain:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t \mid \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big), \tag{D.1}$$
where $\beta_t \in (0,1)$ is a given noise schedule such that $q(x_t)$ converges to $\mathcal{N}(0, I)$. Usually the noise schedule satisfies $\beta_1 < \beta_2 < \cdots < \beta_T$, so that larger step sizes are used when the samples become noisier. In the reverse-time process, we would like to sample from $q(x_{t-1}\mid x_t)$, which allows us to generate new data samples from the noise distribution $q(x_T)$. However, the conditional distribution $q(x_{t-1}\mid x_t)$ is intractable and is approximated by a decoder of the form
$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\big(x_{t-1}\mid \mu_\theta(x_t, t),\, \sigma_t^2 I\big), \tag{D.2}$$
where the time-dependent conditional mean vector $\mu_\theta$ is parameterized by a neural network.
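Composing Eq. (D.1) over $t$ steps gives the standard closed form $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$, which lets the forward chain be sampled in one shot. A minimal NumPy sketch (the linear schedule and toy data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed increasing linear schedule
alpha_bar = np.cumprod(1.0 - betas)      # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) in closed form, composing Eq. (D.1) t times."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(loc=3.0, scale=1.0, size=(10000, 2))  # toy data away from the origin
x_end = q_sample(x0, T - 1)                           # approximately N(0, I)
```

At the final step the data signal is essentially destroyed: `x_end` is close to a standard Gaussian, which is the starting point of the learned reverse-time process.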
The training of $\mu_\theta$ can then be performed efficiently by maximizing an evidence lower bound (ELBO) on the log-likelihood $\log p_\theta(x_0)$. Refs. [192, 37] showed that the ELBO can be expressed as a linear combination of (relative) entropy terms for Gaussian distributions that can be evaluated analytically, yielding a simple weighted $L^2$ loss function. Ref. [36] showed that the DDPM forward process is the discretized version of the following continuous stochastic differential equation (SDE): for $t \in [0, T]$,
$$dx(t) = -\frac{1}{2}\beta(t)x(t)\,dt + \sqrt{\beta(t)}\,dw(t), \tag{D.3}$$
where $w(t)$ is the standard Brownian motion, and the DDPM decoder (D.2) corresponds to the discretization of a reverse-time SDE given by
$$dx^\leftarrow(t) = -\frac{1}{2}\beta(t)\big[x^\leftarrow(t) + 2\nabla_x \log p_t(x^\leftarrow(t))\big]\,dt + \sqrt{\beta(t)}\,dw^\leftarrow(t), \tag{D.4}$$
where $w^\leftarrow(t)$ is the standard Brownian motion flowing backward in time and $p_t(\cdot)$ is the probability density of $x(t)$. The forward process $x(t)$ in (D.3) and the reverse-time process $x^\leftarrow(T-t)$ in (D.4) have the same marginal probability densities [326]. Thus, in the forward flow, estimating the conditional mean vector $\mu_\theta(\cdot, t)$ in DDPM is equivalent to learning the time-dependent score $\nabla_x \log p_t(\cdot)$, which contains the full information of the data distribution $p_0$. The score estimate is subsequently used in the reverse-time process for sampling new data from $p_0$.

D.2 Reproducing kernel Hilbert spaces and maximum mean discrepancy

A bivariate function $F : V \times V \to \mathbb{R}$ is a positive definite kernel if $\sum_{i,j=1}^m c_i c_j F(|\phi\rangle_i, |\phi\rangle_j) \ge 0$ for all $m \ge 1$, $|\phi\rangle_1, \ldots, |\phi\rangle_m \in V$, and $c_1, \ldots, c_m \in \mathbb{R}$. By the Moore-Aronszajn theorem (see, e.g., [327, Theorem 7.2.4]), for every symmetric and positive-definite kernel $F$ there is a unique Hilbert space $\mathcal{H} := \mathcal{H}(F)$ of real-valued functions on $V$ such that:
1. $F(\cdot, |\phi\rangle) \in \mathcal{H}$ for each $|\phi\rangle \in V$;
2. $g(|\phi\rangle) = \langle g, F(\cdot, |\phi\rangle)\rangle_\mathcal{H}$ for each $g \in \mathcal{H}$ and $|\phi\rangle \in V$.
The space $\mathcal{H}$ of functions $\{g : V \to \mathbb{R}\}$ is called the reproducing kernel Hilbert space (RKHS) associated with the kernel $F$.
Property (i) defines a feature map (a.k.a. RKHS map) $V \to \mathcal{H}$ via $|\phi\rangle \mapsto F(\cdot, |\phi\rangle)$, and property (ii) is the reproducing kernel property for the evaluation functional. In addition, for all $|\phi\rangle, |\psi\rangle \in V$,
$$F(|\phi\rangle, |\psi\rangle) = \langle F(\cdot, |\psi\rangle), F(\cdot, |\phi\rangle)\rangle_\mathcal{H} = |\langle\phi|\psi\rangle|^2. \tag{D.5}$$
Based on the kernel $F$, the (squared) maximum mean discrepancy (MMD) loss between two state distributions is
$$D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) := \sup_{g\in\mathcal{B}} \big\langle \mathbb{E}_{|\phi\rangle\sim\mathcal{E}_1}[F(\cdot, |\phi\rangle)] - \mathbb{E}_{|\psi\rangle\sim\mathcal{E}_2}[F(\cdot, |\psi\rangle)],\, g\big\rangle_\mathcal{H}^2, \tag{D.6}$$
where $\mathcal{B}$ is the unit ball in $\mathcal{H}$ centered at the origin and $\langle\cdot,\cdot\rangle_\mathcal{H}$ denotes the inner product in the RKHS. By duality in $\mathcal{H}$, we have $D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = \big\|\mathbb{E}_{|\phi\rangle\sim\mathcal{E}_1}[F(\cdot, |\phi\rangle)] - \mathbb{E}_{|\psi\rangle\sim\mathcal{E}_2}[F(\cdot, |\psi\rangle)]\big\|_\mathcal{H}^2$. Therefore, we may calculate the MMD loss as follows:
$$D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = \mathbb{E}_{|\phi\rangle,|\phi'\rangle\sim\mathcal{E}_1}\big[F(|\phi\rangle, |\phi'\rangle)\big] + \mathbb{E}_{|\psi\rangle,|\psi'\rangle\sim\mathcal{E}_2}\big[F(|\psi\rangle, |\psi'\rangle)\big] - 2\,\mathbb{E}_{|\phi\rangle\sim\mathcal{E}_1, |\psi\rangle\sim\mathcal{E}_2}\big[F(|\phi\rangle, |\psi\rangle)\big],$$
where all random states $|\phi\rangle, |\phi'\rangle, |\psi\rangle, |\psi'\rangle$ are drawn independently. Hence, we have established the connection $D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = F(\mathcal{E}_1, \mathcal{E}_1) + F(\mathcal{E}_2, \mathcal{E}_2) - 2F(\mathcal{E}_1, \mathcal{E}_2)$, where $F(\mathcal{E}_1, \mathcal{E}_2) := \mathbb{E}_{|\phi\rangle\sim\mathcal{E}_1, |\psi\rangle\sim\mathcal{E}_2}[F(|\phi\rangle, |\psi\rangle)]$. In particular, if $F = |\langle\phi|\psi\rangle|^2$ is the state-wise fidelity, the resulting MMD corresponds to the mean fidelity defined in Eq. (4.10) of Sec. 4.2.

D.3 Computation of Wasserstein distance

In the discrete or empirical case, where $S_1$ and $S_2$ are supported on a finite number of state vectors $\{|\phi_i\rangle\}_{i=1}^m$ and $\{|\psi_j\rangle\}_{j=1}^n$, computation of the Wasserstein distance can be cast as a linear program [328]. To this end, let $a$ and $b$ be histograms representing $S_1$ and $S_2$, respectively. Define the $m\times n$ cost matrix $C$ by $C_{i,j} := D^2(|\phi_i\rangle, |\psi_j\rangle)$, where $|\phi_i\rangle$ and $|\psi_j\rangle$ are state vectors from the sampled ensembles $S_1$ and $S_2$, respectively. For pure states, the cost matrix reduces to a function of the infidelity, i.e., $C_{i,j} = 1 - F^2(|\phi_i\rangle, |\psi_j\rangle)$. Then we have
$$W_2(S_1, S_2) = \min_P \langle P, C\rangle, \quad \text{s.t.}\quad P\mathbf{1}_n = a,\; P^\top\mathbf{1}_m = b,\; P \ge 0.$$
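The thesis computes this LP with the POT library (Appendix D.7). As a dependency-free sketch, note that for uniform marginals over equal-size ensembles the LP optimum is attained at a permutation matrix (Birkhoff-von Neumann), so tiny instances can be solved exactly by brute force; the ensemble sizes below are illustrative:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 5      # small ensembles so the optimal assignment can be brute-forced

def random_states(num):
    v = rng.normal(size=(num, d)) + 1j * rng.normal(size=(num, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

S1, S2 = random_states(m), random_states(m)
C = 1.0 - np.abs(S1.conj() @ S2.T) ** 2      # cost C_ij = infidelity of the pair

# With uniform marginals a = b = 1/m, an optimal transport plan is a scaled
# permutation matrix, so minimizing over assignments solves the LP exactly.
w2 = min(sum(C[i, p[i]] for i in range(m))
         for p in itertools.permutations(range(m))) / m

C_self = 1.0 - np.abs(S1.conj() @ S1.T) ** 2
w2_self = min(sum(C_self[i, p[i]] for i in range(m))
              for p in itertools.permutations(range(m))) / m   # identical ensembles
```

Identical ensembles give a vanishing distance (the identity matching has zero cost), while distinct random ensembles give a strictly positive one, bounded above by the cost of the uniform independent coupling.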
Furthermore, we note that the above generalization of the Wasserstein distance, which characterizes the separation between two ensembles of quantum states, differs from Refs. [329, 330, 259], where only a single pair of quantum states is considered.

D.4 SWAP test

In the QuDDPM framework of Sec. 4.2, we need to evaluate the fidelities between states $|\psi\rangle$ from the real diffusion ensemble $S_k$ and states $|\tilde\psi\rangle$ from the backward generated ensemble $\tilde{S}_k$. For any two pure states $|\psi\rangle$ and $|\tilde\psi\rangle$, one can perform the SWAP test to obtain the fidelity, as illustrated in Fig. D.1.

Figure D.1: Circuit implementation of the SWAP test, shown for two 3-qubit states $|\psi\rangle, |\tilde\psi\rangle$. A Z-basis measurement is performed at the end.

The SWAP test consists of two Hadamard gates and a controlled-SWAP gate applied to $2n+1$ qubits. Given the input $|0, \psi, \tilde\psi\rangle$, the output state just before measurement is
$$|0, \psi, \tilde\psi\rangle \to \frac{1}{2}|0\rangle\big(|\psi, \tilde\psi\rangle + |\tilde\psi, \psi\rangle\big) + \frac{1}{2}|1\rangle\big(|\psi, \tilde\psi\rangle - |\tilde\psi, \psi\rangle\big), \tag{D.7}$$
so the probability of measuring 0 on the first qubit is
$$P(\text{first qubit in } |0\rangle) = \frac{1}{4}\big(\langle\psi, \tilde\psi| + \langle\tilde\psi, \psi|\big)\big(|\psi, \tilde\psi\rangle + |\tilde\psi, \psi\rangle\big) = \frac{1}{2} + \frac{1}{2}|\langle\psi|\tilde\psi\rangle|^2, \tag{D.8}$$
which directly yields the fidelity. To implement the SWAP test between two $n$-qubit states $|\psi\rangle$ and $|\tilde\psi\rangle$, we in general need $n$ Fredkin gates, one between each corresponding pair of qubits of $|\psi\rangle$ and $|\tilde\psi\rangle$, and a Fredkin gate acting on a control qubit and two target qubits can be implemented with 5 two-qubit gates [331]. Therefore, in general $O(5n)$ two-qubit gates suffice to perform the SWAP test in Fig. D.1, regardless of the locality of these gates.

D.5 Forward and backward circuits

The forward noisy process is implemented by the fast scrambling model [213] with controllable parameters on the $n$ data qubits to mimic the diffusion process in Hilbert space. The diffusion circuit is implemented as (see Fig.
D.2 (a))
$$U(\phi_t, g_t) = \prod_{t=1}^{T} W_t(g_t) V_t(\phi_t), \tag{D.9}$$
where $V_t$ consists of general single-qubit rotations on each qubit,
$$V_t(\phi_t) = \bigotimes_{k=1}^{n} e^{-i\phi_{t,3k+2} Z_k/2}\, e^{-i\phi_{t,3k+1} Y_k/2}\, e^{-i\phi_{t,3k} Z_k/2}, \tag{D.10}$$
and the homogeneous entangling layer $W_t$ consists of ZZ rotations on every pair of qubits,
$$W_t(g_t) = \exp\Big(-i\frac{g_t}{2\sqrt{n}}\sum_{k_1<k_2} Z_{k_1} Z_{k_2}\Big) = \prod_{k_1<k_2} \exp\Big(-i\frac{g_t}{2\sqrt{n}} Z_{k_1} Z_{k_2}\Big). \tag{D.11}$$
With a tunable range of the random rotation angles $\phi$ and $g$, we can control the speed at which the original state ensemble $S_0$ diffuses in Hilbert space towards the Haar random state ensemble. The backward denoising process consists of $T$ steps, where the operation at each step is implemented by a PQC $\tilde{U}_t(\theta_t)$ followed by measurements on the ancillae,
$$\Phi_t|\tilde\psi^{(t)}\rangle = \frac{(\Pi_A \otimes I_D)\,\tilde{U}_t|\tilde\psi^{(t)}\rangle}{\sqrt{\langle\tilde\psi^{(t)}|\tilde{U}_t^\dagger(\Pi_A\otimes I_D)\tilde{U}_t|\tilde\psi^{(t)}\rangle}} = |z^{(t)}\rangle_A \otimes |\tilde\psi^{(t-1)}\rangle, \tag{D.12}$$
where $\Pi_A = |z^{(t)}\rangle\langle z^{(t)}|_A$ is the POVM element of a measurement on the ancillae in the computational basis with outcome $|z^{(t)}\rangle_A$. Note that we do not place any specific constraint on the measurement results $z^{(t)}$; instead, we simply measure the ancillae and collect the post-measurement state $|\tilde\psi^{(t-1)}\rangle$ on the data qubits. In general, the backward PQC can utilize any architecture whose expressivity suffices for the backward transport from ensemble $S_{t+1}$ to $S_t$. In this work, we adopt the hardware-efficient ansatz [27], which is universal and easy to implement in practical experiments. Each layer of an $L$-layer backward PQC $\tilde{U}_t$ consists of single-qubit rotations about the X and Y axes on each qubit, followed by controlled-Z gates on nearest neighbors (see Fig. D.2 (b)):
$$\tilde{U}_t(\theta_t) = \prod_{\ell=1}^{L} \tilde{W}_t \tilde{V}_t(\theta_t), \tag{D.13}$$
where
$$\tilde{V}_t(\theta_t) = \bigotimes_{k=1}^{n} e^{-i\theta_{t,2k+1} Y_k/2}\, e^{-i\theta_{t,2k} X_k/2}, \tag{D.14}$$
$$\tilde{W}_t = \prod_{k=1}^{\lfloor (n-1)/2\rfloor} CZ_{2k,2k+1} \prod_{k=1}^{\lfloor n/2\rfloor} CZ_{2k-1,2k}, \tag{D.15}$$
with $CZ_{k_1,k_2}$ the controlled-Z gate on qubits $k_1$ and $k_2$. The whole backward denoising process can thus be represented as
$$\Phi = \Phi_1 \circ \Phi_2 \circ \cdots \circ \Phi_{T-1} \circ \Phi_T. \tag{D.16}$$

D.6 Additional details on distance metrics evaluation

In Fig.
D.3, we show a numerical comparison between the MMD distance (see Eq. 4.11) and the Wasserstein distance (see Eq. 4.17) in different generation tasks. In the clustered state generation (Fig. D.3(a), (b)), both distance measures behave similarly.

Figure D.2: Quantum circuit architectures. In (a) we show the forward diffusion circuit for one diffusion step on a system of $n = 4$ qubits. In (b) we show the one-layer architecture of an $L$-layer backward denoising PQC on a system of $n = 4$ data and $n_A = 2$ ancilla qubits. RX, RY and RZ are Pauli X, Y and Z rotations; RZZ is the two-qubit ZZ rotation.

Figure D.3: MMD and Wasserstein loss between the ensemble $S_0$ and the ensemble $S_t$ reached after $t$ diffusion steps, in the generation of the clustered state ensemble (left) and the circular state ensemble (right). For the clustered state ensemble, the MMD (a) and Wasserstein (b) losses behave similarly. For the circular state ensemble, the MMD loss (c) vanishes while the Wasserstein loss (d) characterizes the diffusion of the distribution. The data set size is $|S| = 100$ for the clustered states and $|S| = 500$ for the circular states.

However, for the circular state generation in Fig. D.3(c), (d), the Wasserstein distance can characterize the diffusion of the ensemble while the MMD distance fails. Note that only the relative shift of the MMD or Wasserstein loss through the diffusion is meaningful, while a comparison between their magnitudes is unfair. In the following, we provide a simple proof of Lemma 15.

Lemma 15. Let $\mathcal{E}_1$ be a uniform circular distribution on the Bloch sphere and $\mathcal{E}_2$ be the Haar random state distribution.
Then $D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = 0$.

Proof of Lemma 15. We can calculate the ensemble-averaged fidelities by Haar integration:
$$F(\mathcal{E}_1, \mathcal{E}_2) = \frac{1}{2\pi}\int_0^{2\pi} d\theta \int dU\, |\langle 0|U^\dagger e^{-i\theta Y}|0\rangle|^2 = \frac{1}{2\pi}\int d\theta\, \frac{\mathrm{tr}(|0\rangle\langle 0|)\,\mathrm{tr}\big(e^{-i\theta Y}|0\rangle\langle 0|e^{i\theta Y}\big)}{2} = \frac{1}{2},$$
$$F(\mathcal{E}_1, \mathcal{E}_1) = \frac{1}{(2\pi)^2}\int_0^{2\pi} d\theta\, d\theta'\, |\langle 0|e^{i\theta' Y} e^{-i\theta Y}|0\rangle|^2 = \frac{1}{(2\pi)^2}\int_0^{2\pi} d\theta\, d\theta'\, \cos^2(\theta' - \theta) = \frac{1}{2},$$
$$F(\mathcal{E}_2, \mathcal{E}_2) = \int dU\, dU'\, |\langle 0|U^\dagger U'|0\rangle|^2 = \int dU\, \frac{\mathrm{tr}(|0\rangle\langle 0|)\,\mathrm{tr}\big(U|0\rangle\langle 0|U^\dagger\big)}{2} = \frac{1}{2}.$$
Therefore, the MMD distance is
$$D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = F(\mathcal{E}_1, \mathcal{E}_1) + F(\mathcal{E}_2, \mathcal{E}_2) - 2F(\mathcal{E}_1, \mathcal{E}_2) = 0, \tag{D.17}$$
which shows its incapability to discriminate between the circular state ensemble and Haar random states. When the two distributions are truly identical, $\mathcal{E}_1 = \mathcal{E}_2$, we should have $D_{\rm MMD}(\mathcal{E}_1, \mathcal{E}_2) = D_{\rm Wass}(\mathcal{E}_1, \mathcal{E}_2) = 0$ in theory; in practice, due to finite samples, neither vanishes exactly (see Fig. D.3) and a relatively small residual remains.

Figure D.4: Schematic of QuDT (a) and QuGAN (b) in state ensemble generation. Since the forward pass of QuDT has only a single step from data to noise, it is not necessary to implement the diffusion for QuDT. In the backward process, there is likewise a single PQC $\tilde{U}_{\rm DT}$ with depth $L_{\rm DT}$. In the training, the samples generated by applying $\tilde{U}_{\rm DT}$ to random states are directly compared to the target ensemble $S_0$, shown in (a). The QuGAN is similar to QuDT, except that a discriminator circuit is added to evaluate the cost function.

D.7 Details of simulation

In this work, the simulation of QuDDPM is implemented with the Python library TensorCircuit [332], and the Bloch sphere visualizations are plotted with the help of QuTip [78]. The computation of the Wasserstein distance, on the other hand, is performed with the Python library POT [333]. The major code and data of the work are available in Ref. [219].
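The three Haar averages in the proof of Lemma 15 are easy to verify by Monte Carlo, sampling the circular ensemble $e^{-i\theta Y}|0\rangle$ and Haar-random qubit states (sample sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000

def circular_state(theta):
    """exp(-i theta Y)|0> = (cos theta, sin theta)^T: a great circle on the Bloch sphere."""
    return np.array([np.cos(theta), np.sin(theta)])

def haar_qubit():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def fid(u, v):
    return abs(np.vdot(u, v)) ** 2

thetas = rng.uniform(0.0, 2.0 * np.pi, size=(N, 2))
f11 = np.mean(np.cos(thetas[:, 0] - thetas[:, 1]) ** 2)                      # F(E1, E1)
f12 = np.mean([fid(circular_state(t), haar_qubit()) for t in thetas[:, 0]])  # F(E1, E2)
f22 = np.mean([fid(haar_qubit(), haar_qubit()) for _ in range(N)])           # F(E2, E2)
mmd = f11 + f22 - 2.0 * f12   # all three averages are 1/2, so the MMD vanishes
```

All three mean fidelities concentrate around 1/2, so the estimated MMD is consistent with zero, as Lemma 15 states.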
D.8 Benchmarks: QuDT and QuGAN

In this section, we provide details on the benchmarks, including quantum direct transport (QuDT) and the quantum generative adversarial network (QuGAN). First, we emphasize that none of the previous works actually solves the generation of an ensemble of quantum states, as we summarize in Table D.1. To perform the benchmarks, we generalize the previous works as detailed below.

Table D.1: Summary of prior work on quantum generative learning, including the quantum generative adversarial network (QuGAN) and the quantum circuit Born machine (QuCBM).

Model | Task | Ref.
QuGAN | Generation of a quantum state towards a given target state | [189, 334, 335]
QuGAN | Learning and loading a classical distribution into a quantum state | [206, 190, 191]
QuCBM | Generation of a quantum state towards a target state | [203]
QuCBM | Generative learning of a classical distribution | [205, 202]

QuDT utilizes the same setup as QuDDPM, but attempts to generate the target state ensemble from a random state ensemble directly, with only a single step of training on a quantum circuit with $L_{\rm DT}$ layers (see Fig. D.4(a)), instead of discretized diffusion steps. Note that, due to the single-step operation, it is not necessary to implement the forward noisy diffusion process. In the training, the loss function to be minimized is
$$\mathcal{L}_{\rm DT}(\tilde{U}_{\rm DT}, \mathcal{E}_0, \tilde{\mathcal{E}}_T) = D\big(\mathcal{E}_0, \tilde{\mathcal{E}}_0[\tilde{U}_{\rm DT}, \tilde{\mathcal{E}}_T]\big), \tag{D.18}$$
which is the same as the one utilized in QuDDPM. The QuDT proposed here can be regarded as a generalized version of the quantum circuit Born machine (QuCBM) [205, 202], which originally targets generating a classical distribution from its overlaps with basis states (for example, the computational basis). Another quantum generative model to compare with is QuGAN [39, 206, 189]. Since prior studies only focus on generating a single target state, we generalize the design of QuGAN to fit our state-ensemble generation tasks, as shown in Fig. D.4(b).
The QuGAN consists of two PQCs, a generator $\tilde{U}_{\rm G}$ and a discriminator $\tilde{U}_{\rm D}$. The generator takes the same input as in QuDT, but instead of minimizing a loss function directly, it tries to pass the evaluation of the discriminator, which tells whether a given state comes from the real or the fake state ensemble. Here we choose the discriminator to have the same circuit architecture as the generator, with a single-qubit Pauli-Z measurement performed at the end to tell real and fake states apart. The training consists of several adversarial cycles, where each cycle includes the training of the discriminator and of the generator. First, we train the discriminator $\tilde{U}_{\rm D}$ while keeping the generator fixed; the loss function to be minimized is

$$\mathcal{L}_{\rm GAN-D} = P({\rm real}|{\rm fake}) - P({\rm real}|{\rm real}), \quad (D.19)$$

where $P({\rm real}|{\rm fake})$ is the average probability that the discriminator identifies inputs from the generator as states from the true ensemble, and $P({\rm real}|{\rm real})$ is defined similarly. With an optimal discriminator, the probability $P({\rm real}|{\rm real})$ should approach unity. Next, we train the generator $\tilde{U}_{\rm G}$ while keeping the discriminator fixed; the loss function to be minimized is

$$\mathcal{L}_{\rm GAN-G} = -P({\rm real}|{\rm fake}). \quad (D.20)$$

With an optimal generator, the probability $P({\rm real}|{\rm fake})$ should approach unity. Therefore, the joint optimum of generator and discriminator should lead to a near-zero discriminator loss, $\mathcal{L}_{\rm GAN-D} \simeq 0$. For a fair comparison among the three models, we choose the number of variational parameters in QuDDPM, QuDT, and QuGAN's generator $\tilde{U}_{\rm G}$ to be the same; the total number of training steps is also the same. In QuDDPM, we have $T = 20$ diffusion steps, where in each step the PQC has $L = 6$ layers. In QuDT, the PQC has $L_{\rm DT} = 120$ layers; in QuGAN, the generator also has 120 layers, and the discriminator has 16 layers. We train the QuGAN with five adversarial cycles.

Appendix E

Supplemental Material for Chapter 5

E.1 Supplemental Material for Section
5.1

E.1.1 Representation of states: single-mode case

In this section, we present the proof of the representation of the single-mode energy-regularized ensemble of states in Eq. (5.3) of the main paper. The generalization to the multimode case is presented in E.1.4. We first prove that the representation holds for all $L$-layer circuits, and then provide an analysis of the energy regularization. To simplify the notation, we define $\beta = (\beta_1, \ldots, \beta_L)^T$, $\theta = (\theta_1, \ldots, \theta_L)^T$, $\phi = (\phi_1, \ldots, \phi_L)^T$, and the overall parameters $x = (\beta, \theta, \phi)$. Consider a single-mode qubit-qumode variational circuit consisting of ECD blocks. In the qubit $|0\rangle$, $|1\rangle$ basis, each ECD block can be written in matrix form as

$$U_{\rm ECD}(\beta) U_R(\theta, \phi) = \begin{pmatrix} e^{i\phi}\sin\frac{\theta}{2}\, D(-\beta) & \cos\frac{\theta}{2}\, D(-\beta) \\ \cos\frac{\theta}{2}\, D(\beta) & e^{i(\pi-\phi)}\sin\frac{\theta}{2}\, D(\beta) \end{pmatrix}, \quad (E.1)$$

where for convenience we relabel the variable $\phi - \pi/2 \to \phi$, and we use this definition throughout the following. The output state of the unitary $U = \prod_{\ell=1}^{L} U_{\rm ECD}(\beta_\ell) U_R(\theta_\ell, \phi_\ell)$ on the input state $|0\rangle_q |0\rangle_m$ is

$$|\psi(\beta, \theta, \phi)\rangle_{q,m} = U(\beta, \theta, \phi)\,|0\rangle_q |0\rangle_m = \sum_{a=0}^{1}\sum_{s} w_{s,a}(\theta, \phi)\, |a\rangle_q\, e^{i\chi_s(\beta)}\, |(-1)^a s\cdot\beta\rangle_m \equiv \sum_{a=0}^{1}\sum_{s} w_{s,a}(\theta, \phi)\, |a\rangle_q\, |B_{s,a}\rangle_m, \quad (E.2)$$

where the length-$L$ sign vector $s$ is defined as $s = (s_{1:L-1}, -1)$ with $s_{1:L-1} \in \{-1, 1\}^{L-1}$. We absorb the extra phase into the qumode ket as $|B_{s,a}\rangle \equiv e^{i\chi_s(\beta)} |(-1)^a s\cdot\beta\rangle_m$ for convenience, where the phase is defined as

$$\chi_s(\beta) \equiv \sum_{\ell=1}^{L-1}\sum_{\ell'=\ell+1}^{L} s_\ell s_{\ell'}\left(\mathrm{Re}\{\beta_\ell\}\,\mathrm{Im}\{\beta_{\ell'}\} - \mathrm{Im}\{\beta_\ell\}\,\mathrm{Re}\{\beta_{\ell'}\}\right). \quad (E.3)$$

Via defining $v_{s,a}(\theta, \phi, \beta) \equiv e^{i\chi_s(\beta)} w_{s,a}(\theta, \phi)$, Eq. (E.2) is equivalent to Eq. (5.3) in the main paper. In Eq. (E.2), $|a\rangle_q$ is the qubit state in the computational basis. The weight of each branch of the superposition is

$$w_{s,a}(\theta, \phi) = e^{i\Phi_{s,a}(\phi)}\, T_{s,a}(\theta), \quad (E.4)$$

where

$$\Phi_{s,a}(\phi) \equiv a(n_s\pi + \phi_1) + P(a)\sum_{\ell=1}^{L}(ds_\ell - 1)\left(s_\ell \phi_\ell - \delta_{s_\ell, 1}\pi\right), \quad (E.5)$$

$$T_{s,a}(\theta) \equiv \prod_{\ell=2}^{L}\sin\left(\frac{\theta_\ell + ds_\ell \pi}{2}\right)\, \sin\left(\frac{\theta_1}{2} + \frac{(P(a)s_1 + 1)\pi}{4}\right). \quad (E.6)$$

Some notation needs to be explained. We define $P(x) \equiv (-1)^x$ as the parity of a variable $x$, which equals $\pm 1$ when $x$ is even/odd.
We define the difference sign vector $ds$ to represent the change of signs in the vector $s$: the $\ell$th element of the difference sign vector is

$$ds_\ell \equiv |s_\ell - s_{\ell-1}|/2, \quad (E.7)$$

which is zero when $s_\ell = s_{\ell-1}$ and 1 otherwise. Note that $(-1)^{ds_\ell} = s_\ell \times s_{\ell-1}$. As $s_L = -1$ always, we assign $ds_1 = |s_1 - s_L|/2 = (s_1 + 1)/2$ such that it reflects the value of $s_1$. We can find that the size of $\{ds_\ell \,|\, ds_\ell = 0\}_{\ell \ge 2}$ is

$$n_s = L - 1 - \sum_{\ell=2}^{L} ds_\ell, \quad (E.8)$$

since $ds_\ell = 0, 1$. We would like to comment that $n_s$ also equals the number of $\{\phi_\ell\}_{\ell \ge 2}$ appearing in $\Phi_{s,a}$ with nonzero coefficient, and the number of $\{\sin(\theta_\ell/2)\}_{\ell \ge 2}$ factors appearing in $T_{s,a}$. From Eq. (E.5), the coefficient of each $\phi_\ell$ in $\Phi_{s,a}$ with $\ell \ge 2$ is $P(a)(ds_\ell - 1)s_\ell$; since $P(a) = (-1)^a$ and $s_\ell = \pm 1$, only $ds_\ell = 1$ can make the coefficient zero, and thus $n_s$ counts the number of $\{\phi_\ell\}_{\ell \ge 2}$ with nonzero coefficient. Moreover, one can also see from Eq. (E.6) that if $ds_\ell = 0$ with $\ell \ge 2$, the $\theta_\ell$ factor in $T_{s,a}$ takes the form $\sin(\theta_\ell/2)$, but $\cos(\theta_\ell/2)$ if $ds_\ell = 1$. To count the total number of $\{\phi_\ell\}_{\ell=1}^{L}$ with nonzero coefficient in $\Phi_{s,a}$, denoted $N_{s,a}$, note from Eq. (E.5) that the coefficient of $\phi_1$ is

$$a + P(a)(ds_1 - 1)s_1 = a + (-1)^a (ds_1 - 1)(2ds_1 - 1) = 1 - a - (-1)^a ds_1 \ge 0,$$

where in the first equality we utilize the definition $s_1 = 2ds_1 - 1$, and in the second equality we utilize $(ds_1)^2 = ds_1$ and $a + (-1)^a = 1 - a$. As the coefficient of $\phi_1$ is either zero or one, the total number is simply

$$N_{s,a} = n_s + 1 - a - (-1)^a ds_1. \quad (E.9)$$

Note that $N_{s,a}$ is also the total number of $\{\sin(\theta_\ell/2)\}_{\ell=1}^{L}$ factors in $T_{s,a}$, because $\sin(\theta_1/2)$ appears in $T_{s,a}$ only if $1 - (P(a)s_1 + 1)/2 = 1$. Using the identities $s_1 = 2ds_1 - 1$ and $1 - (-1)^a = 2a$, this indicator evaluates to $1 - a - (-1)^a ds_1$, which is exactly the set of terms following $n_s$ in $N_{s,a}$ above.
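The counting formula (E.9) can be verified by brute force: enumerate all sign vectors $s$ with $s_L = -1$, read off the coefficient of each $\phi_\ell$ from Eq. (E.5), and compare the number of nonzero coefficients with $N_{s,a}$. A short sketch (the depth $L = 5$ is an arbitrary choice):

```python
from itertools import product

L = 5
ok = True
for head in product([-1, 1], repeat=L - 1):
    s = list(head) + [-1]                                  # s_L = -1 by definition
    # difference sign vector, Eq. (E.7), with ds_1 = (s_1 + 1)/2
    ds = [(s[0] + 1) // 2] + [abs(s[l] - s[l - 1]) // 2 for l in range(1, L)]
    ns = L - 1 - sum(ds[1:])                               # Eq. (E.8)
    for a in (0, 1):
        # coefficients of phi_1 and phi_l (l >= 2), read off Eq. (E.5)
        coeffs = [a + (-1) ** a * (ds[0] - 1) * s[0]]
        coeffs += [(-1) ** a * (ds[l] - 1) * s[l] for l in range(1, L)]
        n_nonzero = sum(c != 0 for c in coeffs)
        ok &= (n_nonzero == ns + 1 - a - (-1) ** a * ds[0])  # Eq. (E.9)
```

The enumeration confirms both the count $N_{s,a}$ and, implicitly, the claim that $n_s$ counts the nonzero coefficients among $\{\phi_\ell\}_{\ell \ge 2}$.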
To help understanding the notations, we provide an example of L = 2 as following |ψ(β, θ, ϕ)⟩ q,m = |0⟩ q e i(ϕ1+ϕ2+β R 1 β I 2−β I 1 β R 2 ) sin θ1 2 sin θ2 2 |−β1 − β2⟩m + e i(β I 1 β R 2 −β R 1 β I 2 ) cos θ1 2 cos θ2 2 |+β1 − β2⟩m + |1⟩ q e i(π−ϕ2+β R 1 β I 2−β I 1 β R 2 ) cos θ1 2 sin θ2 2 |+β1 + β2⟩m + e i(ϕ1−β R 1 β I 2+β I 1 β R 2 ) sin θ1 2 cos θ2 2 |−β1 + β2⟩m . (E.10) where β R/I 1,2 denotes the real or imaginary part of β1,2 to shorten the formula. One can check that it agrees with the representation of state in Eq. (E.2) with weight following the definitions from Eq. (E.4). Below we present the detailed proof of the state representation of Eq. (E.2) (equivalently Eq. (5.3) in the main paper) by mathematical induction. Proof. First, we start from L = 2: in this case it has already been shown in Eq. (E.10) that Eq. (E.2) is true. Then we suppose that for an L-layer circuit, it is also in the form of Eq. (E.2), and for the L + 1-layer circuit we have |ψL+1(β, θ, ϕ)⟩ = UECD(βL+1)UR(θL+1, ϕL+1)|ψL(β, θ, ϕ)⟩ = e iϕL+1 sin θL+1 2 D(−βL+1) cos θL+1 2 D(−βL+1) cos θL+1 2 D(βL+1) e i(π−ϕL+1) sin θL+1 2 D(βL+1) P s ws,0(θ, ϕ) QL ℓ=1 D(sℓβℓ)|0⟩m P s ws,1(θ, ϕ) QL ℓ=1 D(−sℓβℓ)|0⟩m = P s e i(Φs,0+ϕL+1) sinθL+1 2 Ts,0D(−βL+1) QL ℓ=1 D(sℓβℓ)|0⟩m + P s e iΦs,1 cosθL+1 2 Ts,1D(−βL+1) QL ℓ=1 D(−sℓβℓ)|0⟩m P s e iΦs,0 cosθL+1 2 Ts,0D(βL+1) QL ℓ=1 D(sℓβℓ)|0⟩m + P s e i(Φs,1+π−ϕL+1) sinθL+1 2 Ts,1D(βL+1) QL ℓ=1 D(−sℓβℓ)|0⟩m . (E.11) 235 Note that the product of displacement operator above is evaluated from β1 to βL. To compare with the representation in Eq. (E.2), we define s ′ to be (±s, −1), which actually covers all possible cases in the definition of sign vector of length L + 1. To prove the result, all we need to do is to show that Eq. (E.11) agrees with the representation in Eq. (E.2). We start from the displacement on the qumode. The total displacement on qumode in Eq. (E.11) can be directly seen that it satisfies (−1)as ′ · β with s = (±s, −1). 
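The $L = 2$ example of Eq. (E.10) can be checked numerically against a direct simulation of the two-block circuit in a truncated Fock space. The sketch below takes real $\beta_1, \beta_2$ (so that the phase $\chi_s$ vanishes) and assumes the qubit $\otimes$ qumode tensor ordering together with the block matrix of Eq. (E.1) taken literally; the truncation $N = 60$ and all parameter values are illustrative:

```python
import numpy as np

N = 60                                        # Fock-space truncation
a_op = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator

def disp(beta):
    """Truncated displacement D(beta) = exp(beta a^dag - beta* a), via eigh."""
    H = 1j * (beta * a_op.conj().T - np.conj(beta) * a_op)   # Hermitian i*generator
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w)) @ V.conj().T

def coh(alpha):
    """Coherent-state amplitudes <n|alpha>, built recursively."""
    c = np.zeros(N, complex)
    c[0] = np.exp(-abs(alpha) ** 2 / 2)
    for n in range(1, N):
        c[n] = c[n - 1] * alpha / np.sqrt(n)
    return c

def UR(t, p):
    """Qubit-rotation block of Eq. (E.1), in the relabeled-phi convention."""
    return np.array([[np.exp(1j * p) * np.sin(t / 2), np.cos(t / 2)],
                     [np.cos(t / 2), np.exp(1j * (np.pi - p)) * np.sin(t / 2)]])

def UECD(beta):
    """Conditional displacement: D(-beta) on |0>_q, D(beta) on |1>_q."""
    return np.kron(np.diag([1.0, 0.0]), disp(-beta)) \
         + np.kron(np.diag([0.0, 1.0]), disp(beta))

b1, b2 = 0.9, 0.6                             # real, so chi_s(beta) = 0
t1, p1, t2, p2 = 0.7, 1.1, 2.1, 0.4
I = np.eye(N)
psi0 = np.zeros(2 * N, complex); psi0[0] = 1.0   # |0>_q |0>_m
lhs = UECD(b2) @ np.kron(UR(t2, p2), I) @ UECD(b1) @ np.kron(UR(t1, p1), I) @ psi0

s1, c1 = np.sin(t1 / 2), np.cos(t1 / 2)
s2, c2 = np.sin(t2 / 2), np.cos(t2 / 2)
e0, e1 = np.array([1, 0]), np.array([0, 1])
rhs = (np.kron(e0, np.exp(1j * (p1 + p2)) * s1 * s2 * coh(-b1 - b2)
               + c1 * c2 * coh(b1 - b2))
       + np.kron(e1, np.exp(1j * (np.pi - p2)) * c1 * s2 * coh(b1 + b2)
                 + np.exp(1j * p1) * s1 * c2 * coh(-b1 + b2)))
err = np.linalg.norm(lhs - rhs)
norm_lhs = np.linalg.norm(lhs)
```

The two state vectors agree to numerical precision, confirming the four-branch superposition structure of Eq. (E.2) for $L = 2$.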
There is an phase generated due to the braiding relation of displacement operators D(α) = e αm†−α ∗m = e −|α| 2/2 e αm† e −α ∗m, and we can directly prove that it is in the form of Eq. (E.3) by D(αL)· · · D(α1) = e − 1 2 PL ℓ=1 |αℓ| 2 e αLm† e −α ∗ Lm · · · e α1m† e −α ∗ 1m = e − 1 2 PL ℓ=1 |αℓ| 2 Y M ℓ=1 e αℓm† ! Y L ℓ=1 e −α ∗ ℓm ! L Y−1 ℓ=1 e −αℓ( PL ℓ ′=ℓ+1 α ∗ ℓ ′ ) ! (E.12) = e − 1 2 PL ℓ=1 |αℓ| 2 e PL ℓ=1 αℓm† e − PL ℓ=1 α ∗ ℓme − PL−1 ℓ=1 PL ℓ ′=ℓ+1 αℓα ∗ ℓ ′ (E.13) = e − 1 2 PL ℓ=1 |αℓ| 2 e − PL−1 ℓ=1 PL ℓ ′=ℓ+1 αℓα ∗ ℓ ′ e 1 2 | PL ℓ=1 αℓ| 2 D X L ℓ=1 αℓ ! (E.14) = e i PL−1 ℓ=1 PL ℓ ′=ℓ+1(Re{αℓ} Im{αℓ ′}−Im{αℓ} Re{αℓ ′})D X L ℓ=1 αℓ ! , (E.15) where in the second line we perform a reordering to the qumode operators such that all annihilation operators follows creation ones, introducing extra phase by the Baker–Campbell–Hausdorff identity e Ae B = e Be Ae [A,B] (when higher-order commutators are zero). The last line is obtained from expanding every αℓ = Re{αℓ} + iIm{αℓ}. By letting αℓ = (−1)asℓβℓ , we have the formula in Eq. (E.3). 236 Next we need to show the weight satisfying the form in Eq. (E.4). Starting from the first item in the first line of Eq. (E.11) which corresponds to s ′ = (s, −1) and a = 0, the difference in sign vector is ds ′ = (ds, 0), then the phase and amplitude in the weight can be reduced to Φs,0 + ϕL+1 = X L ℓ=1 (dsℓ − 1)sℓϕℓ − (dsℓ − 1)δsℓ,1π + ϕL+1 = L X +1 ℓ=1 (ds ′ ℓ − 1)s ′ ℓϕℓ − (ds ′ ℓ − 1)δs ′ ℓ ,1π = Φs ′ ,0, (E.16) sin θL+1 2 Ts,0 = sin θL+1 2 Y L ℓ=2 sin θℓ + dsℓπ 2 sin θ1 2 + (s1 + 1)π 4 = L Y +1 ℓ=2 sin θℓ + ds ′ ℓ π 2 sin θ1 2 + (s ′ 1 + 1)π 4 = Ts ′ ,0, (E.17) where in Eqs. (E.16) and (E.17) we utilize the fact that ds ′ L+1 = 0. The last item in the second line of Eq. 
(E.11) also corresponds to s ′ = (s, −1) but a = 1 as the total displacement in qumode is (−1)as · β, therefore the weight becomes Φs,1 + π − ϕL+1 = nsπ + ϕ1 − Φs,0 + π − ϕL+1 = (ns + 1)π + ϕ1 − Φs ′ ,0 = ns ′π + ϕ1 − Φs ′ ,0 = Φs ′ ,1, (E.18) sin θL+1 2 Ts,1 = sin θL+1 2 Y L ℓ=2 sin θℓ + dsℓπ 2 sin θ1 2 + (−s1 + 1)π 4 = L Y +1 ℓ=2 sin θℓ + ds ′ ℓ π 2 sin θ1 2 + (−s ′ 1 + 1)π 4 = Ts ′ ,1, (E.19) where in the second equation of Eq. (E.18) we apply result from Eq. (E.16), and the last equation of Eq. (E.18) is obtained by directly utilizing the definition of ns ′ = L − PL+1 ℓ=2 ds ′ ℓ = ns + 1 given ds ′ = (ds, 0). 237 The second item in the first line of Eq. (E.11) corresponds to s ′ = (−s, −1) and a = 0 where the difference is ds ′ = (1 − ds1,(ds)2:L, 1). The phase and amplitude in weight are Φs,1 = nsπ + ϕ1 − X L ℓ=1 (dsℓ − 1)(sℓϕℓ − δsℓ,1π) = L − 1 − X L ℓ=2 dsℓ ! π + ϕ1 − X L ℓ=1 (dsℓ − 1)(sℓϕℓ − δsℓ,1π) = − X L ℓ=2 (dsℓ − 1)sℓϕℓ + [1 − (ds1 − 1)s1] ϕ1 − X L ℓ=2 (dsℓ − 1)(1 − δsℓ,1)π + (ds1 − 1)δs1,1π = X L ℓ=2 (ds ′ ℓ − 1)s ′ ℓϕℓ + (1 − ds ′ 1 · s ′ 1 )ϕ1 − X L ℓ=2 (ds ′ ℓ − 1)δs ′ ℓ ,1π − ds ′ 1 (1 − δs ′ ℓ ,1 )π = L X +1 ℓ=1 (ds ′ ℓ − 1)s ′ ℓϕℓ − L X +1 ℓ=1 (ds ′ ℓ − 1)δs ′ ℓ ,1π = Φs ′ ,0, (E.20) and cos θL+1 2 Ts,1 = cos θL+1 2 Y L ℓ=2 sin θℓ + dsℓπ 2 sin θ1 2 + (1 − s1)π 4 = L Y +1 ℓ=2 sin θℓ + ds ′ ℓ π 2 sin θ1 2 + (s ′ 1 + 1)π 4 = Ts ′ ,0, (E.21) where in the last line of Eq. (E.20), we apply identities 1 − ds ′ 1 · s ′ 1 = (ds ′ 1 − 1)s ′ 1 and ds ′ 1 (1 − δs ′ 1 ,1 ) = (ds ′ 1 − 1)δs ′ 1 ,1 . One can easily check those identities by substituting s ′ 1 = ±1 separately. Similarly, the first item in the second line of Eq. (E.11) with s ′ = (−s, −1) but a = 1 has the weight as Φs,0 = nsπ + ϕ1 − Φs,1 = ns ′π + ϕ1 − Φs ′ ,0 = Φs ′ ,1 (E.22) cos θL+1 2 Ts,0 = cos θL+1 2 Y L ℓ=2 sin θℓ + dsℓπ 2 sin θ1 2 + (s1 + 1)π 4 = L Y +1 ℓ=2 sin θℓ + ds ′ ℓ π 2 sin θ1 2 + (1 − s ′ 1 )π 4 = Ts ′ ,1, (E.23) where in the second equation of Eq. 
(E.22) we utilize the fact that ns ′ = L− PL+1 ℓ=2 dsℓ = L− PL ℓ=2 dsℓ− 1 = ns given ds ′ = (1 − ds1,(ds)2:L, 1). 238 Therefore, the weight in Eq. (E.4) is proved, and combined with the qumode displacement, we prove that the output state from an (L + 1)-layer circuit satisfies the form in Eq. (E.2). We now introduce the energy regularization. Without losing generality, we consider the displacement on each step following a complex Gaussian distribution, βℓ ∼ N C E/L, or equivalently Re{βℓ},Im{βℓ} ∼ NE/2L with zero mean and variance E/2L such that the ensemble-averaged energy of the output state is E. To see that, from Eq. (E.2), the ensemble-averaged energy of the output state is E h ⟨ψ|m†m|ψ⟩ i = X 1 a,a′=0 X s,s ′ E ws,aw ∗ s ′ ,a′ ⟨a ′ |a⟩ E h e i(χs,a−χs′ ,a′ ) ⟨(−1)a ′ s ′ · β|m†m|(−1)a s · β⟩ i = X 1 a=0 X s E |ws,a| 2 E |s · β| 2 = 2 · 2 L−1 · 1 2 L E = E, (E.24) where we have applied the identity ⟨a|a ′ ⟩ = δa,a′, E[w ∗ s ′ ,aws,a] = E[|ws,a| 2 ]δs,s ′ = 1/2 Lδs,s ′ and E |s · β| 2 = E h Re{s · β} 2 i + E h Im{s · β} 2 i = 2LE h Re{βℓ} 2 i = E, (E.25) utilizing the independence of each βℓ in β and the symmetry of real and imaginary parts of βℓ . E.1.2 Methods for gradient evaluation In this section, we provide some preparation materials for the eventual evaluation of the variance of the gradient in E.1.3. 239 As stated in the main text, the cost function in general can be written as C = Tr OU ρ0U † where ρ0 is the initial state and O is the observable. Consider the gradient with respect to the kth qubit rotation angle θk, then the gradient becomes ∂θk C = ∂θk Tr h OU ρ0U † i = −i 2 Tr h OUright[G(ϕk), Uleftρ0U † left]U † righti = 1 2 ⟨O⟩k (+1) − ⟨O⟩k (−1) , (E.26) where we denote G(ϕk) ≡ cos ϕkσ x+sin ϕkσ y , Uleft = Qk−1 j=1 UECD(βj )UR(θj , ϕj ) as the circuit ahead of kth layer and Uright as the complement circuit from kth to Lth layer. 
The last equation is obtained by applying the parameter-shift rule [252], [G(ϕk), ρ] = i h UR π 2 , ϕk ρU† R π 2 , ϕk − UR − π 2 , ϕk ρU† R − π 2 , ϕk i , and ⟨O⟩k (±1) corresponds to expectation of O with output state from the L-layer circuit, where θk is shifted as θk → θk ± π/2. For convenience, in the following discussion, we denote ws,a,k(µ) as the weight defined in Eq. (E.4), where θk is shifted by µπ/2 with µ = ±1. It is easy to check that E [∂θk C] = 1 2 (E [⟨O⟩k (+1) ] − E [⟨O⟩k (−1) ]) = 0, (E.27) due to the fact that the ensemble average is performed over θj ∈ [0, 2π). From the definition of variance, the variance is reduced to Var [∂θk C] = E (∂θk C) 2 = 1 2 E h ⟨O⟩ 2 k (+1) i − E [⟨O⟩k (+1) ⟨O⟩k (−1) ] , (E.28) where again we take E[⟨O⟩ 2 k (+) ] = E[⟨O⟩ 2 k (−) ]. For the state-preparation task being considered in this paper, operator O = |ϕ⟩⟨ϕ| q ⊗ |ψ⟩⟨ψ|m, the two items in the variance above can be expanded via the output state representation in Eq. (E.2) as E [⟨O⟩k (+1) ⟨O⟩k (µ) ] = X 1 a,a′ , b,b′=0 ⟨ϕ|a⟩ ⟨a ′ |ϕ⟩ ⟨ϕ|b⟩ ⟨b ′ |ϕ⟩ X s,s ′ , r,r ′ E h ws,a,k(+1)w ∗ s ′ ,a′ ,k(+1)wr,b,k(µ)w ∗ r ′ ,b′ ,k(µ) i (E.29) ×E ⟨ψ|Bs,a⟩ ⟨Bs ′ ,a′|ψ⟩ ⟨ψ|Br,b⟩ ⟨Br ′ ,b′|ψ⟩ . (E.30) The exact calculation of variance above is hard due to the fact that ws,a,k(µ) depends on ds while Bs,a depends on s, instead we consider the lower and upper bounds built from the following basic inequalities. For two sets of N real numbers in increasing order x1 ≤ x2 ≤ · · · ≤ xN and y1 ≤ y2 ≤ · · · ≤ yN , there is a well-known rearrangement inequality Xn j=1 xjyn+1−j ≤ Xn j=1 xσ(j)yj ≤ Xn j=1 xjyj , (E.31) where σ(j) ∈ Sn is an arbitrary permutation of n elements. In general xi can be either positive or negative, so we consider a relaxed version of the bounds only under the assumption that yj > 0 for all j, which decouples the index dependence between x and y. 
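Before proceeding, the parameter-shift relation behind Eq. (E.26) can be sanity-checked on a single rotation block: since the cost is sinusoidal in $\theta_k$, the half-difference of the $\pm\pi/2$-shifted expectations reproduces the exact derivative. A minimal single-qubit sketch (the observable $Z$ and parameter values are illustrative):

```python
import numpy as np

def UR(t, p):
    """Qubit-rotation block of Eq. (E.1)."""
    return np.array([[np.exp(1j * p) * np.sin(t / 2), np.cos(t / 2)],
                     [np.cos(t / 2), np.exp(1j * (np.pi - p)) * np.sin(t / 2)]])

Z = np.diag([1.0, -1.0])

def cost(t, p):
    psi = UR(t, p) @ np.array([1.0, 0.0])
    return (psi.conj() @ Z @ psi).real

t, p = 0.83, 1.9
# parameter-shift gradient, as in Eq. (E.26)
shift = 0.5 * (cost(t + np.pi / 2, p) - cost(t - np.pi / 2, p))
# reference: central finite difference
h = 1e-6
fd = (cost(t + h, p) - cost(t - h, p)) / (2 * h)
diff = abs(shift - fd)
```

Unlike a finite difference, the parameter-shift expression is exact at finite shift, which is why it is used to turn the gradient into a difference of two expectation values.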
The lower and upper bounds are Xn j=1 xσ(j)yj ≥ Xn j=1 xj≥0 xj y1 + Xn j=1 xj<0 xj yn, (E.32a) Xn j=1 xσ(j)yj ≤ Xn j=1 xj≥0 xj yn + Xn j=1 xj<0 xj y1. (E.32b) 241 Proof. We only show the proof for the lower bound and the upper bound is a natural extension by swapping the minimum y1 and maximum yn. The lower bound in Eq. (E.32a) is obvious from the first inequality in Eq. (E.31) as X j=1 xσ(j)yj ≥ Xn j=1 xjyn+1−j = Xn j=1 xj>0 xjyn+1−j + Xn j=1 xj<0 xjyn+1−j ≥ Xn j=1 xj>0 xjy1 + Xn j=1 xj<0 xjyn. (E.33) E.1.2.1 Preliminary In this part, we introduce some prerequisite lemmas and propositions which are necessary in the evaluation of gradient variance in E.1.3. Recall the definition of the difference sign vector dsℓ = |sℓ − sℓ−1|/2 in Eq. (E.7), it has the following property. Lemma 16 The sum of all elements in difference sign vectors is always even, P PL ℓ=1 dsℓ = +1. Proof. As (−1)dsℓ = sℓ × sℓ−1 for ℓ ≥ 2 and (−1)ds1 = s1 × sL, we have Y L ℓ=1 (−1)dsℓ = (−1) PL ℓ=1 dsℓ = s1 × sL × Y L ℓ=2 (sℓ × sℓ−1) = Y L ℓ=1 s 2 ℓ = 1, (E.34) because sℓ = ±1. Therefore, PL ℓ=1 dsℓ is even and P PL ℓ=1 dsℓ = +1 A direct result from Lemma 16 is about the number of ϕj s in Φs,a satisfies Corollary 17 The number of ϕℓs with nonzero coefficient in Φs,a is Ns,a = ns + 1 − a − (−1)ads1, whose parity satisfies P(Ns,a) = P(a)P(L). Proof. As Ns,a = ns + 1 − a − (−1)ads1, we can see that the parity of Ns,a follows P (Ns,a) = (−1)ns (−1)(−1)ads1 (−1)1−a = (−1)L−1− PL ℓ=2 dsℓ (−1)−ds1 (−1)1−a = (−1)L−a (−1)− PL ℓ=1 dsℓ = P(a)P(L), (E.35) 242 Ts,a Tr,a sin(θℓ/2) cos(θℓ/2) sin(θℓ/2) nsin Ns,a − nsin cos(θℓ/2) Nr,a − nsin NT − nsin Table E.1: The number of θℓ in Ts,a and Tr,a with corresponding form. where in the second equation we rewrite (−1)(−1)ads1 = (−1)−ds1 as the sign of exponent does not change the value, and the last equality is obtained from the Lemma 16. 
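Both Lemma 16 and the parity claim of Corollary 17 are easy to confirm by enumerating all sign vectors with $s_L = -1$; a short sketch over a few circuit depths:

```python
from itertools import product

def check(L):
    ok_lemma, ok_cor = True, True
    for head in product([-1, 1], repeat=L - 1):
        s = list(head) + [-1]                               # s_L = -1
        ds = [(s[0] + 1) // 2] + [abs(s[l] - s[l - 1]) // 2 for l in range(1, L)]
        ok_lemma &= (sum(ds) % 2 == 0)                      # Lemma 16
        ns = L - 1 - sum(ds[1:])                            # Eq. (E.8)
        for a in (0, 1):
            N_sa = ns + 1 - a - (-1) ** a * ds[0]           # Eq. (E.9)
            ok_cor &= ((-1) ** N_sa == (-1) ** a * (-1) ** L)  # Corollary 17
    return ok_lemma, ok_cor

results = [check(L) for L in (3, 4, 5, 6)]
```

Checking both even and odd $L$ exercises the factor $P(L)$ in the parity relation.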
Proposition 18 For two arbitrary different sign vectors s, r uniformly random sampled, the number of elements that dsℓ = drℓ is NT = L − PL ℓ=1 |dsℓ − drℓ |, and the distribution probability is p(NT ) = L NT /(2L−1 − 1) with constraint 0 ≤ NT ≤ L − 2 and P(NT ) = P(L). Proof. For arbitrary two sign vectors s ̸= r, we have 0 ≤ NT ≤ L − 2 due to parity constraint in Lemma 16. As it is an equal prior of dsℓ = 0, 1, the distribution probability is L NT /(2L−1 − 1). Suppose the number of θℓ in Ts,a and Tr,a that are both in the form of sin(θℓ/2) is nsin, and the number of θℓ that are both in the form of cos(θℓ/2) is Ns,a − nsin. On the other hand, the number of θℓ that are in the form of sin(θℓ/2) in Ts,a but cos(θℓ/2) in Tr,a is NT − nsin, and the number for opposite correspondence is Nr,a − nsin. The above statement is summarized in Table. E.1. As the total number θℓ for both Ts,a and Tr,a is L, thus the summation of Table. E.1 should equal to L, Ns,a + Nr,a + NT − 2nsin = L, and the parity relation is 1 = P(L − Ns,a − Nr,a − NT + 2nsin) = P(L)P(Ns,a)P(Nr,a)P(NT ) = P(L)P(NT ), (E.36) where in the second equality we utilize P(Ns,a) = P(Nr,a) from Corollary 17. Proposition 19 For two arbitrary different sign vectors s, r, the number of elements that sℓ = rℓ is ℓ = L − PL ℓ=1 |sℓ − rℓ |/2 with distribution p(ℓ) = L−1 ℓ−1 /(2L−1 − 1) under the constraint 1 ≤ ℓ ≤ L − 1. E.1.2.2 Ensemble average of four-fold product of weights In this part, we evaluate the ensemble average of the four-fold weight which occurs in Eq. (E.30), and discuss its properties. The four-fold weight product in general is ws,a,k(+1)w ∗ s ′ ,a′ ,k(+1)wr,b,k(µ)w ∗ r ′ ,b′ ,k(µ) with µ = ±1, and the ensemble average over all θ and ϕ is Eθ,ϕ h ws,a,k(+1) (θ, ϕ)w ∗ s ′ ,a′ ,k(+1) (θ, ϕ)wr,b,k(µ) (θ, ϕ)w ∗ r ′ ,b′ ,k(µ) (θ, ϕ) i = Eϕ h e i(Φs,a(ϕ)−Φs′ ,a′ (ϕ)+Φr,b(ϕ)−Φr′ ,b′ (ϕ)) i Eθ h Ts,a,k(+1) (θ)Ts ′ ,a′ ,k(+1) (θ)Tr,b,k(µ) (θ)Tr ′ ,b′ ,k(µ) (θ) i . 
(E.37) The average over ϕ’s is simply zero if there is any ϕℓ left in the phase Φs,a−Φs ′ ,a′+Φr,b−Φr ′ ,b′, otherwise it can be ±1 depending on whether an even number of π presents in the phase. The average with respect to θ is Proposition 20 The ensemble average over θℓs in the four-fold weight product is Eθ h Ts,a,k(+1) (θ)Ts ′ ,a′ ,k(+1) (θ)Tr,b,k(µ) (θ)Tr ′ ,b′ ,k(µ) (θ) i = 1 8 L Y L ℓ=2 ℓ̸=k 2δdsℓ,ds ′ ℓ δdrℓ,dr ′ ℓ + cos h π 2 (dsℓ + ds ′ ℓ − drℓ − dr ′ ℓ ) i × 2δdsk,ds ′ k δdrk,dr ′ k + µ cos h π 2 (dsk + ds ′ k − drk − dr ′ k ) i1−δk,1 × 2δP(a)s1,P(a ′)s ′ 1 δP(b)r1,P(b ′)r ′ 1 + (1 + (µ − 1)δk,1) cos h π 4 (P(a)s1 + P(a ′ )s ′ 1 − P(b)r1 − P(b ′ )r ′ 1 ) i . (E.38) 244 Proof. As the θℓ ’s are independent to each other, it is allowed to handle the average over each θℓ independently, and according to Eq. (E.6), we only need to calculate the average over θ1 and θℓ with ℓ ≥ 2 separately. The ensemble average over θℓ with ℓ ≥ 2 in E h Ts,a,k(+1)Ts ′ ,a′ ,k(+1)Tr,b,k(µ)Tr ′ ,b′ ,k(µ) i is Eθ sin θℓ + πdsℓ 2 + π 4 sin θℓ + πds ′ ℓ 2 + π 4 sin θℓ + πdrℓ 2 + µπ 4 sin θℓ + πdr ′ ℓ 2 + µπ 4 = 1 8 cos h π 2 (dsℓ − ds ′ ℓ + drℓ − dr ′ ℓ ) i + cos h π 2 (dsℓ − ds ′ ℓ − drℓ + dr ′ ℓ ) i + µ cos h π 2 (dsℓ + ds ′ ℓ − drℓ − dr ′ ℓ ) i = 1 8 2 cos π(dsℓ − ds ′ ℓ ) 2 cos π(drℓ − dr ′ ℓ ) 2 + µ cos h π 2 (dsℓ + ds ′ ℓ − drℓ − dr ′ ℓ ) i = 1 8 2δdsℓ,ds ′ ℓ δdrℓ,dr ′ ℓ + µ cos h π 2 (dsℓ + ds ′ ℓ − drℓ − dr ′ ℓ ) i , (E.39) where in the last line we apply the identity that cos(π(x − y)/2) = δx,y for x, y ∈ {0, 1}. 
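The single-angle average of Eq. (E.39) can be verified by numerical quadrature over $\theta_\ell \in [0, 2\pi)$ for all 32 combinations of $(ds_\ell, ds'_\ell, dr_\ell, dr'_\ell, \mu)$; since the integrand is a trigonometric polynomial with integer frequencies, a uniform grid average is essentially exact:

```python
import numpy as np
from itertools import product

theta = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)  # uniform over [0, 2pi)

def factor(d, q):
    """One sine factor of the Eq. (E.39) integrand; q = +1 or mu."""
    return np.sin(theta / 2 + np.pi * d / 2 + q * np.pi / 4)

err = 0.0
for ds, dsp, dr, drp in product((0, 1), repeat=4):
    for mu in (1, -1):
        lhs = np.mean(factor(ds, 1) * factor(dsp, 1)
                      * factor(dr, mu) * factor(drp, mu))
        rhs = (2 * (ds == dsp) * (dr == drp)
               + mu * np.cos(np.pi / 2 * (ds + dsp - dr - drp))) / 8
        err = max(err, abs(lhs - rhs))
```

For example, at $ds_\ell = ds'_\ell = dr_\ell = dr'_\ell = 0$ and $\mu = 1$ both sides equal the average of $\sin^4$, namely $3/8$.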
For θ1, the average is Eθ sin θ1 2 + (P(a)s1 + 2)π 4 sin θ1 2 + (P(a ′ )s ′ 1 + 2)π 4 sin θ1 2 + (P(b)r1 + µ + 1)π 4 sin θ1 2 + (P(b ′ )r ′ 1 + µ + 1)π 4 = 1 8 cos h π 4 (P(a)s1 − P(a ′ )s ′ 1 + P(b)r1 − P(b ′ )r ′ 1 ) i + cos h π 4 (P(a)s1 − P(a ′ )s ′ 1 − P(b)r1 + P(b ′ )r ′ 1 ) i +µ cos h π 4 (P(a)s1 + P(a ′ )s ′ 1 − P(b)r1 − P(b ′ )r ′ 1 ) i = 1 8 2 cos π(P(a)s1 − P(a ′ )s ′ 1 ) 4 cos π(P(b)r1 − P(b ′ )r ′ 1 ) 4 + µ cos h π 4 (P(a)s1 + P(a ′ )s ′ 1 − P(b)r1 − P(b ′ )r ′ 1 ) i = 1 8 2δP(a)s1,P(a′)s ′ 1 δP(b)r1,P(b ′)r ′ 1 + µ cos h π 4 (P(a)s1 + P(a ′ )s ′ 1 − P(b)r1 − P(b ′ )r ′ 1 ) i , (E.40) 245 where in the last line we apply cos(π(x − y)/4) = δx,y for x, y = ±1. Note that the average over all θℓ ’s depends on the choice of k, more specifically, whether k > 1 or not. One can write out the ensemble average for those two separately, and figure out that the ensemble average over all θℓ ’s can be unified as Eθ h Ts,a,k(+1) (θ)Ts ′ ,a′ ,k(+1) (θ)Tr,b,k(µ) (θ)Tr ′ ,b′ ,k(µ) (θ) i = 1 8 L Y L ℓ=2 ℓ̸=k 2δdsℓ,ds ′ ℓ δdrℓ,dr ′ ℓ + cos h π 2 (dsℓ + ds ′ ℓ − drℓ − dr ′ ℓ ) i × 2δdsk,ds ′ k δdrk,dr ′ k + µ cos h π 2 (dsk + ds ′ k − drk − dr ′ k ) i1−δk,1 × 2δP(a)s1,P(a ′)s ′ 1 δP(b)r1,P(b ′)r ′ 1 + (1 + (µ − 1)δk,1) cos h π 4 (P(a)s1 + P(a ′ )s ′ 1 − P(b)r1 − P(b ′ )r ′ 1 ) i . (E.41) A direct corollary drawn from Eq. (E.38) is the following Corollary 21 If three of the sign vectors s, s ′ , r, r ′ are equal, then the ensemble average in Eq. (E.37) is zero. Proof. If three of the sign vectors are the same, then one can check that 2δdsj ,ds ′ j δdrj ,dr ′ j = 0 and cos h π 2 (dsj + ds ′ j − drj − dr ′ j ) i = cos(±π/2) = 0, which makes the average in Eq. (E.38) be zero. E.1.3 Variance of gradient in state preparation of single-mode CV state In this section, we provide the detailed proof for the bounds of variance of gradient (Ineqs. 
(5.6) and (5.7) with M = 1 in the main paper) in the preparation of a single-mode CV state |ψ⟩ with target energy ⟨m†m⟩ = Et , while the target state of the ancilla qubit is simply chosen as |0⟩ q . 246 With the qubit target state |0⟩ q , the expansion of items in variance of gradient (see Eq. (E.30)) is reduced to E [⟨O⟩k (+1) ⟨O⟩k (µ) ] = X s,s ′ ,r,r ′ Eθ,ϕ h ws,k(+1)w ∗ s ′ ,k(+1)wr,k(µ)w ∗ r ′ ,k(µ) i Eβ [⟨ψ|Bs⟩ ⟨Bs ′|ψ⟩ ⟨ψ|Br⟩ ⟨Br ′|ψ⟩] , (E.42) where we omit the subscript related to qubit state a, a′ , b, b′ in the expression for simplicity since a = a ′ = b = b ′ = 0. The summation over s, s ′ , r, r ′ can be nonzero in the following four cases: (i) s = s ′ = r = r ′ ; (ii) s − s ′ = r − r ′ = 0 but s ̸= r; (iii) s − r ′ = r − s ′ = 0 but s ̸= r; (iv) s, s ′ , r, r ′unequal. However s − r = s ′ − r ′ = 0 with s ̸= s ′ does not contribute as the average over ϕ becomes E[e 2i(Φs,a−Φs′ ,a) ] = 0 with s ̸= s ′ . From the above analysis, the variance of the gradient shown in Eq. (E.28) becomes Var [∂θk C] = 1 2 E h ⟨O⟩ 2 k (+1) i − E [⟨O⟩k (+1) ⟨O⟩k (−1) ] = 1 2 X s ∆µ n Eθ,ϕ h |ws,k(+1) | 2 |ws,k(µ) | 2 io Eβ | ⟨ψ|Bs⟩ |4 + X s̸=r ∆µ n Eθ,ϕ h |ws,k(+1) | 2 |wr,k(µ) | 2 i + Eθ,ϕ h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) io Eβ | ⟨ψ|Bs⟩ |2 | ⟨ψ|Br⟩ |2 + X s,s ′ ,r,r ′ unequal ∆µ n Eθ,ϕ h ws,k(+1)w ∗ s ′ ,k(+1)wr,k(µ)w ∗ r ′ ,k(µ) io Eβ [⟨ψ|Bs⟩ ⟨Bs ′|ψ⟩ ⟨ψ|Br⟩ ⟨Br ′|ψ⟩] (E.43) ≡ 1 2 (S1 + S2 + S3). (E.44) The notation ∆µ{X} ≡ X|µ=1−X|µ=−1 represents the difference of quantity X with µ = 1 and µ = −1, and the difference is only evaluated on the average over θℓ ’s. We introduce {S1, S2, S3} in the last line 247 to denote the three summations in the large parenthesis above for convenience. In the following, we will evaluate them term by term. For S1, the ensemble average of weight from Eq. (E.38) is E h |ws,k(+1) | 2 |ws,k(µ) | 2 i = E h T 2 s,k(+1)T 2 s,k(µ) i = 1 8 L 3 L−2+δk,1 (2 + µ) 1−δk,1 (3 + (µ − 1)δk,1) = 3 L−1 8 L (2 + µ). 
(E.45) The difference with µ = ±1 is ∆µ n E h |ws,k(+1) | 2 |ws,k(µ) | 2 io = 2 · 3 L−1 8 L . (E.46) For the displacement part, recall that |Bs⟩ = e iχs |s · β⟩ where χs is a pure phase and s · β is a complex Gaussian variable with zero mean and variance E as (s · β) ∼ N C E , and the average over β becomes E | ⟨ψ|Bs⟩ |4 = E | ⟨ψ|s · β⟩ |4 = Eα∼N C E | ⟨ψ|α⟩ |4 ≡ C1, (E.47) where in the second equation |α⟩ is a coherent state with displacement α ∼ N C E , and we denote the average to be correlator C1. As Eqs (E.46) and (E.47) are both independent of s, the summation S1 is simply S1 = X s ∆µ n E h |ws,k(+1) | 2 |ws,k(µ) | 2 io E | ⟨ψ|Bs⟩ |4 = 2L−1 2 · 3 L−1 8 L C1 = 3 L−1 4 L C1. (E.48) 248 The average over qubit rotation angles ϕ, θ in S2 utilizing Eq. E.38 is E h |ws,k(+1) | 2 |wr,k(µ) | 2 i + E h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) i = E h T 2 s,k(+1)T 2 r,k(µ) i + E h Ts,k(+1)Tr,k(+1)Tr,k(µ)Ts,k(µ) i = 1 8 L Y L ℓ=2 ℓ̸=k (1 + 2δdsℓ,drℓ ) (2 − µ + 2µδdsk,drk ) 1−δk,1 [2 + (1 + (µ − 1)δk,1) (2δds1,dr1 − 1)] + 1 8 L Y L ℓ=2 ℓ̸=k (1 + 2δdsℓ,drℓ ) (2δdsk,drk + µ) 1−δk,1 [2δds1,dr1 + (1 + (µ − 1)δk,1)] = 1 8 L Y L ℓ=1 ℓ̸=k (1 + 2δdsℓ,drℓ ) (2 − µ + 2µδdsk,drk ) + 1 8 L Y L ℓ=1 ℓ̸=k (1 + 2δdsℓ,drℓ ) (2δdsk,drk + µ) = 2 8 L Y L ℓ=1 ℓ̸=k (1 + 2δdsℓ,drℓ ) (1 + (1 + µ)δdsk,drk ). (E.49) Thus, the difference with respect to µ is ∆µ n E h |ws,k(+1) | 2 |wr,k(µ) | 2 i + E h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) io = 4 8 L Y L ℓ=1,ℓ̸=k (1+2δdsℓ,drℓ )δdsk,drk . (E.50) Note that the last delta function above is nonzero only if dsk = drk, and as NT is the total number of elements that satisfy dsℓ = drℓ from Proposition 18, there are NT −1 elements that satisfy dsℓ = drℓ with ℓ ̸= k, making Eq. (E.50) equal to 4 · 3 NT −1/8 L with probability L−1 NT −1 /(2L−1 − 1) from Proposition 18. 
The summation over all s ̸= r of the difference is X s̸=r ∆µ n E h |ws,k(+1) | 2 |wr,k(µ) | 2 i + E h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) io = 2L−1 2 L−1 − 1 L X−2 NT =0 P(NT )=P(L) 4 · 3 NT −1 8 L L−1 NT −1 2 L−1 − 1 = 1 4 − 2 · 3 L−1 − 2 L−1 4 L . (E.51) For the average over displacement β in S2, note that for any two s · β, r · β with s ̸= r, we can always write them as s · β = αz + α1−z and r · β = αz − α1−z where αz, α1−z are complex Gaussian variables obeying distributions N C zE, N C (1−z)E . Note that z = ℓ/L with ℓ being an integer in the range of [1, L − 1]. The displacement average can therefore be simplified to Eβ | ⟨ψ|Bs⟩ |2 | ⟨ψ|Br⟩ |2 = Eβ | ⟨ψ|s · β⟩ |2 | ⟨ψ|r · β⟩ |2 = Eαy∼N C yE "Y 1 h=0 | ⟨ψ|αz + (−1)hα1−z⟩ |2 # ≡ C2(z), (E.52) where in the second equation we take an average over independent variables αz and α1−z with αz ∼ N C zE and α1−z ∼ N C (1−z)E , denoted as αy ∼ N C yE for simplicity. With Eqs. (E.51) and (E.52), we can have bounds for S2 as S2 ≥ X s̸=r ∆µ n E h |ws,k(+1) | 2 |wr,k(µ) | 2 i + E h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) io min E | ⟨ψ|Bs⟩ |2 | ⟨ψ|Br⟩ |2 = 1 4 − 2 · 3 L−1 − 2 L−1 4 L min ℓ C2 ℓ L , (E.53) S2 ≤ X s̸=r ∆µ n E h |ws,k(+1) | 2 |wr,k(µ) | 2 i + E h ws,k(+1)w ∗ r,k(+1)wr,k(µ)w ∗ s,k(µ) io max E | ⟨ψ|Bs⟩ |2 | ⟨ψ|Br⟩ |2 = 1 4 − 2 · 3 L−1 − 2 L−1 4 L max ℓ C2 ℓ L , (E.54) where the minimization and maximization are taken over all integers ℓ ∈ [1, L − 1]. The summation S3 involves four different sign vectors, although the total number of summation is about 16L−1 , a large amount of them can be excluded by the averaging over phase. To have E h ws,k(+1)w ∗ s ′ ,k(+1)wr,k(µ)w ∗ r ′ ,k(µ) i nonzero, Φs−Φs ′+Φr−Φr ′ can only be a constant that is independent of any ϕℓ . According to Eq. (E.5), it requires the coefficient for each ϕℓ to be zero, (dsℓ − 1) sℓ − ds ′ ℓ − 1 s ′ ℓ + (drℓ − 1) rℓ − dr ′ ℓ − 1 r ′ ℓ = 0. (E.55) 2 sℓ s ′ ℓ rℓ r ′ ℓ Is Eq. 
(E.57) satisfied −1 −1 −1 −1 Yes −1 −1 −1 +1 No −1 −1 +1 −1 No −1 +1 −1 −1 No +1 −1 −1 −1 No −1 −1 +1 +1 Yes −1 +1 −1 +1 No +1 −1 −1 +1 Yes Table E.2: A satisfiability test of Eq. (E.57) for all possible combination of sℓ , s ′ ℓ , rℓ , r ′ ℓ with 1 ≤ ℓ ≤ L − 1 up to a global reverse of signs. Note that dsℓ = |sℓ − sℓ−1|/2 = (1 − sℓsℓ−1)/2, then the above constraint can reduce to sℓ(1 − sℓsℓ−1) − s ′ ℓ (1 − s ′ ℓs ′ ℓ−1 ) + rℓ(1 − rℓrℓ−1) − r ′ ℓ (1 − r ′ ℓr ′ ℓ−1 ) = 2(sℓ − s ′ ℓ + rℓ − r ′ ℓ ) ⇒ sℓ − s ′ ℓ + rℓ − r ′ ℓ = −(sℓ−1 − s ′ ℓ−1 + rℓ−1 − r ′ ℓ−1 ), (E.56) where we have used s 2 ℓ = s ′2 ℓ = r 2 ℓ = r ′2 ℓ = 1 to get the last line. As sL−s ′ L+rL−r ′ L = 0, the constraint above becomes sℓ + rℓ − s ′ ℓ − r ′ ℓ = 0, ∀ℓ ∈ [1, L] ∩ N. (E.57) In Table. E.2, we list all the combination of sℓ , s ′ ℓ , rℓ , r ′ ℓ with 1 ≤ ℓ ≤ L − 1 up to a global reverse of signs and test if the constraint Eq. (E.57) is satisfied. A global reverse of all signs of sℓ , s ′ ℓ , rℓ , r ′ ℓ for instance only sℓ = −1 and only sℓ = +1 leads to same satisfiability result. Summarized from Table. E.2, there are 251 only three allowed combination of sℓ , s ′ ℓ , rℓ , r ′ ℓ , therefore the partition of s · β, s ′ · β, r · β, r ′ · β is s · β = αz + αz˜ + α1−z−z˜ s ′ · β = αz + αz˜ − α1−z−z˜ r · β = αz − αz˜ − α1−z−z˜ r ′ · β = αz − αz˜ + α1−z−z˜, (E.58) where αz ∼ N C zE is Gaussian variable, and similar for αz˜ and α1−z−z˜. As s, r, s ′ , r ′ are different, the z, z˜ are limited to z = ℓ1/L, z˜ = ℓ2/L with ℓ1, ℓ2 ∈ [1, L − 2] ∩ N and ℓ1 + ℓ2 ≤ L − 1. The average over displacements in S3 is E [⟨ψ|Bs⟩ ⟨Bs ′|ψ⟩ ⟨ψ|Br⟩ ⟨Br ′|ψ⟩] = Eαy∼N C yE " e i(χs−χs′+χr−χr′ ) Y 1 a=0 ⟨ψ|αz + (−1)aαz˜ + (−1)aα1−z−z˜⟩ ⟨αz + (−1)aαz˜ − (−1)aα1−z−z˜|ψ⟩ # (E.59) ≤ Eαy∼N C yE "Y 1 a=0 | ⟨ψ|αz + (−1)aαz˜ + (−1)aα1−z−z˜⟩ || ⟨αz + (−1)aαz˜ − (−1)aα1−z−z˜|ψ⟩ |# ≡ C3(z, z˜), (E.60) where we upper bound it by E[x] ≤ E[|x|] in the last line. 
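The satisfiability test of Table E.2 amounts to enumerating the single-site constraint of Eq. (E.57) over all 16 sign assignments; exactly 6 assignments (3 up to a global sign flip, matching the table) survive:

```python
from itertools import product

# all single-site assignments (s, s', r, r') satisfying Eq. (E.57)
sols = [c for c in product((-1, 1), repeat=4)
        if c[0] - c[1] + c[2] - c[3] == 0]
n_sols = len(sols)                           # 6 = 3 classes x global sign flip
# the three "Yes" rows of Table E.2, as (s, s', r, r')
table_yes = {(-1, -1, -1, -1), (-1, -1, 1, 1), (1, -1, -1, 1)}
covered = all(c in table_yes or tuple(-x for x in c) in table_yes
              for c in sols)
```

Every satisfying assignment is one of the three table rows or its global sign reversal, as claimed.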
In each class of states under consideration, we will show that $C_3(z, \tilde{z})$ leads to higher-order terms compared to Eqs. (E.47) and (E.52), and thus can be neglected in the asymptotic region of $E$ in the later discussion. To conclude, we have the lower and upper bounds for the variance of the gradient in Eq. (E.44) as

$$\mathrm{Var}[\partial_{\theta_k} C(x)] \ge \frac{1}{2}\left[\frac{3^{L-1}}{4^L} C_1 + \left(\frac{1}{4} - \frac{2\cdot 3^{L-1} - 2^{L-1}}{4^L}\right)\min_\ell C_2\!\left(\frac{\ell}{L}\right)\right] + O(C_3) \ge \frac{1}{2}\left[\frac{3^{L-1}}{4^L} C_1 + \left(\frac{1}{4} - \frac{3^{L}}{4^L}\right)\min_\ell C_2\!\left(\frac{\ell}{L}\right)\right] + O(C_3), \quad (E.61)$$

and

$$\mathrm{Var}[\partial_{\theta_k} C(x)] \le \frac{1}{2}\left[\frac{3^{L-1}}{4^L} C_1 + \left(\frac{1}{4} - \frac{2\cdot 3^{L-1} - 2^{L-1}}{4^L}\right)\max_\ell C_2\!\left(\frac{\ell}{L}\right)\right] + O(C_3) \le \frac{1}{2}\left[\frac{3^{L-1}}{4^L} C_1 + \left(\frac{1}{4} + \frac{2^{L-1}}{4^L}\right)\max_\ell C_2\!\left(\frac{\ell}{L}\right)\right] + O(C_3), \quad (E.62)$$

where the minimization and maximization are over integers $1 \le \ell \le L - 1$. The exact expressions of the correlators $C_1, C_2$ in the above bounds depend on the specific target state. In the following, we evaluate these correlators for different target states $|\psi\rangle_m$ to provide insight into their asymptotic behavior, and thus into the behavior of the gradient. In subsection E.1.3.1, we consider single-mode Gaussian states; in subsection E.1.3.2, we consider Fock number states.

E.1.3.1 Single-mode Gaussian states

Suppose the target qumode state is an arbitrary Gaussian (pure) state. Since the correlators $C_1$ and $C_2$ only depend on the fidelity between the target state $|\psi\rangle_m$ and a coherent state, analytical evaluation is possible thanks to Refs. [269, 270, 271]. For a brief introduction to Gaussian states, please refer to Methods. As one can see from Eqs. (E.47) and (E.52), the correlators we need to evaluate depend only on the fidelity between $|\psi\rangle_m$ and coherent states, so we first evaluate the fidelity between the Gaussian state in Eq. (5.23) of the main paper and the coherent state $|\alpha\rangle$ from Eq. (5.28) of the main paper.
Lemma 22 The fidelity between an arbitrary one-mode Gaussian state |ψ⟩m = D(γ)R(τ )S(ζ)|0⟩m and a coherent state |α⟩m = D(α)|0⟩m is F(ψ, |α⟩) = sech(ζ)e −(1+κ1)(Re{γ}−Re{α}) 2−(1−κ1)(Im{γ}−Im{α}) 2+2κ2(Re{γ}−Re{α})(Im{γ}−Im{α}) , (E.63) where κ1 ≡ cos(2τ ) tanh(ζ) and κ2 ≡ sin(2τ ) tanh(ζ). 253 The correlator C1 is simply the square of fidelity as C Gauss 1 ≡ Eα∼N C E | ⟨ψ|α⟩ |4 = Eα∼N C E F(ψ, |α⟩) 2 = sech2 (ζ)e −R(E)/G1(E) p G1(E) , (E.64) where the last line is obtained from the average over real and imaginary parts of α separately. Here we define G1(x) = 1 + 4x + 4 sech2 (ζ)x 2 (E.65) R(x) = 2|γ| 2 + 4 sech2 (ζ)|γ| 2x + 2 tanh(ζ)|γ| 2 cos(2(φ + τ )), (E.66) where φ = arctan(Im{γ}/ Re{γ}) is the angle of complex number γ. In the asymptotic region of E, one can see that C Gauss 1 ∼ 1/2E. The above C Gauss 1 for coherent state with ζ = 0, τ = 0 and single-mode squeezed vacuum (SMSV) state with γ = 0, τ = 0 is reduced to C Coh 1 = e −2|γ| 2/(1+2E) 1 + 2E , (E.67) C SMSV 1 = sech2 (ζ) q 1 + 4E + 4 sech2 (ζ)E2 . (E.68) The correlator C2 is the product of fidelity between ψ and coherent states |αz ± α1−z⟩ as C Gauss 2 (z) ≡ Eαy∼N C yE "Y 1 h=0 | ⟨ψ|αz + (−1)hα1−z⟩ |2 # = Eαy∼N C yE "Y 1 h=0 F(ψ, |αz + (−1)hα1−z⟩) # = sech2 (ζ)e −R(zE)/G1(zE) p G1(E − zE)G1(zE) . (E.69) 254 In the asymptotic region of E, we also see that C Gauss 2 (z) ∼ 1/4z(1 − z)E2 . For coherent and SMSV states, we also have C Coh 2 (z) = −2|γ| 2/(1 + 2zE) [1 + 2(1 − z)E](1 + 2zE) , (E.70) C SMSV 2 (z) = sech2 (ζ) p G1(E − zE)G1(zE) . (E.71) For C3, we have C Gauss 3 (z, z˜) = Eαy∼N C yE "Y 1 h=0 | ⟨ψ|αz + (−1)hαz˜ + (−1)hα1−z−z˜⟩ || ⟨αz + (−1)hαz˜ − (−1)hα1−z−z˜|ψ⟩ |# = Eαy∼N C yE "Y 1 h=0 q F(ψ, |αz + (−1)hαz˜ + (−1)hα1−z−z˜⟩) q F(ψ, |αz + (−1)hαz˜ − (−1)hα1−z−z˜⟩) # = sech2 (ζ)e −R(zE)/G1(zE) p G1(zE)G1(˜zE)G1[(1 − z − z˜)E] , (E.72) which clearly approaches the scaling of 1/E3 in the asymptotic region of E, and thus can be omitted. 
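The closed form of $C_1^{\rm Coh}$ in Eq. (E.67) can be cross-checked by numerically integrating the defining average of Eq. (E.47) for a coherent-state target, using $|\langle\gamma|\alpha\rangle|^2 = e^{-|\gamma - \alpha|^2}$. The grid, energy $E$, and displacement $\gamma$ below are illustrative choices:

```python
import numpy as np

E, gamma = 1.5, 0.8 + 0.4j
c, n = 8.0, 1201
x = np.linspace(-c, c, n)
hx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")              # alpha = X + iY
# density of alpha ~ N_E^C: Re, Im each N(0, E/2)
pdf = np.exp(-(X ** 2 + Y ** 2) / E) / (np.pi * E)
# |<gamma|alpha>|^4 = exp(-2|gamma - alpha|^2)
fid2 = np.exp(-2 * ((X - gamma.real) ** 2 + (Y - gamma.imag) ** 2))
numeric = np.sum(pdf * fid2) * hx * hx
closed = np.exp(-2 * abs(gamma) ** 2 / (1 + 2 * E)) / (1 + 2 * E)  # Eq. (E.67)
rel_err = abs(numeric - closed) / closed
```

Both integrands decay to zero well inside the grid, so the simple Riemann sum is accurate to many digits; the same check applies term by term to the separable real and imaginary integrals.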
We expect this scaling of $C_3$ to generalize to other non-Gaussian states as well, though it may not be easy to evaluate. The bounds on the variance of the gradient in preparation of an arbitrary Gaussian state are
\[
\mathrm{Var}\left[\partial_{\theta_k} C(\bm{x})\right] \ge \frac{1}{2}\frac{3^{L-1}}{4^{L}} C_1^{\rm Gauss} + \left(\frac{1}{4} - \frac{3^{L}}{4^{L}}\right)\min_{\ell} C_2^{\rm Gauss}\!\left(\frac{\ell}{L}\right) + O\!\left(\frac{1}{E^3}\right), \tag{E.73}
\]
\[
\mathrm{Var}\left[\partial_{\theta_k} C(\bm{x})\right] \le \frac{1}{2}\frac{3^{L-1}}{4^{L}} C_1^{\rm Gauss} + \left(\frac{1}{4} + \frac{2^{L-1}}{4^{L}}\right)\max_{\ell} C_2^{\rm Gauss}\!\left(\frac{\ell}{L}\right) + O\!\left(\frac{1}{E^3}\right), \tag{E.74}
\]
where the minimization and maximization are over all integers $1 \le \ell \le L-1$. In the asymptotic region of $E$, as $C_1^{\rm Gauss}$ and $C_2^{\rm Gauss}$ show different scalings, the variance of the gradient is dominated by the $1/E$ term when
\[
\frac{1/4E^2}{(3/4)^L/6E} \in O(1) \;\Rightarrow\; E \in \Omega(1)\,\frac{3}{2}\left(\frac{4}{3}\right)^{L} \in \Omega(\exp L), \tag{E.75}
\]
or equivalently,
\[
L \in \frac{1}{\log(4/3)}\log\left(O(1)\frac{2E}{3}\right) \in O(\log E). \tag{E.76}
\]
Depending on the energy, we can classify CV VQCs in the asymptotic-$E$ region into shallow and deep circuits. When the circuit is as shallow as $L \in O(\log E)$, the bounds on the variance of the gradient are dominated by the first, $\sim 1/E$ term from the correlator $C_1^{\rm Gauss}$; the two bounds then coincide and describe the variance of the gradient as
\[
\mathrm{Var}\left[\partial_{\theta_k} C\right] = \frac{1}{6}\left(\frac{3}{4}\right)^{L} C_1^{\rm Gauss} + O\!\left(\frac{1}{E^2}\right). \tag{E.77}
\]
In the preparation of a nonzero-mean Gaussian state, i.e. a coherent state, the leading order above produces a peak of the variance at about $E \sim |\gamma|^2 = E_t$, the target-state energy, and the variance shows the $1/E$ scaling when $E \gtrsim E_t$. For a zero-mean Gaussian state, i.e. an SMSV state, Eq. (E.68) monotonically decreases with $E$, and one can estimate that when $E \ge \cosh(\zeta) = \sqrt{1+E_t}$, the variance of the gradient approaches the $1/E$ scaling. On the other hand, when the circuit is as deep as $L \in \Omega(\log E)$, the bounds on the variance are dominated by the second, $\sim 1/E^2$ term from the correlator $C_2^{\rm Gauss}$, and bounded as
\[
\frac{1}{8}\min_{\ell} C_2^{\rm Gauss}\!\left(\frac{\ell}{L}\right) \le \mathrm{Var}\left[\partial_{\theta_k} C\right] \le \frac{1}{8}\max_{\ell} C_2^{\rm Gauss}\!\left(\frac{\ell}{L}\right). \tag{E.78}
\]
In the asymptotic region of $E$, both sides follow the $1/E^2$ scaling, and so does the variance of the gradient itself.
For a nonzero mean Gaussian state like coherent state, there is also a peak of variance by solving the extremals of Eq. (E.70), which stays in the range of [Et/2, Et ]. Beyond it, the variance begins to approach 1/E2 . However, for zero-mean Gaussian state like SMSV state, C SMSV 2 (z) in Eq. (E.71) simply decreases with E, and through a comparison of terms involving E3/2 and E2 in the denominator, the scaling of 1/E2 is estimated to start from E ∼ Lcosh ζ = L √ 1 + Et . E.1.3.2 Fock number states For non-Gaussian states, the evaluation of fidelity is in general difficult. In this part, we consider the preparation of a Fock number state with a closed form of fidelity to provide a physical insight on the scaling of variance in preparation of non-Gaussian states. Fock number states form a complete orthonormal basis for the Hilbert space. In this section, we will denote a number state as |Et⟩ F . We begin with lemmas about the distribution of the norm and complex argument angle of a Gaussian complex variable α ∼ N C σ2 . As they are widely known, we simply state the results. Lemma 23 Given a complex Gaussian distributed random variable α ∼ N C σ2 , the square of its norm follows the Gamma distribution |α| 2 ∼ Gamma(1, σ2 ), with probability density function p(|α| 2 ) = e −|α| 2/σ2 /σ2 . The argument arg{α} ≡ tan−1 (Im{α}/ Re{α}) is uniform in [−π, π]. One can further find that the difference of the arguments of two complex Gaussian variables α1, α2 from the same ensemble satisfy the triangular distribution as the following. 257 Corollary 24 For complex Gaussian variables αi ∈ N C σ 2 i with i = 1, 2, the difference of their argument δ ≡ arg{α1} − arg{α2} satisfies the triangular distribution δ ∼ Tri(−2π, 2π, 0) with distribution p(δ) = (2π − |δ|)/4π 2 . With the lemmas in hand, we can obtain the correlator C1 as C Fock 1 ≡ Eα∼N C E | F ⟨Et |α⟩ |4 = Eα∼N C E " e −2|α| 2 (Et !)2 |α| 4Et # = E|α| 2∼Gamma(1,E) " e −2|α| 2 (Et !)2 |α| 4Et # = (2Et)! 
(2EtEt !)2 (1 + 1/2E) −2Et 1 + 2E . (E.79) Similarly, the correlator C2 becomes C Fock 2 (x) ≡ Eαy∼N C yE "Y 1 h=0 | F ⟨Et |αz + (−1)hα1−z⟩ |2 # = Eαy∼N C yE " e −|αz+α1−z| 2 Et! |αz + α1−z| 2Et e −|αz−α1−z| 2 Et! |αz − α1−z| 2Et # = 1 (Et!)2 E|αy| 2∼Gamma(1,yE),δ∼Tri(−2π,2π,0) h e −2|αz| 2−2|α1−z| 2 |αz| 4 + |α1−z| 4 + 2|αz| 2 |α1−z| 2 − 4|αz| 2 |α1−z| 2 cos2 δ Et i = 1 (Et!)2 E|αy| 2∼Gamma(1,yE) h e −2(|αz| 2+|α1−z| 2 )Eδ∼Tri(−2π,2π,0) h |αz| 4 + |α1−z| 4 − 2|αz| 2 |α1−z| 2 cos(2δ) Et ii = 1 (Et!)2 E|αy| 2∼Gamma(1,yE) e −2(|αz| 2+|α1−z| 2 ) |αz| 2 + |α1−z| 2 2Et 2F1 1 2 , −Et, 1, 4|αz| 2 |α1−z| 2 (|αz| 2 + |α1−z| 2) 2 , (E.80) where 2F1(a, b, c, z) is the hypergeometric function. The integral over |αz| 2 , |α1−z| 2 above is hard to evaluate, but noticing that 0 ≤ 4|αz| 2 |α1−z| 2/(|αz| 2 + |α1−z| 2 ) 2 ≤ 1, 2F1 1 2 , −Et , 1, 1 ≤ 2F1 1 2 , −Et , 1, 4|αz| 2 |α1−z| 2 (|αz| 2 + |α1−z| 2) 2 ≤ 1, (E.81) where the L.H.S. is a constant depending on Et only. The ensemble average in Eq. (E.80) without hypergeometric function is 1 (Et !)2 E|αy| 2∼Gamma(1,yE) h e −2(|αz| 2+|α1−z| 2 ) |αz| 2 + |α1−z| 2 2Et i = (2Et)! (2EtEt !)2 (1 − z)(1 + 2zE) 1 + 1−2z z+2(1−z)zE 2Et − 2(1 − z)zE − z 1 − 2z 1 + 1 2zE −2Et [1 + 2(1 − z)E] (1 + 2zE) . (E.82) Therefore, we have the correlator C Fock 2 as C Fock 2 (z) = η (2Et)! (2EtEt !)2 (1 − z)(1 + 2zE) 1 + 1−2z z+2(1−z)zE 2Et − 2(1 − z)zE − z 1 − 2z 1 + 1 2zE −2Et [1 + 2(1 − z)E] (1 + 2zE) . (E.83) where η equals 2F1(1/2, −Et , 1, 1) in lower bound and 1 in upper bound. In the asymptotic region of E, the long fraction in the middle can be reduced to 1 + 2Et , and thus the correlator becomes C Fock 2 (x) = η (1 + 2Et)(2Et)! (2EtEt !)2 (1 + 1/2zE) −2Et [1 + 2(1 − z)E] (1 + 2zE) . (E.84) The correlator C3 for Fock state is C Fock 3 (z, z˜) = Eαy∼N C yE "Y 1 h=0 | F ⟨Et |αz + (−1)hαz˜ + (−1)hα1−z−z˜⟩ || ⟨αz + (−1)hαz˜ − (−1)hα1−z−z˜| Et⟩ F | # = Eαy∼N C yE "Y 1 h=0 e −|αz+(−1)hαz˜+(−1)hα1−z−z˜| 2/2 √ Et ! 
|\alpha_z + (-1)^h\alpha_{\tilde z} + (-1)^h\alpha_{1-z-\tilde z}|^{E_t} \times \frac{e^{-|\alpha_z + (-1)^h\alpha_{\tilde z} - (-1)^h\alpha_{1-z-\tilde z}|^2/2}}{\sqrt{E_t!}}\, |\alpha_z + (-1)^h\alpha_{\tilde z} - (-1)^h\alpha_{1-z-\tilde z}|^{E_t} \Bigg]. \tag{E.85}

Figure E.1: Numerical result for the correlator $C_3^{\rm Fock}(z,\tilde z)$ of the Fock state in Eq. (E.85), plotted against the circuit ensemble energy $E$. Here we choose $z = \tilde z = 1/3$. The dash-dotted line is $1/E^3$ for reference.

It is hard to evaluate Eq. (E.85) analytically; instead, we perform a Monte Carlo calculation to show its asymptotic scaling in Fig. E.1, choosing $z = \tilde z = 1/3$. The figure clearly shows that in the asymptotic region of $E$, $C_3^{\rm Fock}(z,\tilde z)$ scales as $\sim 1/E^3$ (dash-dotted line), which is of higher order than $C_1^{\rm Fock}$ and $C_2^{\rm Fock}(z)$ and can thus be omitted as well. Combining Eqs. (E.79) and (E.84), we have the bounds on the variance of the gradient in preparation of a Fock state,
\[
\mathrm{Var}\left[\partial_{\theta_k} C(\bm{x})\right] \ge \frac{1}{2}\frac{3^{L-1}}{4^{L}} C_1^{\rm Fock} + \left(\frac{1}{4} - \frac{3^{L}}{4^{L}}\right)\min_{\ell} C_2^{\rm Fock}\!\left(\frac{\ell}{L}\right) + O\!\left(\frac{1}{E^3}\right), \tag{E.86}
\]
\[
\mathrm{Var}\left[\partial_{\theta_k} C(\bm{x})\right] \le \frac{1}{2}\frac{3^{L-1}}{4^{L}} C_1^{\rm Fock} + \left(\frac{1}{4} + \frac{2^{L-1}}{4^{L}}\right) C_2^{\rm Fock}\!\left(1 - \frac{1}{L}\right) + O\!\left(\frac{1}{E^3}\right). \tag{E.87}
\]
Similar to the discussion of Gaussian state preparation, we can identify the critical $E$ for the scaling transition from $1/E^2$ to $1/E$ at fixed $L$ via
\[
\frac{\eta(1+2E_t)/4E^2}{(3/4)^L/6E} \in O(1) \;\Rightarrow\; E \in \Omega(1)\,\frac{3\eta(1+2E_t)}{2}\left(\frac{4}{3}\right)^{L} \in \Omega(\exp L), \tag{E.88}
\]
or equivalently,
\[
L \in \frac{1}{\log(4/3)}\log\left(O(1)\frac{2E}{3\eta(1+2E_t)}\right) \in O(\log E). \tag{E.89}
\]
When the circuit depth is as shallow as $L \in O(\log E)$, the bounds on the variance of the gradient are dominated by the first, $\sim 1/E$ term from the correlator $C_1^{\rm Fock}$; the two bounds then coincide and describe the variance of the gradient as
\[
\mathrm{Var}\left[\partial_{\theta_k} C\right] = \frac{1}{6}\left(\frac{3}{4}\right)^{L} C_1^{\rm Fock} + O\!\left(\frac{1}{E^2}\right). \tag{E.90}
\]
The peak of the variance can also be found at $E \sim E_t$. On the other hand, when the circuit depth is as deep as $L \in \Omega(\log E)$, the bounds on the variance are dominated by the second, $\sim 1/E^2$ term from the correlator $C_2^{\rm Fock}$,
\[
\frac{1}{8}\min_{\ell} C_2^{\rm Fock}\!\left(\frac{\ell}{L}\right) \le \mathrm{Var}\left[\partial_{\theta_k} C\right] \le \frac{1}{8} C_2^{\rm Fock}\!\left(1 - \frac{1}{L}\right), \tag{E.91}
\]
where the coefficient $\eta$ in $C_2^{\rm Fock}$ on the L.H.S.
and R.H.S of inequality is chosen to be 2F1(1/2, −Et , 1, 1) and 1 separately. In the asymptotic region, the variance also follows the scaling 1/E2 and the peak is in the region [Et/2, Et ]. E.1.4 Variance of gradient in preparation of multimode CV states In this section, we show that the results in the single-mode case in E.1.3 generalize to the variance of the gradient in preparation of an arbitrary multimode CV state |ψ⟩m. Lemma 6 in the main paper states that one can achieve universal control on multiple modes and one qubit by applying the set of ECD gates and single qubit rotations. Therefore, we consider a ladder setup of gates in circuits, as shown in Fig. E.2. In 261 1 …… 2 …… …… qubit qumodes 1,1 1 1 2,2 1 2 , 1 …… −+1,−+1 1 2 , −+2,−+2 , 1st Layer th Layer Figure E.2: Scheme of M-mode L-layer CV VQC. Cyan boxes with θℓ , ϕℓ ranging from 1 ≤ ℓ ≤ ML represents the qubit rotation UR(θℓ , ϕℓ); Pink boxes with β (j) ℓ denotes the ECD gate U (j) ECD(β (j) ℓ ) applying on the qubit and jth mode. the following, we use superscript in A(j) to denote the operator A that applies to all qumode trivially other than jth mode. We begin the analyses by generalizing the state representation in E.1.1 to the multimode case. To simplify the notation, we define β (j) = (β (j) 1 , . . . , β(j) L ) T , θ = (θ1, . . . , θML) T , ϕ = (ϕ1, . . . , ϕML) T and the overall parameters x = ({β (j)}M j=1, θ, ϕ). We also denote m = (m1, · · · , mM) as all modes. For an M-mode system, each of the L layers involves M single qubit rotations and ECD gates applied between the control qubit and the modes mj from j = 1 to M (see the set of gates surrounded by the red dashed box in Fig. E.2). The corresponding variational parameters in an L-depth circuit are {θℓ}ML ℓ=1, {ϕℓ}ML ℓ=1 and SM j=1{β (j) ℓ } L ℓ=1 with superscript (j) denoting the jth mode as explained above. The unitary of the M-mode L-depth circuit in Fig. 
E.2 is U = Y L ℓ=1 Y M j=1 U (j) ECD β (j) ℓ UR(θM(ℓ−1)+j , ϕM(ℓ−1)+j ) = Y L ℓ=1 Y M j=1 e iϕM(ℓ−1)+j sin θM(ℓ−1)+j 2 D(j) (−β (j) ℓ ) cos θM(ℓ−1)+j 2 D(j) (−β (j) ℓ ) cos θM(ℓ−1)+j 2 D(j) (β (j) ℓ ) e i(π−ϕM(ℓ−1)+j ) sin θM(ℓ−1)+j 2 D(j) (β (j) ℓ ) , (E.92) 262 where the block matrix representation above is adopted from Eq. (E.1). Note that the displacement operator D(j) (·) acts on all M modes, where the jth mode has a displacement while the rest are trivial identity. The output state of unitary U (M) L on initial state |0⟩ q ⊗M j=1 |0⟩mj is |ψ(θ, ϕ, {β (j) } M j=1)⟩ q,m ≡ U |0⟩ q ⊗ M j=1 |0⟩mj = X 1 a=0 X s ws,a(θ, ϕ)|a⟩ q e i PM j=1 χs (j) (β (j) )O M j=1 |(−1)a s (j) · β (j) ⟩mj = X 1 a=0 X s ws,a(θ, ϕ)|a⟩ q |Bs,a⟩m , (E.93) where s is a length-ML sign vector as s = (s1:ML−1, −1) with s1:ML−1 ∈ {−1, 1}ML−1 , defined in the same way as s in Eq. (E.2). The weight ws,a is defined in terms of s and a in the same way as in Eq. (E.2) via replacing L → ML. Here s (j) is the corresponding sign vector for jth mode, which is easily generated by collecting all (M(ℓ − 1) + j)th with ℓ ∈ [1, L − 1] elements of s in order as s (j) = sj , sM+j , . . . , s(L−1)M+j , (E.94) where sj denotes the jth element of the whole length-ML sign vector s. Note that the sign vectors for all modes {s (j)}M j=1 together form a partition of s. Inversely, another explicit way to generate all {s (j)}M j=1 is s (j) ∈ {−1, 1} L , if 1 ≤ j ≤ M − 1, (E.95) s (M) = (s (M) 1:L−1 , −1), where s (M) 1:L−1 ∈ {−1, 1} L−1 . (E.96) 26 and join them together in the inverse way of partition to construct the whole sign vector s. The displacement Bs (j) ,a for each mode is defined the same as in Eq. (E.2), and the state on all qumodes in Eq. (E.93) is defined as |Bs,a⟩m ≡ e i PM j=1 χs (j) ⊗ M j=1 |(−1)a s (j) · β⟩mj (E.97) for convenience. 
If we define vs,a(θ, ϕ, β) ≡ e i PM j=1 χs (j) (β (j) )ws,a(θ, ϕ), we have |ψ(x)⟩ q,m = X 1 a=0 X s vs,a(θ, ϕ, β)|a⟩ q O M j=1 |(−1)a s (j) · β (j) ⟩mj , (E.98) which generalizes Eq. (5.3) in the main paper. To conclude, the correspondance between Eq. (E.2) and (E.93) indicates a map from the M-mode state generated by UL to a single mode state generated by UML M : ψL,M (θ, ϕ, {β (j) } M j=1) → ψML,1(θ, ϕ, β). (E.99) The proof is easy to see from an explicit example of L = 1 and M = 2 and then generalize by mathematical induction, which is same as in E.1.1. The output state of M = 2 modes and L = 1 circuit as |ψ⟩ = e iϕ2 sin θ2 2 D(2)(−β (2)) cos θ2 2 D(2)(−β (2)) cos θ2 2 D(2)(β (2)) e i(π−ϕ2) sin θ2 2 D(2)(β (2)) e iϕ1 sin θ1 2 D(1)(−β (1)) cos θ1 1 D(1)(−β (2)) cos θ1 2 D(1)(β (1)) e i(π−ϕ1) sin θ1 2 D(1)(β (1)) × |0⟩m1 |0⟩m2 0 (E.100) = |0⟩ q ⊗ e i(ϕ1+ϕ2) sin θ1 2 sin θ2 2 |−β (1)⟩m1 |−β (2)⟩m2 + cos θ1 2 cos θ2 2 |β (1)⟩m1 |−β (2)⟩m2 + |1⟩ q ⊗ e iϕ1 sin θ1 2 cos θ2 2 |−β (1)⟩m1 |β (2)⟩m2 + e i(π−ϕ2) cos θ1 2 sin θ2 2 |β (1)⟩m1 |β (2)⟩m2 , (E.101) 264 which indicates a clear mapping to the state |ψL=2,M=1(θ, ϕ, β)⟩ shown in Eq. (E.10). For energy regularization, we still have the displacement in every ECD gate Gaussian distributed, β (j) ℓ ∼ N C E/L, and thus the ensemble-averaged energy per mode is also E ⟨m † jmj ⟩ = E. We still consider the gradient with respect to the kth qubit rotation angle θk. For a general M-mode operator, it is easy to check that the ensemble average of gradient is still zero, and the variance can be written in the same form as in Eq. (E.28). Aligned with the study in E.1.3, the target state of control qubit is also |0⟩ q and Eq. (E.42) becomes E [⟨O⟩k (+1) ⟨O⟩k (µ) ] = X s,s ′ ,r,r ′ E h ws,k(+1)w ∗ s ′ ,k(+1)wr,k(µ)w ∗ r ′ ,k(µ) i E [⟨ψ|Bs⟩ ⟨Bs ′|ψ⟩ ⟨ψ|Br⟩ ⟨Br ′|ψ⟩] . (E.102) Via the mapping from ψE,L,M to ψE,ML,1, the variance in the multimode scenario has the same form with Eq. 
(E.44) when one replaces the single-mode correlators with the multimode correlators. We discuss them in the following. For C1, each s (j) · β (j) ∼ N C E is in Gaussian distribution, and we have C1 = E | ⟨ψ|Bs⟩ |4 = Eα∼N C E | ⟨ψ| ⊗ M j=1 |αj ⟩ | 4 . (E.103) Note that the ensemble average is performed over every {αj}M j=1 independently. For correlator C2, we can still have for each s (j) · β = αzj + α1−zj and r (j) · β = αzj − α1−zj , thus C2 can be written as C2(z) = E | ⟨ψ|Bs⟩ |2 | ⟨ψ|Br⟩ |2 = Eαy∼N C yE "Y 1 a=0 | ⟨ψ| ⊗ M j=1 |αzj + (−1)aα1−zj ⟩ | 2 # , (E.104) where we define z = (z1, . . . , zM). However, unlike the one-mode case, in general it is possible that some of the elements in z is zero as long as at least one element of 1−z is nonzero (1 = (1, . . . , 1) is a length-M 2 vector), such that s ̸= r. Suppose the number of elements in z within (0, 1) is Nz, then the probability of Nz = M is p (Nz = M) = (2ML−1 )(2L − 2)M−1 (2L−1 − 1) (2ML−1)(2ML−1 − 1) = (2L − 2)M 2ML − 2 . (E.105) The probability is exponentially approaching unity as L increases, at a fixed value of M. We will discuss the consequence of the exponential scaling after we show the correlator’s scaling of some typical states. The last correlator C3 is E [| ⟨ψ|Bs⟩ || ⟨Bs ′|ψ⟩ | ⟨ψ|Br⟩ | ⟨Br ′|ψ⟩ |] = E h ⟨ψ| ⊗ M j=1 |s (j) · β (j) ⟩ ⊗ M j=1 ⟨s ′(j) · β (j) | |ψ⟩ ⊗ M j=1 |r (j) · β (j) ⟩ ⊗ M j=1 ⟨r ′(j) · β (j) | |ψ⟩ i = Eαy∼N C yE "Y 1 a=0 ⟨ψ| ⊗ M j=1 |αzj + (−1)aαz˜j + (−1)aα1−zj−z˜j ⟩ ⊗ M j=1 ⟨αzj + (−1)aαz˜j − (−1)aα1−zj−z˜j | |ψ⟩ # ≡ C3(z, z˜), (E.106) where we utilize the same method as in Eq. (E.60). Similar to the discussion for C2 above, it is also possible that some elements of z, z˜ are zeros, as long as there are at least one nonzero element in z˜, 1 − z − z˜ so that s, s ′ , r, r ′ different from each other. Its scaling is also left to later discussion. 
We then have the bound for variance of gradient as Var [∂θk C(x)] = 1 2 3ML−1 4ML C1 + 1 4 − 2 · 3ML−1 − 2ML−1 4ML min z C2(z) + O (C2) ≥ 1 2 3ML−1 4ML C1 + 1 4 − 3ML 4ML min z C2(z) + O (C2), (E.107) 2 and Var [∂θk C(x)] = 1 2 3ML−1 4ML C1 + 1 4 − 2 · 3ML−1 − 2ML−1 4ML max {x(j)} C2(z) + O (C2) ≤ 1 2 3ML−1 4ML C1 + 1 4 + 2ML−1 4ML max z C2(z) + O (C2), (E.108) where we have omitted the contribution of C3 in the asymptotic region of E ≫ 1. Note that the coefficient ahead of each correlator is exactly the same as in Eqs. (E.61) and (E.62) by replacing L → ML suggested by the map in Eq. (E.99). We consider the asymptotic region where the circuit ensemble energy per mode is larger than the target state energy at any mode, E ≥ maxj ⟨ψ| m † jmj |ψ⟩. In general, the above correlators are hard to evaluate, we obtain insights to their properties in two examples. E.1.4.1 Product states A simple example to begin with is the product state with no correlation between any modes, |ψ⟩m = ⊗M j=1 |ψj ⟩mj , where |ψj ⟩mj is the state of jth mode. We show that it is directly related to the one-mode correlators. In this case, C1 reduces to a product form, C Prod 1 = Eαj∼N C E | ⊗ M j=1mj ⟨ψj | ⊗ M j=1 |αj ⟩ | 4 = Y M j=1 Eαj∼N C E |mj ⟨ψj |αj ⟩ |4 . (E.109) Note that the ensemble average of the term inside parentheses is the correlator C1 for one mode CV state in Eq. (E.47), which has been shown to have the scaling of 1/E in the asymptotic region. Therefore, we have C1 ∼ 1/EM for an M-mode product state. 2 C2(z) reduces to C Prod 2 (z) = Eαy∼N C yE "Y 1 a=0 | ⊗ M j=1 ⟨ψj | ⊗ M j=1 |αzj + (−1)aα1−zj ⟩ | 2 # = Y M j=1 Eαy∼N C yE "Y 1 a=0 | ⟨ψj |αzj + (−1)aα1−zj ⟩ |2 #! . (E.110) Note that for a specific zj , if zj ∈ (0, 1), the corresponding term inside the parenthesis of Eq. (E.110) is single-mode correlator C2 whereas if zj = 0, 1 the corresponding one becomes C1. Suppose the number of elements in z within the range of (0, 1) is Nz, then the scaling of C Prod 2 is ∼ 1/EM+Nz . 
According to Eq. (E.105), the probability that $N_z = M$ is $p = (2^L-2)^M/(2^{ML}-2) \sim 1 - 1/2^L$. Similarly, $C_3$ becomes
\[
C_3^{\rm Prod}(z, \tilde z) = \mathbb{E}_{\alpha_y\sim \mathcal{N}^{\mathbb C}_{yE}}\left[\prod_{a=0}^{1} \langle\psi| \otimes_{j=1}^{M}|\alpha_{z_j} + (-1)^a\alpha_{\tilde z_j} + (-1)^a\alpha_{1-z_j-\tilde z_j}\rangle \otimes_{j=1}^{M}\langle \alpha_{z_j} + (-1)^a\alpha_{\tilde z_j} - (-1)^a\alpha_{1-z_j-\tilde z_j}| |\psi\rangle\right]
= \prod_{j=1}^{M}\left(\mathbb{E}_{\alpha_y\sim \mathcal{N}^{\mathbb C}_{yE}}\left[\prod_{a=0}^{1} \langle\psi_j|\alpha_{z_j} + (-1)^a\alpha_{\tilde z_j} + (-1)^a\alpha_{1-z_j-\tilde z_j}\rangle \langle \alpha_{z_j} + (-1)^a\alpha_{\tilde z_j} - (-1)^a\alpha_{1-z_j-\tilde z_j}|\psi_j\rangle\right]\right). \tag{E.111}
\]
For convenience, we denote the term inside the parentheses of Eq. (E.111) by $C^{(j)}$. For a specific combination of $z_j, \tilde z_j, 1-z_j-\tilde z_j$: if all of them are within $(0,1)$, $C^{(j)}$ is the single-mode correlator $C_3 \sim 1/E^3$ in Eq. (E.60); if only two of them are in $(0,1)$ while the other is zero, $C^{(j)}$ becomes $C_2 \sim 1/E^2$; and if only one of the three is nonzero, $C^{(j)}$ is just $C_1 \sim 1/E$. Therefore, for different $z, \tilde z$, Eq. (E.111) can have the scaling $1/E^\nu$ with integer $\nu \in [M, 3M]$. As shown in Table E.2, there are only three allowed assignments for each element of $s, s', r, r'$; thus for the $j$th mode, the probability of $C^{(j)} \sim 1/E^{\nu_j}$ with $\nu_j \in \{1,2,3\}$ is
\[
p\left(C^{(j)} \sim 1/E^{\nu_j}\right) =
\begin{cases}
1/3^{L-1}, & \text{if } \nu_j = 1,\\
2^L/3^{L-1}, & \text{if } \nu_j = 2,\\
1 - (1+2^L)/3^{L-1}, & \text{otherwise.}
\end{cases} \tag{E.112}
\]
Suppose the numbers of $\nu_j$ in $\{\nu_j\}_{j=1}^M$ equal to 1 and 2 are $n_1$ and $n_2$; then the probability of $\sum_{j=1}^M \nu_j \le \nu_c$ is
\[
p\left(\sum_{j=1}^{M}\nu_j \le \nu_c\right) = \sum_{\substack{n_1, n_2 \ge 0,\; n_1+n_2 \le M,\\ n_1+2n_2+3(M-n_1-n_2) \le \nu_c}} \frac{M!}{n_1!\, n_2!\, (M-n_1-n_2)!} \left(\frac{1}{3^{L-1}}\right)^{n_1}\left(\frac{2^L}{3^{L-1}}\right)^{n_2}\left(\frac{3^{L-1}-2^L-1}{3^{L-1}}\right)^{M-n_1-n_2}. \tag{E.113}
\]
Setting $\nu_c = 2M$ allows us to determine the probability that $C_3^{\rm Prod}$ is not of higher order than $C_2^{\rm Prod}$. An analytical calculation is hard, so we show a numerical result in Fig. E.3. The exponential suppression of the probability of a non-higher-order contribution indicates that $C_3^{\rm Prod}$ can be treated as a higher-order term compared to $C_1^{\rm Prod}$ and $C_2^{\rm Prod}$, as in the single-mode case. To close the product-state section, we discuss the criterion for shallow and deep circuits. Recall that the probability that $C_2^{\rm Prod} \sim 1/E^{2M}$ is $p \sim 1 - 1/2^L$.
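Eq. (E.113) is a finite multinomial sum and is easy to evaluate numerically; the sketch below reproduces the exponential suppression of $\Pr(\sum_j \nu_j \le 2M)$ with depth $L$ seen in Fig. E.3. The particular values of $M$ and $L$ are illustrative choices:

```python
from math import factorial

def prob_nu_sum_le(nu_c, M, L):
    """Eq. (E.113): probability that sum_j nu_j <= nu_c, with the per-mode
    probabilities of Eq. (E.112): p(nu=1) = 1/3^(L-1), p(nu=2) = 2^L/3^(L-1),
    p(nu=3) = 1 - (1 + 2^L)/3^(L-1)."""
    p1 = 1 / 3 ** (L - 1)
    p2 = 2 ** L / 3 ** (L - 1)
    p3 = 1 - (1 + 2 ** L) / 3 ** (L - 1)
    total = 0.0
    for n1 in range(M + 1):
        for n2 in range(M - n1 + 1):
            n3 = M - n1 - n2
            if n1 + 2 * n2 + 3 * n3 <= nu_c:  # constraint in the sum of Eq. (E.113)
                coeff = factorial(M) // (factorial(n1) * factorial(n2) * factorial(n3))
                total += coeff * p1 ** n1 * p2 ** n2 * p3 ** n3
    return total

# Exponential suppression with L at fixed M = 4 (cf. Fig. E.3)
probs = [prob_nu_sum_le(2 * 4, 4, L) for L in (5, 10, 15)]
assert probs[0] > probs[1] > probs[2]
assert probs[2] < 1e-6
```

Note that Eq. (E.112) requires $(1+2^L)/3^{L-1} \le 1$, i.e. $L \ge 3$, for the probabilities to be valid.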
For shallow circuits, the leading order is 1/EM under the constraint 1/4E2M (3/4)ML/3EM ∈ O(1) and (1−p)/4EM+1 (3/4)ML/3EM ∈ O(1), resulting in the condition L ∈ O(log E). On the other hand, for deep circuits the variance of the gradient is in 1/E2M under the condition 1/4E2M (3/4)ML/3EM ∈ Ω(1) and (1−p)/4EM+1 1/4E2M ∈ O(1) leading to L ∈ Ω(log E). 269 5 10 15 L 10−25 10−20 10−15 10−10 10−5 100 P r(2 M ≥ P M j=1 νj ) M = 2 M = 4 M = 6 M = 8 M = 10 Figure E.3: The probability of PM j=1 νj ≤ 2M in Eq. (E.113) versus different circuit depth L and modes number M. E.1.4.2 Multimode Gaussian states In this part, we present the results for the preparation of an arbitrary multimode Gaussian state, either entangled or not. We consider the target state to be an M-mode Gaussian state described by mean quadrature X m and CM Vm for simplicity. As all correlators are functions of fidelity between target state and a product of coherent states, we begin with the phase space representation of a product of coherent state ⊗M j=1 |αj ⟩mj , with quadrature mean and covariance matrix given as X = 2(Re{α1},Im{α1}, . . . , Re{αM},Im{αM}) T ≡ 2 (ξ1, ξ2, . . . , ξ2M−1, ξ2M) T = 2ξ, (E.114) Vcoh = I, (E.115) 270 where we denote real and imaginary parts of all αj in a unified vector ξ with each element in the distribution NE/2 . The CM Vcoh is a 2M × 2M identity matrix. Similarly, we can define X m = 2ξm. Via applying the general fidelity formula in Eq. 
(5.28) in the main paper, we can find that C1 becomes C Gauss 1 = Eα∼N C E | ⟨ψ| ⊗ M j=1 |αj ⟩ | 4 = Eα∼N C E F(|ψ⟩m , ⊗ M j=1 |αj ⟩) 2 = Z dξ 4M det(Vm + I) e −4(ξ−ξm) T (Vm+I) −1 (ξ−ξm) 1 (πE)M e −ξ T ξ/E = 4M (πE)M det(Vm + I) Z dξ exp −ξ T 4(Vm + I) −1 + I E ξ + 2ξm T 4(Vm + I) −1 ξ − 4ξm T (Vm + I) −1 ξm = 4M det(K)e −4ξm T Kξm (πE)M Z dξ exp −ξ T (4K + I/E)ξ + 2ξ ′T mξ (E.116) = 4Me −4ξm T Kξm (πE)M det(K) −1 Z dξ exp −(ξ − (4K + I/E) −1 ξ ′ m) T (4K + I/E)(ξ − (4K + I/E) −1 ξ ′ m) + ξ ′T m(4K + I/E) −1 ξ ′ m , (E.117) where in the second to last line we denote K = (Vm + I) −1 and ξ ′ m = 4Kξm for convenience. In the last line, we write the exponent to complete the square. As 4K + I/E is a symmetric real matrix, we can diagonalize it via an orthogonal matrix Q as 4K + I/E = QTΛQ, with Λ = diag(λ1, . . . , λ2M) to be a diagonal matrix. We can do a variable transformation ˜ξ = Q(ξ − K−1ξ ′ m), then the integral is reduced to Z d ˜ξ ∂ξ ∂ ˜ξ exp h −˜ξ TΛ˜ξ i = Z d ˜ξ exp " − X 2M i=1 λi ˜ξ 2 i # = πM p det(4K + I/E) , (E.118) and thus the correlator C Gauss 1 becomes C Gauss 1 = 4M det(K) exp −4ξ T m K − 4K(4K + I/E) −1K ξm p det(4K + I/E)EM . (E.119) 27 In the evaluation of C2, there are both ⊗M j=1 |αzj ± α1−zj ⟩, which can also be characterized by X ± = 2(ξz ± ξ1−z) and V± = I. The distribution of ξz is p(ξz) = exp −ξ T z Szξz p det{S}z /πM with Sz = ⊕M j=1I2/(zjE), where I2 is a 2 × 2 identity matrix. 
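The closed form in Eq. (E.119) can be spot-checked numerically. For an $M$-mode vacuum target ($V_m = I$, $\xi_m = 0$) the formula collapses to $C_1 = 1/(1+2E)^M$, which a direct Monte Carlo average of $|\langle\psi|\otimes_j|\alpha_j\rangle|^4$ also gives. A minimal sketch; the choices $M = 2$, $E = 3$ and the sample size are illustrative:

```python
import numpy as np

def c1_formula_vacuum(M, E):
    # Eq. (E.119) with V_m = I, xi_m = 0: K = (V_m + I)^{-1} = I/2, so
    # C1 = 4^M det(K) / (sqrt(det(4K + I/E)) E^M), which equals 1/(1+2E)^M
    K = 0.5 * np.eye(2 * M)
    num = 4 ** M * np.linalg.det(K)
    den = np.sqrt(np.linalg.det(4 * K + np.eye(2 * M) / E)) * E ** M
    return num / den

def c1_monte_carlo_vacuum(M, E, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    # alpha_j ~ N_C(E) independently; |<0|alpha_j>|^4 = exp(-2|alpha_j|^2),
    # and |alpha_j|^2 is exponentially distributed with mean E (cf. Lemma 23)
    a2 = rng.exponential(E, size=(n, M))
    return np.mean(np.exp(-2 * a2.sum(axis=1)))

M, E = 2, 3.0
assert abs(c1_formula_vacuum(M, E) - 1 / (1 + 2 * E) ** M) < 1e-12
assert abs(c1_monte_carlo_vacuum(M, E) - 1 / (1 + 2 * E) ** M) < 0.01
```

For a general Gaussian target one would build $K = (V_m + I)^{-1}$ from the target covariance matrix instead of $I/2$.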
The correlator C2 becomes C Gauss 2 (z) = Eαy∼N C yE "Y 1 h=0 | ⟨ψ| ⊗ M j=1 |αzj + (−1)hα1−zj ⟩ | 2 # = Eαy∼N C yE "Y 1 h=0 F(|ψ⟩m ⊗ M j=1 |αzj + (−1)hα1−zj ⟩ # = Z dξzdξ1−z 2M p det(Vm + I) e −2(ξz+ξ1−z−ξm) T (Vm+I)−1 (ξz+ξ1−z−ξm) × 2M p det(Vm + I) e −2(ξz−ξ1−z−ξm) T (Vm+I)−1 (ξz−ξ1−z−ξm) × p det{S}z πM e −ξz T Szξz q det{S}1−z πM e −ξ1−z T S1−zξ1−z (E.120) = 4M det(K) q det{S}z det{S}1−z e −4ξ TmKξm π 2M Z dξzdξ1−ze −ξ T z (4K+Sz)ξz+2ξ ′Tmξz−ξ T 1−z (4K+S1−z)ξ1−z (E.121) = 4M det(K) q det{S}z det{S}1−z e −4ξ TmKξm+ξ ′Tm(4K+Sz)−1ξ ′Tm π 2M q det{S}z det{S}1−z × Z d ˜ξzd ˜ξ1−z exp h −˜ξ T z Λzξ˜z i exp h −˜ξ T 1−zΛ1−zξ˜1−z i (E.122) = 4M det(K) q det{S}z det{S}1−z e −4ξ TmKξm+ξ ′Tm(4K+Sz)−1ξ ′Tm π 2M q det{S}z det{S}1−z πM p det(4K + Sz) πM p det(4K + S1−z) = 4M det(K) exp −4ξ T m K − 4K(4K + Sz) −1K ξm p det(4K + Sz) det(4K + S1−z) 1 hQM j=1 zj (1 − zj ) i E2M (E.123) where in Eq. (E.121) we denote K and ξ ′ m in the same way as Eq. (E.117). In Eq. (E.122), we apply the same diagonalization method we use in the derivation of Eq. (E.119), where Λz = Qz(4K + Sz)QT z and so as Λ1−z, ˜ξz = Qz(ξz − (4K + Sz) −1ξ ′ m) and ˜ξ1−z = Q1−zξ1−z. Similar to the discussion of C Prod 2 , 272 if there are M − Nz elements in z that are either zero or one, then C Gauss 2 (z) ∼ 1/EM+Nz due to the fact that a Gaussian distribution with a zero variance in Eq. (E.120) becomes a Dirac-delta function. 
The last correlator left is C Gauss 3 (z1, z2) C Gauss 3 (z, z˜) = Eαy∼N C yE "Y 1 h=0 ⟨ψ| ⊗ M j=1 |αzj + (−1)hαz˜j + (−1)hα1−zj−z˜j ⟩ × ⊗ M j=1 ⟨αzj + (−1)hαz˜j − (−1)hα1−zj−z˜j | |ψ⟩ = 4M det(K) Z dξzdξz˜dξ1−z−z˜ e −(ξz+ξz˜+ξ1−z−z˜−ξm) T K(ξz+ξz˜+ξ1−z−z˜−ξm) e −(ξz+ξz˜−ξ1−z−z˜−ξm) T K(ξz+ξz˜−ξ1−z−z˜−ξm) × e −(ξz−ξz˜−ξ1−z−z˜−ξm) T K(ξz−ξz˜−ξ1−z−z˜) e −(ξz−ξz˜+ξ1−z−z˜−ξm) T K(ξz−ξz˜+ξ1−z−z˜−ξm) × p det{S}z πM e −ξz T Szξz p det{S}z˜ πM e −ξz˜ T Sz˜ξz˜ q det{S}1−z−z˜ πM e −ξ1−z−z˜ T S1−z−z˜ξ1−z−z˜ = 4M q det{S}z det{S}z˜ det{S}1−z−z˜ e −4ξ TmKξm π 3M det(K) −1 × Z dξzdξz˜dξ1−z−z˜e −ξ T z (4K+Sz)ξz+2ξ ′Tmξz−ξ T z˜ (4K+Sz˜)ξz˜−ξ T 1−z−z˜ (4K+S1−z−z˜)ξ1−z−z˜ = 4M q det{S}z det{S}z˜ det{S}1−z−z˜ e −4ξ TmKξm+ξ ′Tm(4K+Sz) −1ξ ′Tm π 3M det(K) −1 × Z d ˜ξzd ˜ξz˜d ˜ξ1−z−z˜e −ξ˜zΛzξ˜z e −ξ˜z˜Λz˜ξ˜z˜ e −ξ˜1−z−z˜Λ1−z−z˜ξ˜1−z−z˜ (E.124) = 4M q det{S}z det{S}z˜ det{S}1−z−z˜ e −4ξ TmKξm+ξ ′Tm(4K+Sz) −1ξ ′Tm π 3M det(K) −1 πM p det(4K + Sz) πM p det(4K + Sz˜) πM p det(4K + S1−z−z˜) = 4M det(K) exp −4ξ Tm K − 4K(4K + Sz) −1K ξm p det(4K + Sz) p det(4K + Sz˜) p det(4K + S1−z−z˜) 1 hQM j=1 zj z˜j (1 − zj − z˜j ) i E3M , (E.125) where in Eq. (E.124) we do the same diagonalization method as in Eq. (E.121), with Λz˜ = Qz˜(4K+Sz˜)QT z˜ and ˜ξz˜ = Qz˜ξξ˜, and so as Λ1−z−z˜, ˜ξ1−z−z˜. Following the same analysis from Eq. (E.113), the bulk contribution of C Gauss 3 (z, z˜) behaves as 1/Eν with ν > 2M, and thus can be omitted as they are higher order in E when E is large. In the asymptotic limit of E, we can omit the contribution of I/E in C Gauss 1 and Sz,S1−z in C Gauss 2 compared to 4K, and thus the determinants in those correlators reduce to constants independent of E, which directly leads to the scaling of 1/EM and 1/E2M separately. Therefore, the critical depth between 2 shallow and deep circuits in preparation of a general M-mode Gaussian state is also determined by the logarithm of circuit ensemble energy. 
In the following, we present an explicit example: the two-mode squeezed vacuum (TMSV) state, whose CM has been introduced in Methods. Through the calculation of Eqs. (E.119) and (E.123), one can directly find $C_1$ and $C_2$ for the TMSV state as
\[
C_1^{\rm TMSV} = \frac{\mathrm{sech}^4(\zeta)}{G_1(E)}, \tag{E.126}
\]
\[
C_2^{\rm TMSV}(z) = \frac{\mathrm{sech}^4(\zeta)}{G_2(zE)\, G_2[(1-z)E]}, \tag{E.127}
\]
where we have defined
\[
G_2(\bm{z}) = 1 + 2\|\bm{z}\|_1 + 4\,\mathrm{sech}^2(\zeta)\prod_j z_j. \tag{E.128}
\]
Here $\bm{z}$ is a vector and $\bm{1}$ above is the all-ones vector of the same length as $\bm{z}$. In the asymptotic region of large $E$, we also have $C_2^{\rm TMSV}(z) \sim 1/64 z_1 z_2 (1-z_1)(1-z_2) E^4$. Note that both correlators monotonically decrease with $E$. To summarize, for shallow circuits $L \in O(\log E)$, the variance of the gradient is $\mathrm{Var} \sim 1/E^M$, while for deep ones $L \in \Omega(\log E)$ it becomes $\sim 1/E^{2M}$ in the asymptotic limit of $E$.

E.2 Supplemental Material for Section 5.2

E.2.1 Entanglement measures

The evaluation of the distillable entanglement in general requires a regularization over an infinite number of copies; therefore, as explained in the main text, we consider lower and upper bounds on it, the RCI and the EoF [336, 337]. For a bipartite quantum system with parts A and B in a state $\hat\rho_{AB}$, if classical communication is allowed, then the rate of entanglement generation is lower bounded by the RCI [309],
\[
I_R(\hat\rho_{AB}) = \max\{S(\hat\rho_B), S(\hat\rho_A)\} - S(\hat\rho_{AB}), \tag{E.129}
\]
where $S(\hat\rho) = -\mathrm{Tr}\{\hat\rho \log_2 \hat\rho\}$ is the von Neumann entropy of the state $\hat\rho$, $\hat\rho_A = \mathrm{Tr}_B\, \hat\rho_{AB}$ is the reduced density matrix of subsystem A, and similarly for $\hat\rho_B$. For a symmetric two-mode Gaussian state characterized by the covariance matrix
\[
V = \frac{1}{2}\begin{pmatrix} a I_2 & c Z_2 \\ c Z_2 & b I_2 \end{pmatrix} \tag{E.130}
\]
in standard form, its symplectic eigenvalues are simply given as $\nu_\pm = [\sqrt{y} \pm (b-a)]/2$ with $y \equiv (a+b)^2 - 4c^2$ [249]. The entanglement entropy is $S(\hat\rho) = g(\nu_+) + g(\nu_-)$, where
\[
g(x) \equiv \frac{x+1}{2}\log_2\frac{x+1}{2} - \frac{x-1}{2}\log_2\frac{x-1}{2}. \tag{E.131}
\]
For the microwave modes in Eq. (5.37) of the main text, the symplectic eigenvalues are equal,
\[
\nu_\pm = \sqrt{u\left(u - \frac{v^2}{w}\right)}, \tag{E.132}
\]
and the entropy of the entire system is
\[
S(\hat\rho_{mm}) = 2 g\!\left(\sqrt{u\left(u - \frac{v^2}{w}\right)}\right). \tag{E.133}
\]
The RCI is thus
\[
I_R(\hat\rho_{mm}) = g\!\left(u - \frac{v^2}{2w}\right) - 2 g\!\left(\sqrt{u\left(u - \frac{v^2}{w}\right)}\right). \tag{E.134}
\]
On the other hand, the EoF quantifies the resources required to create a quantum state in terms of the number of Bell pairs (or "ebits"), formally defined as [286]
\[
E_f(\hat\rho) = \min \sum_i p_i S(|\psi_i\rangle), \tag{E.135}
\]
where the minimum is taken over all pure-state decompositions of $\hat\rho$ as $\hat\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. For a pure state, both the RCI and the EoF reduce to the entanglement entropy. The EoF is in general subadditive,
\[
E_f(\hat\rho_1 \otimes \hat\rho_2) \le E_f(\hat\rho_1) + E_f(\hat\rho_2). \tag{E.136}
\]
The EoF is known to be analytically solvable for the symmetric two-mode Gaussian state $\hat\rho$ with covariance matrix $V$ in Eq. (E.130) when $a = b$ [338]:
\[
E_f(\hat\rho) = \begin{cases} h(\sqrt{\nu_1\nu_2}) & \text{if } \nu_1\nu_2 < 1,\\ 0 & \text{otherwise,}\end{cases} \tag{E.137}
\]
where $\nu_1, \nu_2$ are the first two eigenvalues of $V$ in increasing order and
\[
h(x) \equiv \frac{(1+x)^2}{4x}\log_2\frac{(1+x)^2}{4x} - \frac{(1-x)^2}{4x}\log_2\frac{(1-x)^2}{4x}. \tag{E.138}
\]
For the noisy entangled microwave modes (see Eq. (5.37) of the main text), we have $\nu_1 = \nu_2 = u - v^2/w$, and the EoF is
\[
E_f(\hat\rho_{m,m}) = h\!\left(u - \frac{v^2}{w}\right). \tag{E.139}
\]
The EoF is also analytically solvable for arbitrary two-qubit states [339], as we explain in the following. We first introduce the spin-flipped state of $\hat\rho$,
\[
\tilde{\hat\rho} = (\hat\sigma_y \otimes \hat\sigma_y)\, \hat\rho^*\, (\hat\sigma_y \otimes \hat\sigma_y), \tag{E.140}
\]
where $\hat\rho^*$ is the complex conjugate of $\hat\rho$. Define the function
\[
\mathcal{E}(x) = H\!\left(\frac{1 + \sqrt{1-x^2}}{2}\right), \tag{E.141}
\]
where $H(x) \equiv -x\log_2(x) - (1-x)\log_2(1-x)$ is the binary entropy. Note that $\mathcal{E}$ monotonically increases with $x$ from 0 to 1. The EoF for an arbitrary two-qubit state $\hat\rho$ is then
\[
E_f(\hat\rho) = \mathcal{E}[\delta(\hat\rho)], \tag{E.142}
\]
where $\delta(\hat\rho) \equiv \max\{0, v_1 - v_2 - v_3 - v_4\}$ and $\{v_i\}_{i=1}^4$ are the square roots of the eigenvalues of $\hat\rho\tilde{\hat\rho}$ in decreasing order.
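The two-qubit formula in Eqs. (E.140)-(E.142) is straightforward to implement numerically. The sketch below computes the EoF for an arbitrary two-qubit density matrix and checks it on a Bell state ($E_f = 1$ ebit) and a product state ($E_f = 0$); these test states are illustrative choices:

```python
import numpy as np

def eof_two_qubit(rho):
    """Entanglement of formation of a two-qubit state via Eqs. (E.140)-(E.142)."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy)
    rho_tilde = yy @ rho.conj() @ yy                     # spin-flipped state, Eq. (E.140)
    evals = np.linalg.eigvals(rho @ rho_tilde).real
    v = np.sqrt(np.clip(np.sort(evals)[::-1], 0, None))  # v1 >= v2 >= v3 >= v4
    delta = max(0.0, v[0] - v[1] - v[2] - v[3])          # delta(rho) in Eq. (E.142)
    if delta == 0:
        return 0.0
    x = (1 + np.sqrt(1 - delta ** 2)) / 2                # argument of H in Eq. (E.141)
    if x in (0.0, 1.0):
        return 0.0
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)    # binary entropy H(x)

bell = np.zeros(4); bell[1] = bell[2] = 1 / np.sqrt(2)   # |Psi+> = (|01>+|10>)/sqrt(2)
rho_bell = np.outer(bell, bell)
rho_prod = np.diag([1.0, 0, 0, 0])                       # |00><00|, unentangled
assert abs(eof_two_qubit(rho_bell) - 1.0) < 1e-8
assert eof_two_qubit(rho_prod) < 1e-8
```

The quantity $\delta(\hat\rho)$ computed above is the concurrence, so the same routine exposes both entanglement monotones if needed.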
In the main text, we evaluate the entanglement measures for the noisy two-mode squeezed vacuum in Eq. (5.37) of the main text. When $\zeta_o = \zeta_m = 1$, $\hat\rho_{m,m}$ is a pure TMSV state with mean photon number $N_{\rm Ideal} = 16C^2/[(1-C)^2(1+6C+C^2)]$. In this pure-state case, the EoF and RCI are both equal to $g(2N_{\rm Ideal}+1)$.

Figure E.4: (a) Schematic of the direct swap approach. (b) Entanglement of formation $E_f$ between qubits versus evolution time $t$. The vertical dashed line indicates the time of maximum entanglement, $t = \pi/2$.

E.2.2 Solving the direct swap

In this section, we provide more details on the direct swap approach for entanglement distillation on the hybrid CV-DV platform. For a given pair of noisy entangled microwave modes in state $\hat\rho_{m,m}$, the composite system is prepared in the same way as in the hybrid LVQC approach, with the two qubits in state $|0\rangle_q$, as shown in the top and bottom halves of Fig. E.4(a). The CV-DV systems on the two sides are evolved separately by the same unitary $\exp(-it\hat H_{{\rm swap},\ell})$, where $\ell = 1, 2$ labels the two sides. Throughout the evolution, we monitor the entanglement (i.e., the EoF) between the two qubits, as shown in Fig. E.4(b). We stop the evolution and discard the modes when the two qubits are maximally entangled, at times $t = (2n+1)\pi/2$, $n \in \mathbb{N}$. In the direct swap approach, we choose the final two-qubit state to maximize the entanglement, but its fidelity to $|\Psi^+\rangle$ is not guaranteed to be maximal. To maximize the fidelity, we allow an arbitrary local unitary $\hat U$ on one of the qubits. Note that the Bell state $|\Psi^+\rangle$ is invariant under $\hat U \otimes \hat U^\star$, so all local unitaries can be absorbed into a single qubit. We parameterize the single-qubit unitary by three angles and numerically maximize the fidelity to obtain the results.
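The final fidelity-maximization step described above can be sketched numerically: parameterize a single-qubit unitary $U(\theta, \phi, \lambda)$ by three angles and search for the one maximizing $\langle\Psi^+|(\hat U \otimes I)\hat\rho(\hat U^\dagger \otimes I)|\Psi^+\rangle$. Below we use a crude random search rather than a proper optimizer, and a test state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, for which a local $X$ gate recovers fidelity 1; all of these choices are illustrative, not the thesis's actual optimization routine:

```python
import numpy as np

def u3(theta, phi, lam):
    # Generic single-qubit unitary parameterized by three angles
    return np.array([
        [np.cos(theta / 2), -np.exp(1j * lam) * np.sin(theta / 2)],
        [np.exp(1j * phi) * np.sin(theta / 2),
         np.exp(1j * (phi + lam)) * np.cos(theta / 2)],
    ])

def max_bell_fidelity(rho, n_trials=20_000, seed=2):
    """Random search over a local unitary on qubit 1 maximizing fidelity to |Psi+>."""
    rng = np.random.default_rng(seed)
    psi_plus = np.array([0, 1, 1, 0]) / np.sqrt(2)
    best = 0.0
    for _ in range(n_trials):
        th, ph, lm = rng.uniform(0, 2 * np.pi, 3)
        U = np.kron(u3(th, ph, lm), np.eye(2))  # act on the first qubit only
        best = max(best, (psi_plus.conj() @ U @ rho @ U.conj().T @ psi_plus).real)
    return best

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)  # |Phi+>, locally equivalent to |Psi+>
rho = np.outer(phi_plus, phi_plus)
assert max_bell_fidelity(rho) > 0.95            # X on one qubit would give fidelity 1
```

In practice a gradient-based or Nelder-Mead optimizer over the three angles converges far faster than random search; the random search only illustrates the objective.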
E.2.3 Qubit distillation protocols

As explained in the main text, to obtain states with even lower infidelity, we consider distillation protocols acting on two copies of the output qubits produced by the hybrid LVQC or direct swap approach. Various entanglement distillation protocols for DV systems have been proposed, including BBPSSW [287], DEJMPS [288] and LOCCNet [290]. We mainly focus on the DEJMPS protocol, which is proven optimal for Bell-diagonal states of rank up to three, and on a DEJMPS-inspired DV LVQC, shown in Fig. E.5(a) and (b) respectively.

Figure E.5: Schematic of (a) the DEJMPS protocol and (b) the L-layer DV LVQC.

For both protocols, the two-qubit systems $A_0, B_0$ and $A_1, B_1$ are separately initialized in identical two-qubit mixed states produced by the hybrid distillation approach. At the end of the circuit, qubits $A_1, B_1$ are measured in the computational basis, where only the outcomes $|00\rangle$ or $|11\rangle$ are considered a success.
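For orientation on what such two-copy recurrence protocols gain, the classic BBPSSW protocol (mentioned above alongside DEJMPS) has a standard closed-form output fidelity and success probability for Werner states; DEJMPS obeys a similar but state-dependent recurrence. The sketch below evaluates that textbook Werner-state formula, and is not a simulation of the circuits in Fig. E.5:

```python
def bbpssw_step(F):
    """One round of BBPSSW on two Werner-state pairs of fidelity F.
    Returns (output fidelity, success probability); standard closed form."""
    p_succ = F ** 2 + 2 * F * (1 - F) / 3 + 5 * ((1 - F) / 3) ** 2
    F_out = (F ** 2 + ((1 - F) / 3) ** 2) / p_succ
    return F_out, p_succ

F = 0.80
F_out, p = bbpssw_step(F)
assert F_out > F          # one round improves fidelity for any F > 1/2
assert 0 < p < 1
```

Iterating `bbpssw_step` shows the characteristic trade-off of recurrence protocols: fidelity converges toward 1 while the yield shrinks geometrically with the number of rounds.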
Abstract
Quantum neural networks (QNNs) are a promising paradigm for quantum computing and quantum information processing on quantum devices, from the near-term toward the fault-tolerant era. It is fundamental and important to understand the physics underlying their optimization and performance, and to design new architectures that fully unleash their potential.
In this thesis, we investigate the phase transitions in QNNs associated with their training difficulty and dynamics, and their applications in both supervised and unsupervised learning. We identify a computational phase transition of the quantum approximate optimization algorithm (QAOA) in solving combinatorial problems such as 3-SAT, connected to the controllability and complexity of QAOA circuits. We also show that the late-time dynamics of universal deep QNNs can be described by generalized Lotka-Volterra equations, where different target values induce convergence dynamics with different scaling. In applications to supervised learning, we find an exponential decay of the quantum state discrimination error with extensive QNN circuit depth; in contrast, non-extensive architectures with limited depth, such as quantum convolutional neural networks, are suboptimal. For unsupervised generative learning, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data via interpolation between data and noise. We provide bounds on the learning error and demonstrate QuDDPM's capability in learning correlated quantum noise models, quantum many-body phases, and the topological structure of quantum data. Finally, we extend our scope to the trainability analysis of universal QNNs on continuous-variable systems, and apply it to the transduction of entanglement from continuous- to discrete-variable systems.
Asset Metadata
Creator: Zhang, Bingzhi (author)
Core Title: Trainability, dynamics, and applications of quantum neural networks
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Physics
Degree Conferral Date: 2024-05
Publication Date: 07/03/2024
Defense Date: 04/17/2024
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag: OAI-PMH Harvest, quantum computing, quantum information science, quantum machine learning
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Zhuang, Quntao (committee chair); Brun, Todd (committee member); Levenson-Falk, Eli (committee member); Lidar, Daniel (committee member); Zanardi, Paolo (committee member)
Creator Email: bingzhiz@usc.edu, bzzhangphy@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113997ES7
Unique identifier: UC113997ES7
Identifier: etd-ZhangBingz-13177.pdf (filename)
Legacy Identifier: etd-ZhangBingz-13177
Document Type: Thesis
Rights: Zhang, Bingzhi
Internet Media Type: application/pdf
Type: texts
Source: 20240709-usctheses-batch-1177 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu