HIGH PERFORMANCE AND ULTRA ENERGY EFFICIENT COMPUTING USING
SUPERCONDUCTOR ELECTRONICS
by
Haolin Cong
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2023
Copyright 2023 Haolin Cong
Dedication
To my parents and grandmother. And to my future wife.
Acknowledgements
First and foremost, I would like to express my sincere gratitude to my advisor, Professor Massoud Pedram,
for his unwavering support, encouragement, and guidance throughout my doctoral studies. His expertise,
knowledge, and insights have been invaluable in shaping both my research and my career. I am truly
grateful for his mentorship and his dedication to my success.
Next, I would like to thank my PhD qualifying exam committee and defense exam committee members,
Prof. Peter Beerel, Prof. Sandeep Gupta, Prof. Wei Wu, and Prof. Aiichiro Nakano, for their time, effort,
and expertise in evaluating my research work. Their constructive feedback and insightful comments have
been instrumental in shaping my research and improving my work.
I would also like to extend my appreciation to my group mates, Mingye Li, Mustafa Karamuftuoglu, Bo
Zhang, Ting-Ru Lin, Sasan Razmkhah, Mohammad Saeed Abrishami, Zeming Cheng, Amirhossein Esmaili
Dastjerdi, Arash Fayyazi, Mahdi Nazemi, Marzieh Vaeztourshizi and Naveen Katam, for their valuable
contributions and camaraderie. Their insights, feedback, and support have been critical to my success
and have made my research journey more enjoyable and fulfilling. I would also like to thank the group
administrative assistant, Annie Yu.
Outside the SPORT lab group, I would also like to thank Prof. Mike Chen, Prof. Hossein Hashemi, Shiyu
Su, Ce Yang, Zisong Wang, Cheng-Ru Ho, Tzu-Fan Wu, Aoyang Zhang, Rezwan A Rasul and all the people
who helped me and encouraged me. I would also like to thank Dr. Pete Hopkins from the National Institute of
Standards and Technology (NIST) for measuring the chips.
Finally, I would like to express my appreciation to my family and my girlfriend Yuting He for their
unwavering support, encouragement, and love throughout my doctoral journey. Their support has been
invaluable in helping me navigate the challenges and opportunities of pursuing a PhD degree.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Superconductivity and Superconductor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Fundamental Properties of Superconductors . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Two different types of superconductors . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Josephson Effect and Josephson Junction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Pair Tunneling: the Josephson Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Josephson Junction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2.1 Structures of Josephson Junction . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2.2 The RCSJ Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Digital Design Using Josephson Junctions: Rapid Single Flux Quantum (RSFQ) . . . . . . . 9
1.3.1 Single Flux Quantum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2 Superconducting Quantum Interference Device . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Fundamentals of Rapid Single Flux Quantum Technology . . . . . . . . . . . . . . 14
1.4 Nb Based Josephson Junction Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Magnetic Josephson Junctions and 2-phi Junctions . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Chapter 2: Cell Design and Cell Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1 Standard Cell Library Design in Rapid Single Flux Quantum Technology . . . . . . . . . . 21
2.1.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Standard Cell Library General Description . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.3 Cells in qSportLib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.3.1 Josephson Transmission Line (JTL) . . . . . . . . . . . . . . . . . . . . . 26
2.1.3.2 Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.3.3 Splitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.3.4 Splitter-3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.3.5 Merger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1.3.6 D Flip-flop (DFF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.1.3.7 AND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.3.8 OR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.1.3.9 XOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.1.3.10 Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.1.3.11 Toggle Flip-flop (TFF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.3.12 Non-destructive Readout (NDRO) . . . . . . . . . . . . . . . . . . . . . . 42
2.1.3.13 Use Passive Transmission Line (PTL) . . . . . . . . . . . . . . . . . . . . 44
2.1.3.14 Passive Transmission Line Transmitter/Driver (PTLTX) . . . . . . . . . . 44
2.1.3.15 Passive Transmission Line Receiver (PTLRX) . . . . . . . . . . . . . . . . 45
2.1.3.16 Passive Transmission Line (PTL) . . . . . . . . . . . . . . . . . . . . . . . 46
2.1.3.17 DC/SFQ Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.1.3.18 SFQ/DC Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.1.3.19 AB+CD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.1.3.20 (A+B)(C+D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.1.3.21 ABCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.1.3.22 A+(BC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.1.3.23 (A+B)C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.1.4 Cell Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.1.5 Synthesis Result Using the qSportLib . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.2 Standard Cell Library Design in Half Flux Quantum Technology using 2-phi Junctions . . 64
2.2.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.2.2 Standard Cell Library Using 2-phi Junction . . . . . . . . . . . . . . . . . . . . . . . 66
2.2.2.1 Josephson Transmission Line (JTL) . . . . . . . . . . . . . . . . . . . . . 68
2.2.2.2 D Flip-flop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.2.2.3 Merger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.2.2.4 Splitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.2.2.5 OR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.2.2.6 AND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.2.2.7 XOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.2.2.8 Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.2.2.9 PTL Driver and Receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2.2.10 DC/SFQ Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.2.2.11 SFQ/DC Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.2.2.12 Random Pattern Generator . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.2.2.13 Library Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.2.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Chapter 3: Circuit Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.1 8-bit Multiplier Using the Complex Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.1.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.1.2 Carry Signal Generated by Single-Stage Gate . . . . . . . . . . . . . . . . . . . . . 88
3.1.3 Sum Signal Generated by Single-Stage Gate . . . . . . . . . . . . . . . . . . . . . . 92
3.1.4 One-bit Full Adder Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.1.5 8-bit Multiplier Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.1.6 Design Using Automation Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.1.7 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.2 All Digital Phase-Lock-Loop Design in Rapid Single Flux Quantum Technology . . . . . . 106
3.2.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.2.2 Digital Phase Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.2.3 Digital Controlled Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.2.4 Digital Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.2.5 10-bit Frequency Divider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.2.6 RSFQ All Digital PLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.3 Memory Prototype Using Multi-Fluxon Storage Cell . . . . . . . . . . . . . . . . . . . . . . 124
3.3.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.3.2 Multi-fluxon Destructive Readout Design . . . . . . . . . . . . . . . . . . . . . . . 124
3.3.3 Multi-bit Random Access Memory Design . . . . . . . . . . . . . . . . . . . . . . . 128
3.3.4 Error-resist Memory Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Chapter 4: Cell Characterization Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.1 Background and Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.2 DEMUX Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.3 Cell Delay Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.4 Setup Time Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.5 Hold Time Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Chapter 5: Final Remarks and Looking Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
List of Tables
1.1 Design and Physical Layers in the MIT LL 100 µA/µm² SFQ5ee . . . . . . . . . . . . . . . 18
1.2 Summary of different types of JJs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1 Summary of the cells in qSportLib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Circuit parameter list of JTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Circuit parameter list of buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Circuit parameter list of splitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Circuit parameter list of splitter-3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Circuit parameter list of merger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7 Circuit parameter list of DFF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.8 Circuit parameter list of AND2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.9 Circuit parameter list of OR2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.10 Circuit parameter list of XOR2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.11 Circuit parameter list of inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.12 Circuit parameter list of TFF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.13 Circuit parameter list of NDRO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.14 Circuit parameter list of PTLTX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.15 Circuit parameter list of PTLRX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.16 Parameter values for the M1, M3, and M5-based PTLs . . . . . . . . . . . . . . . . . . . . . 47
2.17 Circuit parameter list of DC/SFQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.18 Circuit parameter list of SFQ/DC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.19 Circuit parameter list of (AB)+(CD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.20 Summary of (AB)+(CD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.21 Circuit parameter list of (A+B)(C+D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.22 Summary of (A+B)(C+D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.23 Circuit parameter list of ABCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.24 Summary of ABCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.25 Circuit parameter list of A+(BC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.26 Summary of A+(BC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.27 Circuit parameter list of (A+B)C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.28 Summary of (A+B)C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.29 Summary of the cell measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.30 Summary of the cells in qSportLib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.31 Parameter values and margins of JTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.32 Parameter values and margins of DFF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.33 Parameter values and margins of merger cell . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.34 Parameter values and margins of splitter cell . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.35 Parameter values and margins of OR gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.36 Parameter values and margins of AND gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.37 Parameter values and margins of XOR gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.38 Parameter values and margins of the inverter . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.39 Parameter values and margins of PTL driver and receiver . . . . . . . . . . . . . . . . . . 81
2.40 Parameter values and margins of DC/SFQ converter . . . . . . . . . . . . . . . . . . . . . . 82
2.41 Parameter values and margins of SFQ/DC converter . . . . . . . . . . . . . . . . . . . . . . 84
2.42 Summary of the cell measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.1 Key components values of the carry cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.2 Key components values of the sum cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.3 COMPARISON TABLE OF FULL ADDER USING STANDARD GATES WITH SINGLE-
STAGE ADDER DESIGN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.4 ARRAY MULTIPLIER DESIGN AUTOMATION TOOL IMPLEMENTATION RESULT . . . . 104
3.5 Summary of 8-bit parallel multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.6 Truth table of the phase detector logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.7 Summary of the DCO Monte Carlo simulations. . . . . . . . . . . . . . . . . . . . . . . . . 114
3.8 Summary of the features of the SFQ all-digital PLLs. . . . . . . . . . . . . . . . . . . . . . . 123
3.9 Circuit parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.10 Error-resist memory logic table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.1 Circuit parameter list of DEMUX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.2 Summary of the hold time simulation of the AND gate (ps) . . . . . . . . . . . . . . . . . . 147
List of Figures
1.1 Resistance of the wire of solid mercury at low temperature. (after [2]) . . . . . . . . . . . . 3
1.2 Diagram of the Meissner effect. (after [5]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Structure of a Superconductor-Insulator-Superconductor (SIS) junction. . . . . . . . . . . . 8
1.4 Symbol of a Josephson junction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Diagram of the RCSJ model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 I-V curve of Josephson junction for different Stewart-McCumber parameter. . . . . . . . . 10
1.7 Superconducting Quantum Interference Device (SQUID): (a) Circuit schematic; (b)
Structure diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.8 SQUID in a circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.9 A short voltage pulse representing logic ’1’ and the 2π phase leap when a junction
produces the pulse. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.10 Schematic of RSFQ Josephson transmission line (JTL). . . . . . . . . . . . . . . . . . . . . . 16
1.11 Cross section of a wafer fabricated by the SFQ5ee process. . . . . . . . . . . . . . . . . . . 17
2.1 Design flow for a cell in qSportLib. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Design flow for a cell in qSportLib. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Diagram of a small system using abutment cells . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Schematic of JTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Layout of JTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Simulation waveform of JTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.7 Test waveform of JTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8 Schematic of buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Layout of buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.10 Simulation waveform of buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.11 Test waveform of buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.12 Schematic of splitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.13 Layout of splitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.14 Simulation waveform of splitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.15 Schematic of splitter-3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.16 Layout of splitter-3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.17 Simulation waveform of splitter-3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.18 Test waveform of splitter-3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.19 Schematic of merger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.20 Layout of merger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.21 Simulation waveform of merger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.22 Schematic of DFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.23 Layout of DFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.24 Simulation waveform of DFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.25 Test waveform of DFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.26 Schematic of AND2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.27 Layout of AND2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.28 Simulation waveform of AND2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.29 Schematic of OR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.30 Layout of OR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.31 Simulation waveform of OR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.32 Schematic of XOR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.33 Layout of XOR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.34 Simulation waveform of XOR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.35 Simulation waveform of XOR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.36 Schematic of inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.37 Layout of inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.38 Simulation waveform of inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.39 Diagram of the logic of TFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.40 Schematic of TFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.41 Layout of TFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.42 Simulation waveform of TFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.43 Simulation waveform of TFF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.44 Schematic of NDRO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.45 Layout of NDRO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.46 Simulation waveform of NDRO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.47 Schematic of PTLTX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.48 Layout of PTLTX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.49 Schematic of PTLRX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.50 Layout of PTLRX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.51 Diagram of a PTL with stitched ground shields above and below . . . . . . . . . . . . . . . 47
2.52 Diagram of the passive transmission line (PTL) contact . . . . . . . . . . . . . . . . . . . . 48
2.53 Layout of Metal 1 PTL (bottom) and the corner (right upper). . . . . . . . . . . . . . . . . 48
2.54 Layout of the Metal 3 (bottom) and Metal 5 (upper) PTL. . . . . . . . . . . . . . . . . . . . 48
2.55 Layout of the M3-M5 PTL contact-and-corner (left bottom) and M1-M3 PTL contact-and-
corner (right upper). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.56 Schematic of DC/SFQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.57 Layout of DC/SFQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.58 Simulation waveform of DC/SFQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.59 Schematic of SFQ/DC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.60 Layout of SFQ/DC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.61 Simulation waveform of SFQ/DC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.62 Schematic of (AB)+(CD). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.63 Layout of (AB)+(CD). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.64 Schematic of (AB)+(CD) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . 54
2.65 Layout of (AB)+(CD) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . 54
2.66 Simulation waveform of (AB)+(CD). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.67 Schematic of (A+B)(C+D). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.68 Layout of (A+B)(C+D). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.69 Schematic of (A+B)(C+D) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . 56
2.70 Layout of (A+B)(C+D) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . 56
2.71 Simulation waveform of (A+B)(C+D). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.72 Schematic of ABCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.73 Layout of ABCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.74 Schematic of ABCD using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . 57
2.75 Layout of ABCD using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . . . 57
2.76 Simulation waveform of ABCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.77 Schematic of A+(BC). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.78 Layout of A+(BC). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.79 Schematic of A+(BC) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . 59
2.80 Layout of A+(BC) using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . . . 59
2.81 Simulation waveform of A+(BC). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.82 Schematic of (A+B)C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.83 Layout of (A+B)C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.84 Schematic of (A+B)C using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . 61
2.85 Layout of (A+B)C using clock-follow-data scheme. . . . . . . . . . . . . . . . . . . . . . . . 61
2.86 Simulation waveform of (A+B)C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.87 Chip photo 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.88 Chip photos 2 and 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.89 A measured waveform for the XOR gate in qSportLib . . . . . . . . . . . . . . . . . . . . . 64
2.90 Simulation waveform of a comparison on 0-JJ and 2ϕ-JJ. . . . . . . . . . . . . . . . . . . . 67
2.91 Schematic of the Josephson Transmission Line (JTL). . . . . . . . . . . . . . . . . . . . . . 68
2.92 Simulation waveform of the Josephson Transmission Line (JTL). . . . . . . . . . . . . . . . 69
2.93 Schematic of the D Flip-flop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.94 Simulation waveform of the D Flip-flop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.95 Schematic of the merger cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.96 Simulation waveform of the merger cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.97 Schematic of the splitter cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.98 Simulation waveform of the splitter cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.99 Schematic of the OR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.100 Simulation waveform of the OR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.101 Schematic of the AND gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.102 Simulation waveform of the AND gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.103 Schematic of the XOR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.104 Simulation waveform of the XOR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.105 Schematic of the inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.106 Simulation waveform of the inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.107 Schematic of the PTL driver and receiver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.108 Simulation waveform of the PTL driver and receiver. . . . . . . . . . . . . . . . . . . . . . 80
2.109 Schematic of the DC/SFQ converter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.110 Simulation waveform of the DC/SFQ converter. . . . . . . . . . . . . . . . . . . . . . . . . 82
2.111 Schematic of the SFQ/DC converter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.112 Simulation waveform of the SFQ/DC converter. . . . . . . . . . . . . . . . . . . . . . . . . 83
2.113 Schematic of the random pattern generator. . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.114 Simulation waveform of the random pattern generator. . . . . . . . . . . . . . . . . . . . . 84
2.115 An example layout of the OR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.1 (a) Circuit diagram of conventional one-bit full adder; (b) Schematic of a simpler one-bit
full adder using confluence buffer (CB) and TFF. . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2 (a) 3-XOR gate; (b) Majority gate taken from [46]. . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Schematic of the single-stage RSFQ carry cell. . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4 Margins of critical components of the carry cell. . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5 Layout of the carry cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6 Simulation waveform and the bias margin of the carry cell . . . . . . . . . . . . . . . . . . 92
3.7 Schematic of the single-stage RSFQ sum cell. . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.8 Margins of key components of sum cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.9 Layout of the sum cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.10 Simulation waveform for the sum cell and the bias margin. . . . . . . . . . . . . . . . . . . 96
3.11 Block-level diagram of the novel one-bit full adder design employing single-stage complex
RSFQ gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.12 Comparison of the layouts using standard RSFQ gates and single-stage complex RSFQ gates. 98
3.13 Simulation waveform of one-bit full adder employing single-stage complex RSFQ gates. . . 99
3.14 Block-level diagram of the 8-bit multiplier. . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.15 Floorplan of the 8-bit multiplier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.16 Clock-follow-data clocking scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.17 Schematic of the delay cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.18 Input-to-Output delay of the employed delay cell for different bias current conditions. . . . 103
3.19 Chip layout of the 8-bit multiplier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.20 Simulation waveform of 8-bit multiplier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.21 (a) Basic idea of a PLL; (b) a PLL with frequency divider. . . . . . . . . . . . . . . . . . . . 108
3.22 (a) A NDRO performs as phase detector; (b) the waveform showing function of the PD. . . 109
3.23 Block diagram of the digital controlled oscillator. . . . . . . . . . . . . . . . . . . . . . . . . 113
3.24 Schematic of the tunable delay JTL cell: critical current J1 = 131.7 µA, J2 = 60 µA, J3 = 271.6 µA,
J4 = 130 µA, J5 = 149.7 µA, J6 = 100 µA; inductance L1 = 3.3 pH, L2 = 1.87 pH, L3 = 4.0 pH;
bias current I1 = 100 µA, I2 = 31.25 µA, I3 = 125 µA . . . . . . . . 113
3.25 The delay of JTL with different Josephson Junction bias current on a 130 µA critical-current
JJ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.26 Histograms of the Monte Carlo simulation as a function of the DCO oscillation period
showing the means and standard deviation (STD). . . . . . . . . . . . . . . . . . . . . . . . 115
3.27 Histogram of the Monte Carlo simulation as a function of the DCO frequency step. . . . . 115
3.28 (a) Structure of the digital loop filter; (b) its mathematical equivalent. . . . . . . . . . . . . 116
3.29 Block diagram of the digital accumulator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.30 Block diagram of the frequency divider. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.31 Block diagram of the RSFQ all-digital PLL. . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.32 Layout of the RSFQ all-digital PLL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.33 Mathematical equivalent of the RSFQ all-digital PLL. . . . . . . . . . . . . . . . . . . . . . 121
3.34 Simulation waveform of the locked signal with reference. . . . . . . . . . . . . . . . . . . . 122
3.35 Simulation waveform for the period jitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.36 Plot of the output jitter vs. the DCO oscillating frequency step in ps. . . . . . . . . . . . . . 123
3.37 Schematic of the MFDRO cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.38 Margins of the MFDRO cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
3.39 Diagram of the state machine for MFDRO. . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.40 Chip photo of a test structure for MFDRO. . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.41 Layout of the MFDRO cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.42 Measured waveform of the MFDRO cell. (a) Case for three fluxons; (b) case for two
fluxons; (c) case for one fluxon and (d) case for zero fluxon. . . . . . . . . . . . . . . . . . . 128
3.43 Register file for the multi-bit random access memory. . . . . . . . . . . . . . . . . . . . . . 130
3.44 Structure and waveform diagram of the HC-write block. . . . . . . . . . . . . . . . . . . . 131
3.45 Connection of the register files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.46 Diagram of the multi-bit random access memory. . . . . . . . . . . . . . . . . . . . . . . . 132
3.47 Structure and waveform diagram of the HC-clock block. . . . . . . . . . . . . . . . . . . . 132
3.48 Design and waveform diagram of the serial-to-parallel counter using T1FF. . . . . . . . . . 133
3.49 Register file structure for the error-resist memory. . . . . . . . . . . . . . . . . . . . . . . . 134
3.50 Diagram of the error-resist memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.51 (a) Schematic of the soma; (b) Waveform of the soma circuit operating as a threshold
detector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.1 Schematic of the DEMUX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.2 Layout of the DEMUX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.3 Simulation waveform of the DEMUX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.4 Diagram of the cell delay measurement prototype. . . . . . . . . . . . . . . . . . . . . . . . 140
4.5 Diagram of the cell delay measurement prototype. . . . . . . . . . . . . . . . . . . . . . . . 140
4.6 Histogram of the Monte-Carlo simulation result of the DFF delay using the proposed
prototype. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.7 Relationship between the cell delay and the input-to-clock time interval ∆t. . . . . . . . . 142
4.8 Diagram of the setup/hold time measurement prototype. . . . . . . . . . . . . . . . . . . . 142
4.9 The schematic of tunable delay block using cascaded JTLs. . . . . . . . . . . . . . . . . . . 143
4.10 Configuration to detect setup event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.11 Configuration to measure the clock path delay and signal path delay. . . . . . . . . . . . . 144
4.12 Histogram of the Monte-Carlo simulation result of the AND gate hold time. . . . . . . . . 145
4.13 Histogram of the Monte-Carlo simulation result of the AND gate setup time. . . . . . . . . 145
4.14 Diagram of the hold time measurement prototype. . . . . . . . . . . . . . . . . . . . . . . 146
4.15 Diagram of the input pattern generator block. . . . . . . . . . . . . . . . . . . . . . . . . . 146
Abstract
As semiconductor technology scales down to sub-nanometer levels, maintaining Moore’s law becomes
increasingly challenging. Consequently, researchers are seeking alternative solutions for the next
generation of computers, and superconducting materials, first discovered in 1911, are receiving
renewed attention. Superconducting electronics (SCE) logic families, such as Single-Flux-Quantum (SFQ)
logic, which utilizes Josephson junctions, present a viable solution for developing a faster and more
energy-efficient next-generation computer. The delay of a single SFQ gate can be as low as a few
picoseconds, enabling SFQ-based systems to potentially operate at tens of gigahertz. Additionally, the
switching energy of SFQ circuits is almost four orders of magnitude smaller than that of CMOS technology,
making SFQ a clear winner in terms of power consumption when the operating rate is high. SFQ circuits
also present several challenges: an SFQ system has to be fully path-balanced; any one-to-multiple
connection requires specially designed splitting cells; and an SFQ cell is difficult to characterize
precisely because of its extremely fast operation. This dissertation collects several works that address
these challenges and enrich SFQ circuit design.
First, this dissertation describes a methodology for developing a standard cell library (qSportLib)
based on normal Josephson junctions using the MITLL SFQ5ee process. The design flow for a cell in
qSportLib is explained, and the abutment connection strategy is discussed. The library cells are
introduced, including their circuit schematics, parameter values, simulated margins, physical layouts,
and simulated or measured waveforms. The dissertation presents several newly designed cells, including
complex logic gates and multi-fluxon destructive readout cells. Some of the library cells are fabricated
and confirmed to be functional through measurement. The dissertation also demonstrates, through EDA
synthesis results, the improvement achieved by adding complex logic gates to the cell library.
Furthermore, the dissertation describes the design of a 2ϕ-Josephson-junction-based standard cell
library that can be triggered by a half flux quantum. This feature halves the dynamic power consumption
compared with conventional RSFQ circuits, and the 2ϕ-junction-based cells eliminate the need for
inductors, thus improving scalability and saving area. The circuit schematics are shown, and the
parameter values are listed with their margins. The layouts based on a virtual process are demonstrated,
and the correct function of each block is verified by simulation. This is the first time a
2ϕ-junction-based standard cell library has been presented.
The dissertation also explains the implementation of three RSFQ circuits using the newly designed
qSportLib: an 8-bit multiplier, an all-digital phase-locked loop (PLL), and a multi-fluxon
destructive readout (MFDRO) cell. First, the 8-bit multiplier is implemented using a single-stage full
adder design built from two complex gates, SUM and CARRY. The multiplier structure and layout floor plan
are shown, and the multiplier’s functional operation is confirmed at 40 GHz through post-layout
simulation. The synthesis results of array multipliers of various bit widths are compared between the
standard cell library with and without the SUM and CARRY cells, demonstrating the benefit of using
complex cells. Second, the all-digital PLL provides a clock synthesis block for RSFQ systems and is the
first all-digital PLL implemented in RSFQ technology. The PLL generates a 50 GHz clock signal from a
48.82 MHz reference clock with a 2.93 ps peak-to-peak jitter. Third, the MFDRO is presented with its
circuit schematic, parameter values, and layout. The cell is fabricated and tested to be functional. The
dissertation also presents a high-capacity random access memory and an error-resistant random access
memory to demonstrate the application of the MFDRO.
This dissertation also proposes a cell characterization prototype for measuring the clock-to-Q
delay, the setup time, and the hold time; it is the first proposed design for setup time and hold time
measurement. A manually controlled demultiplexer is also designed as part of this work, and the
characterization scheme is validated by Monte-Carlo simulation. In addition, the cell delay measurement
and setup time measurement are implemented and fabricated.
In summary, this PhD work develops two standard cell libraries, realizes three circuit designs, and
proposes a cell characterization prototype. By combining theoretical and experimental contributions, this
PhD work aims to improve superconductor-based computing systems in terms of logic efficiency,
synthesis performance, area efficiency, power consumption, and functional designs.
Chapter 1
Introduction
For thousands of years, devices have been used to assist with computation. Alan Turing proposed the
principle of the modern computer in 1936, which is now known as a universal Turing machine. Mean-
while, computing devices continued to evolve, from vacuum tubes to transistors and then to metal-oxide-
semiconductor transistors (MOSFETs). This evolution led to significant improvements in the number of
devices on a single chip, the size of computers, their operation speed, and computing capability. Nowadays,
computers are ubiquitous and have become an essential part of modern life. However, as semiconductor
technology scales down to sub-nanometer levels, maintaining Moore’s law becomes increasingly challenging.
Consequently, researchers are seeking alternative solutions for the next generation of computers, and
superconducting materials, first discovered in 1911, are receiving renewed attention.
Traditional semiconductor-based devices face fundamental limitations in terms of power consumption
and speed due to the inherent resistance of the materials used. Superconducting circuits, on the other hand,
can operate with very low power dissipation and can achieve very high clock frequencies. There has
therefore been a revival of interest in superconducting electronics (SCE). SCE logic families, such as
Single-Flux-Quantum (SFQ) logic, which utilizes Josephson junctions, present a viable solution for
developing a faster and more energy-efficient next-generation computer. The delay of a single SFQ gate
can be as low as a few picoseconds, enabling SFQ-based systems to potentially operate at tens of
gigahertz. Additionally, the switching energy of SFQ circuits is almost four orders of magnitude smaller
than that of CMOS technology, making SFQ a clear winner in terms of power consumption when the operating
rate is high. Rapid SFQ (RSFQ) is an especially noteworthy SFQ technology that employs resistors for
biasing and aims to achieve ultra-high speed. In addition to SFQ logic, there are numerous novel logic
families that draw inspiration from newly discovered devices.
SFQ circuits present unique challenges in comparison to conventional CMOS circuits. Firstly,
the majority of SFQ cells operate synchronously, necessitating fully path-balanced systems, which are
implemented by inserting DFFs and building a fully connected clock network. Secondly, most SFQ cells
have a fan-out of one, except for specially designed splitting cells, so each one-to-multiple connection
in an SFQ system requires a series of additional splitting cells. Combined, these issues make the clock
distribution network prohibitively expensive. Another significant challenge for SFQ designers is the
difficulty of precisely characterizing an SFQ cell because of its extremely fast operation. To address
these challenges and to enrich SFQ circuit design, two new standard cell libraries have been developed:
one (qSportLib) using normal Josephson junctions and the other utilizing newly discovered devices. A
variety of functional circuits have been designed using the qSportLib library, including an 8-bit
multiplier, an all-digital phase-locked loop, and two prototypes of random access memory. A measurement
scheme for the timing characterization of SFQ cells is also proposed. The remainder of this chapter
outlines the fundamental concepts and background knowledge needed before delving into specific designs.
1.1 Superconductivity and Superconductor
In 1908, Heike Kamerlingh-Onnes successfully liquefied helium, the last of the noble gases to be
liquefied, whose boiling point at atmospheric pressure is 4.2 K. This allowed him to conduct experiments
at extremely low temperatures. During his investigation of the electrical resistance of metals,
Kamerlingh-Onnes initially studied platinum and gold in order to test his hypothesis that highly pure
samples would exhibit vanishing resistance at very low temperatures. Later, Kamerlingh-Onnes turned to
mercury, which was the only metal that could be purified to a high degree through multiple distillations
at that time [1].
Figure 1.1: Resistance of the wire of solid mercury at low temperature. (after [2])
Kamerlingh-Onnes’ results were published as the curve depicted in Fig. 1.1 [2]. It shows that,
at temperatures just below 4.2 K, the resistance of mercury suddenly drops to an immeasurably small
value. Kamerlingh-Onnes named this new phenomenon the "superconductive state" because of its
extraordinary electrical properties. Since then, hundreds of thousands of superconducting materials,
including pure elements, alloys, ceramics, and more, have been discovered. While some of these materials
exhibit superconductivity at atmospheric pressure, others require high pressure. Presently, niobium is
the pure element with the highest known transition temperature (9 K) at atmospheric pressure.
The superconductivity phenomenon was explained in 1957 by Bardeen, Cooper, and Schrieffer, who
formulated the "BCS theory." According to this theory, electrons form pairs upon the transition to the
superconducting state, in accordance with the laws of quantum mechanics. These pairs, known as Cooper
pairs, condense into a new state in which they form a coherent matter wave with a well-defined phase.
The "phonons," or quantized vibrations of the crystal lattice, mediate the interaction between the
electrons.
1.1.1 Fundamental Properties of Superconductors
The disappearance of electrical resistance is a unique and defining feature of superconductors, hence the
name "superconductivity." In addition to this property, superconductors possess other noteworthy char-
acteristics, although some of these fall outside the scope of this dissertation. This section will focus on
several important concepts and properties, including critical temperature, critical current, the Meissner-
Ochsenfeld effect, and critical magnetic field.
As mentioned previously, a material’s resistance vanishes abruptly at a particular temperature known
as the critical temperature ($T_c$). As the transition takes place within a narrow temperature range
($\Delta T$), $T_c$ is defined as the temperature at which the resistance drops to half its normal-state
value ($R_n$).
The amount of current that a superconductor can carry is limited. If the current flowing through a
superconductor’s cross-section exceeds the critical current ($I_c$), the sample will no longer be in a
superconducting state. The corresponding current per unit area is referred to as the critical current
density ($J_c$).
When an external magnetic field is applied to a perfect conductor, it should preserve its interior
magnetic field. However, in the case of a superconductor, the applied magnetic field is expelled from the
interior of the material except for a thin outer layer (known as the "Meissner-Ochsenfeld effect" or
"ideal diamagnetism") [3]. This observation is explained by the London equation, developed by Fritz and
Heinz London [4], which states that $\nabla^2 H = \lambda^{-2} H$, where $H$ is the magnetic field and
$\lambda$ is the London penetration depth.
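As a brief illustration of why the field survives only in a thin outer layer, the London equation can be solved in one dimension. The sketch below assumes a superconductor filling the half-space $x \ge 0$ with a field $H_0$ applied parallel to its surface; the coordinate $x$ and the surface value $H_0$ are notation introduced here for illustration and are not taken from the original text.

```latex
\frac{d^{2}H}{dx^{2}} = \frac{H}{\lambda^{2}}
\quad\Longrightarrow\quad
H(x) = H_{0}\, e^{-x/\lambda}, \qquad x \ge 0 ,
```

where the exponentially growing solution has been discarded because the field must remain finite deep inside the material. Under these assumptions, the field falls to $1/e$ of its surface value within one penetration depth $\lambda$.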
Figure 1.2: Diagram of the Meissner effect. (after [5])
The expulsion of applied magnetic fields by superconductors is a well-known phenomenon with a defined
limit. This threshold is referred to as the critical magnetic field $H_c$, which represents the magnitude
of magnetic field that a superconductor can withstand before it transitions out of the superconducting
state. In the case of a thin, elongated superconductor rod, resistance appears under a progressively
increasing parallel magnetic field once the field reaches $H_c$. Empirically determined $H_c$ values are
often observed to have a functional relationship with temperature $T$. Specifically, the following
equation describes this relationship, where $T_c$ represents the critical temperature and $H_c(0)$
represents the critical magnetic field at 0 K for a given material:
$$H_c = H_c(0)\left[1 - \left(\frac{T}{T_c}\right)^2\right] \tag{1.1}$$
As can be seen from the above formula, $H_c$ approaches zero as $T$ approaches $T_c$.
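The temperature dependence in Eq. 1.1 can be evaluated numerically with a few lines of Python. This is a minimal sketch: the function name `critical_field`, the choice to return zero at or above $T_c$, and the numerical values (a 9 K critical temperature with $H_c(0)$ normalized to 1) are assumptions made for illustration, not values from this dissertation.

```python
def critical_field(temperature_k, tc_k, hc0):
    """Empirical critical field H_c(T) = H_c(0) * [1 - (T / T_c)^2] (Eq. 1.1).

    hc0 may be in any field unit; the result is in the same unit.
    Returning 0.0 at or above T_c is a convention assumed here: the
    material is then in its normal state.
    """
    if temperature_k < 0 or tc_k <= 0:
        raise ValueError("temperatures must satisfy T >= 0 and T_c > 0")
    if temperature_k >= tc_k:
        return 0.0
    return hc0 * (1.0 - (temperature_k / tc_k) ** 2)


# Illustrative values only: T_c = 9 K with H_c(0) normalized to 1.
for t in (0.0, 4.5, 9.0):
    print(f"T = {t:.1f} K -> H_c / H_c(0) = {critical_field(t, 9.0, 1.0):.2f}")
```

At half the critical temperature the critical field is still three quarters of its zero-temperature value, which shows how slowly $H_c$ degrades until $T$ comes close to $T_c$.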
1.1.2 Two different types of superconductors
There exist two distinct categories of superconductors, namely:
• Type-I superconductors, which are capable of expelling a magnetic field up to a critical value $H_c$.
If the externally applied magnetic field is larger than $H_c$, the superconductor loses its
superconductivity and returns to its normal state. Elemental superconductors are predominantly type-I
superconductors, although niobium, vanadium, and technetium are elemental type-II superconductors.
• Type-II superconductors, which have two critical magnetic fields. When the external magnetic field
is below the "lower critical magnetic field" $H_{c1}$, the type-II superconductor behaves like a type-I
superconductor, expelling the magnetic field from its interior and exhibiting no resistance. The type-II
superconductor loses its superconducting state when the externally applied field exceeds the "upper
critical magnetic field" $H_{c2}$. However, if the applied magnetic field is between $H_{c1}$ and
$H_{c2}$, the type-II superconductor remains in an intermediate phase known as the vortex state,
exhibiting a mixture of ordinary and superconducting properties. Type-II superconductors are typically
fabricated from metal alloys or complex oxide ceramics, and all high-temperature superconductors belong
to this category.
1.2 Josephson Effect and Josephson Junction
1.2.1 Pair Tunneling: the Josephson Effect
It is important to note that when two superconductors are separated by a sufficient distance, their Cooper
pairs can be described by two distinct, unrelated wave functions, each of the form:
$$\psi = |\psi(\vec{r})|\, e^{i[\theta(\vec{r}) - (2\epsilon_f/\hbar)t]} \tag{1.2}$$
However, as the distance between them decreases and a barrier (such as a normal conductor or an
insulator) is introduced, the two wave functions begin to couple, allowing pair tunneling to occur when
the coupling energy exceeds the thermal fluctuation energy or there is a voltage difference. Under these
circumstances, the wave functions of the two superconductors can be described by the following
equations, where $U_1$ and $U_2$ represent the wave function energies and $K$ is the coupling constant:
$$i\hbar\frac{\partial\psi_1}{\partial t} = U_1\psi_1 + K\psi_2, \qquad i\hbar\frac{\partial\psi_2}{\partial t} = U_2\psi_2 + K\psi_1 \tag{1.3}$$
Since there are numerous references available for the derivation of these equations [6], this section aims
only to introduce the knowledge necessary for the subsequent circuit designs. The key result is that the
current density $J$ and the phase difference of the wave functions $\varphi$ are related by the following
sinusoidal equation:
$$J = J_c \sin\varphi \tag{1.4}$$
Here, $J_c$ represents the critical current density of the structure, which can only be determined through
more sophisticated means. Additionally, the phase difference $\varphi$ changes with time according to
the following equation, which depends on the voltage $V$:
$$\frac{\partial\varphi}{\partial t} = \frac{2e}{\hbar}V \tag{1.5}$$
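To make the two Josephson relations concrete, the sketch below integrates Eq. 1.5 for a constant voltage and evaluates Eq. 1.4 on the resulting phase. The function names and the 1 mV example voltage are hypothetical choices for illustration; the physical constants are the exact SI values of the Planck constant and the elementary charge.

```python
import math

H_PLANCK = 6.62607015e-34      # Planck constant in J*s (exact SI value)
E_CHARGE = 1.602176634e-19     # elementary charge in C (exact SI value)
HBAR = H_PLANCK / (2.0 * math.pi)


def supercurrent_density(jc, phase):
    """Eq. 1.4: J = J_c * sin(phi)."""
    return jc * math.sin(phase)


def phase_at(time_s, voltage_v, phase0=0.0):
    """Eq. 1.5 integrated for a constant voltage: phi(t) = phi0 + (2e/hbar) * V * t."""
    return phase0 + (2.0 * E_CHARGE / HBAR) * voltage_v * time_s


def oscillation_frequency(voltage_v):
    """Frequency at which sin(phi(t)) repeats under a constant voltage: 2eV/h."""
    return 2.0 * E_CHARGE * voltage_v / H_PLANCK


f = oscillation_frequency(1e-3)          # 1 mV is an arbitrary example voltage
print(f"{f / 1e9:.1f} GHz")              # about 483.6 GHz
```

Under a constant voltage the phase grows linearly, so the supercurrent of Eq. 1.4 oscillates rather than flowing steadily; this is why a sustained DC voltage across a junction corresponds to a very high oscillation frequency.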
1.2.2 Josephson Junction
1.2.2.1 Structures of Josephson Junction
The Josephson junction (JJ) is a structure composed of two layers of superconductors separated by a
barrier layer, which can be an insulator, a normal metal, or a physical constriction. The resulting junc-
tions are called Superconductor-Insulator-Superconductor (SIS), Superconductor-Normal-Superconductor
(SNS), and Superconductor-Constriction-Superconductor (SCS), respectively. Fig. 1.3 illustrates the struc-
ture of an SIS junction, while Fig. 1.4 depicts a commonly used symbol for the Josephson junction.
Figure 1.3: Structure of a Superconductor-Insulator-Superconductor (SIS) junction.
Figure 1.4: Symbol of a Josephson junction.
1.2.2.2 The RCSJ Model
In order to use Josephson junctions in circuit design and other applications, it is necessary to have an
extracted circuit model that accurately describes the current-voltage properties, as well as the dynamics
and parasitic components of the junction. One widely used model is the resistively and capacitively
shunted junction (RCSJ) model, introduced by W.C. Stewart and D.E. McCumber. This model, shown
in Fig. 1.5, includes an ideal junction that carries the Josephson current, an effective capacitance
resulting from the capacitor-like S-I-S structure of the junction, and an effective resistance. The
authors also introduced the Stewart-McCumber parameter, $\beta_c$, which is defined by Equation 1.6 and is
an important parameter in determining the current-voltage relationship of a Josephson junction.
$$\beta_c = \frac{2\pi I_c R^2 C}{\Phi_0} \tag{1.6}$$
Here, $I_c$ is the critical current, $R$ and $C$ are the effective resistance and capacitance, and
$\Phi_0$ is the magnetic flux quantum. When $\beta_c$ is greater than one, the junction is said to be
under-damped, as illustrated by the left curve in Fig. 1.6. In contrast, when $\beta_c$ is less than one,
the junction is said to be over-damped, as shown by the right curve in Fig. 1.6. To achieve the
over-damped setting, a shunt resistor is typically added in parallel with the effective resistance in the
RCSJ model. Both under-damped and over-damped junctions have been used to realize logic, but due to the
multi-valued I-V characteristic of under-damped junctions and other issues, designs based on under-damped
junctions were largely abandoned in late 1983 [7].
Figure 1.5: Diagram of the RCSJ model
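The role of the shunt resistor can be illustrated by evaluating Eq. 1.6 directly. The following is a minimal sketch: the helper names are hypothetical, the junction values (a 100 µA critical current and 70 fF capacitance) are illustrative numbers rather than process data, and the "critically damped" label for $\beta_c = 1$ is a convention assumed here.

```python
import math

PHI0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)   # flux quantum h/(2e) in Wb


def stewart_mccumber(ic_a, r_ohm, c_f):
    """Eq. 1.6: beta_c = 2*pi*I_c*R^2*C / Phi_0."""
    return 2.0 * math.pi * ic_a * r_ohm ** 2 * c_f / PHI0


def resistance_for_beta(ic_a, c_f, beta_c=1.0):
    """Total effective resistance that yields the requested beta_c (Eq. 1.6 solved for R)."""
    return math.sqrt(beta_c * PHI0 / (2.0 * math.pi * ic_a * c_f))


def damping_regime(beta_c):
    """Classification used in the text: beta_c > 1 under-damped, beta_c < 1 over-damped."""
    if beta_c > 1.0:
        return "under-damped"
    if beta_c < 1.0:
        return "over-damped"
    return "critically damped"


# Hypothetical junction: I_c = 100 uA, C = 70 fF (illustrative, not process data).
ic, c = 100e-6, 70e-15
print(damping_regime(stewart_mccumber(ic, 100.0, c)))   # large R: under-damped
print(damping_regime(stewart_mccumber(ic, 3.0, c)))     # small R: over-damped
print(f"R for beta_c = 1: {resistance_for_beta(ic, c):.2f} ohm")
```

Because $\beta_c$ scales with $R^2$, lowering the total resistance with a parallel shunt is an effective way to move a junction from the under-damped to the over-damped regime.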
1.3 Digital Design Using Josephson Junctions: Rapid Single Flux Quantum (RSFQ)
Figure 1.6: I-V curve of Josephson junction for different Stewart-McCumber parameter.
For many years, the Josephson junction has been a topic of interest among researchers. In the 1970s,
IBM embarked on a project to create a Josephson junction-based prototype computer, which attracted
significant attention and was seen as a potential competitor to silicon-based semiconductor computers.
However, the project was eventually abandoned due to several drawbacks. One reason was the instability
of the process resulting from the use of "soft" superconductors like lead alloys. Another critical drawback
was the choice of under-damped Josephson junctions ($\beta_c \gg 1$) for the logic circuitry. This choice
made it impossible to reset the logic gate by merely cutting off the signal current. As a result, digital
gates required periodic resets by completely turning off the bias current, which severely limited
operation speed and undermined the competitiveness of the project.
Beginning in 1985, a new type of Josephson junction-based circuitry was developed in Moscow by K.K.
Likharev, O.A. Mukhanov, and V.K. Semenov from the Moscow State University and the Institute of
Radioengineering and Electronics [8], [9]. This new family of logic circuits used a resistive biasing
approach and was originally named resistive single flux quantum (RSFQ); it showed promise for high-speed
operation. As its operating rate increased dramatically, it was later renamed rapid single flux quantum
while retaining the RSFQ abbreviation. Since then, extensive research has been conducted in this area,
and the RSFQ logic has emerged as one of the most promising and attractive candidates for the next
generation of very large-scale integration (VLSI) circuits.
In this section, we will explore the fundamentals of RSFQ technology, including the logic
representation, basic devices, circuit schemes, and fabrication.
1.3.1 Single Flux Quantum
The magnetic flux $\Phi$ is defined as the product of the magnetic field $B$ and the area $S$. Since $B$
and $S$ can be arbitrary, $\Phi$ can classically take any value. However, for highly homogeneous
type-II superconductors, $\Phi$ was found experimentally to be quantized by B. S. Deaver and W. M. Fairbank
[10], and independently by R. Doll and M. Näbauer [11], in 1961. This means that the magnetic flux $\Phi$
can only take on discrete values, namely integer multiples of the magnetic flux quantum, denoted
$\Phi_0$. The value of the magnetic flux quantum is given by Equation 1.7, where $h$ is the Planck
constant and $e$ is the elementary charge.
$$\Phi_0 = h/(2e) \approx 2.0678 \times 10^{-15}\ \text{Wb (or V}\cdot\text{s)} \tag{1.7}$$
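The numerical value in Eq. 1.7 can be reproduced from the exact SI values of $h$ and $e$. The short sketch below also expresses the flux quantum in mV·ps, a unit that is convenient for pulse-based circuits, and computes the field that threads one flux quantum through a loop. The helper `flux_quanta` and the 10 µm × 10 µm loop are hypothetical and serve only as an illustration.

```python
H_PLANCK = 6.62607015e-34      # Planck constant in J*s (exact SI value)
E_CHARGE = 1.602176634e-19     # elementary charge in C (exact SI value)

PHI0 = H_PLANCK / (2.0 * E_CHARGE)     # Eq. 1.7, in Wb (equivalently V*s)


def flux_quanta(b_tesla, area_m2):
    """Flux B*S expressed in units of Phi_0 (not rounded to an integer)."""
    return b_tesla * area_m2 / PHI0


print(f"Phi_0 = {PHI0:.4e} Wb")                       # 2.0678e-15 Wb
print(f"Phi_0 = {PHI0 / 1e-15:.4f} mV*ps")            # same number, circuit-friendly unit
# Hypothetical 10 um x 10 um loop: field whose flux equals exactly one Phi_0.
area = (10e-6) ** 2
print(f"B for one quantum: {PHI0 / area * 1e6:.2f} uT")
```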
1.3.2 Superconducting Quantum Interference Device
The Superconducting Quantum Interference Device (SQUID) is a critical device in superconducting
electronics, consisting of a superconducting loop interrupted by one or two Josephson junctions. The
superconducting line serves as an inductance in the circuit. Fig. 1.7 depicts the circuit schematic and
structure diagram of a two-junction SQUID. The SQUID has been widely used as a magnetometer in magnetic
field sensing, biomagnetism, laboratory measurement, geomagnetics, and other fields. However, this
section focuses primarily on its electrical properties as a fundamental device in circuits and presents
only the critical steps. A more detailed derivation can be found in [12], [13].
Figure 1.7: Superconducting Quantum Interference Device (SQUID): (a) Circuit schematic; (b) Structure
diagram.
First of all, the wave function of the structure has to be single-valued, which implies that the integral
of the phase differences around the SQUID loop must be an integer multiple of $2\pi$:
$$\oint \Delta\theta\, d\vec{l} = 2n\pi = \varphi_2 - \varphi_1 + \frac{2\pi}{\Phi_0}\left(L I_{circle} + \Phi_{external}\right) \tag{1.8}$$
where $\varphi_2$ and $\varphi_1$ are the phase differences across the two junctions,
$\frac{2\pi}{\Phi_0} L I_{circle}$ is the phase difference along the inductance loop, and
$\Phi_{external}$ denotes the externally applied magnetic flux, which in this case is zero.
Differentiating the previous equation with respect to time and substituting Eq. 1.5 along with the
inductance law gives:
$$\frac{\Phi_0}{2\pi}\left(\frac{d\varphi_1}{dt} - \frac{d\varphi_2}{dt}\right) - L\frac{dI_{circle}}{dt} = V_{J1} - V_{J2} - V_L = 0 \tag{1.9}$$
That is, the SQUID satisfies Kirchhoff’s voltage law.
We shall now depict the schematic of a SQUID in the context of circuitry, as illustrated in Fig. 1.8.
Here, $I_{c1}$ and $I_{c2}$ denote the critical currents of the Josephson junctions, $I_{circle}$
represents the equivalent circulating current, $I_b$ is the bias current, and $L = L_1 + L_2$ denotes the
loop inductance. Furthermore, let the currents flowing through the two junctions be $I_1$ and $I_2$. Then
we have:
$$I_b = I_1 + I_2 = I_{c1}\sin\varphi_1 + I_{c2}\sin\varphi_2 \tag{1.10}$$
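Figure 1.8: SQUID in a circuit.
Several assumptions can be made without loss of generality: first, $I_{c1}$ is greater than or equal to
$I_{c2}$; second, $I_b$ is greater than or equal to zero; and third, $\varphi_1$ and $\varphi_2$ are
restricted to the range $(-\pi/2, \pi/2)$. Solving Eq. 1.10 for $\varphi_2$ and substituting the result
into Eq. 1.8 then gives:
$$\varphi_2 = \arcsin\left(\frac{I_b}{I_{c2}} - \frac{I_{c1}}{I_{c2}}\sin\varphi_1\right) \tag{1.11}$$
$$\varphi_1 - \varphi_2 = \varphi_1 - \arcsin\left(\frac{I_b}{I_{c2}} - \frac{I_{c1}}{I_{c2}}\sin\varphi_1\right) = \frac{2\pi}{\Phi_0} L I_{circle} \pm 2n\pi \tag{1.12}$$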
The authors of [12], [13] used the previously assumed constraints and a symmetry simplification, where
$L_1 = L_2$ and $I_{c1} = I_{c2} = I_c$, to derive a relationship between the loop inductance $L$ and the
possible quantum state $n$. Specifically, they obtained the following non-linear inequality:
$$L \ge \frac{\Phi_0}{2\pi}\cdot\frac{2|n|\pi - \pi/2 + \arcsin(I_b/I_c)}{I_c - I_b/2} \tag{1.13}$$
Although this inequality is complex, it provides insight into the general trend: a larger inductance $L$
allows the circuit to store more states (i.e., flux quanta). In practical applications, designers
often rely on experience rather than numerical calculations to implement the desired function.
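The trend described by Eq. 1.13 can be seen by evaluating its right-hand side for a few values of $n$. The sketch below assumes the inequality reads $L \ge \frac{\Phi_0}{2\pi}\cdot\frac{2|n|\pi - \pi/2 + \arcsin(I_b/I_c)}{I_c - I_b/2}$ for the symmetric case; the function name, the argument checks, and the 100 µA critical current are hypothetical choices for illustration only.

```python
import math

PHI0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)   # flux quantum h/(2e) in Wb


def min_loop_inductance(n, ic_a, ib_a=0.0):
    """Right-hand side of Eq. 1.13 for a symmetric SQUID (I_c1 = I_c2 = I_c).

    Returns the smallest loop inductance (in H) for which state n is allowed.
    The argument checks are assumptions added so arcsin and the division
    stay well defined.
    """
    if ic_a <= 0 or not 0.0 <= ib_a <= ic_a:
        raise ValueError("expected I_c > 0 and 0 <= I_b <= I_c")
    numerator = 2.0 * abs(n) * math.pi - math.pi / 2.0 + math.asin(ib_a / ic_a)
    return PHI0 / (2.0 * math.pi) * numerator / (ic_a - ib_a / 2.0)


# Hypothetical junctions with I_c = 100 uA and no bias current.
for n in (1, 2, 3):
    print(f"n = {n}: L >= {min_loop_inductance(n, 100e-6) * 1e12:.1f} pH")
```

With these illustrative numbers the bound grows by a fixed step of $\Phi_0 / I_c$ for each additional state, which matches the qualitative statement that a larger loop inductance can hold more flux quanta.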
1.3.3 Fundamentals of Rapid Single Flux Quantum Technology
The concept of rapid single flux quantum (RSFQ) logic was first presented by Likharev, Mukhanov and
Semenov in 1985 [8]. The RSFQ logic family introduced a few innovative approaches, including the use of
a very short voltage pulse with a quantized area of $\int V(t)\,dt = \Phi_0$ (Fig. 1.9) to represent the binary ’1’, in
contrast to the convention of using DC voltage levels. The RSFQ circuits employ over-damped Josephson
junctions rather than under-damped ones, and are biased through resistors to a constant voltage source.
These innovations give the RSFQ technology several unique and advantageous properties. Because the
junctions react quickly and the voltage pulse can be as short as two picoseconds, RSFQ circuit
operation is fast. The single-valued V-I curve of an over-damped Josephson
junction ensures that the circuit resets itself, and the system is DC biased, which is simpler than AC biasing.
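To give a feel for the magnitudes involved, the sketch below builds a pulse whose area is exactly one flux quantum and reports its peak voltage. The Gaussian shape and the 2 ps full width at half maximum are assumptions chosen for illustration; a real SFQ pulse has a different shape, but its area is fixed at Φ0 in the same way.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb (equivalently V*s)


def gaussian_sfq_pulse(fwhm_s):
    """Return (peak_voltage, v_of_t) for a Gaussian pulse of area PHI0."""
    sigma = fwhm_s / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    peak = PHI0 / (sigma * math.sqrt(2.0 * math.pi))

    def v_of_t(t):
        return peak * math.exp(-0.5 * (t / sigma) ** 2)

    return peak, v_of_t


def pulse_area(v_of_t, half_span_s, steps=20000):
    """Numerically integrate v(t) over [-half_span_s, +half_span_s]."""
    dt = 2.0 * half_span_s / steps
    return sum(v_of_t(-half_span_s + (k + 0.5) * dt) for k in range(steps)) * dt


if __name__ == "__main__":
    peak, v = gaussian_sfq_pulse(2e-12)  # assumed 2 ps FWHM
    area = pulse_area(v, 10e-12)
    print(f"peak = {peak * 1e3:.2f} mV, area / PHI0 = {area / PHI0:.6f}")
```

For a picosecond-scale pulse the peak is on the order of a millivolt, which is why SFQ signals need dedicated SFQ/DC converters before they can be observed with room-temperature equipment.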
Fig. 1.10 depicts a fundamental building block in RSFQ technology used for passing SFQ pulses from
input to output. This block typically consists of two Josephson junctions (J1 and J2), resistive biases (I1 and
I2), and inductors connecting the junctions. The bias is achieved by a fixed resistor connected to a constant
DC voltage source, the junctions are made of two superconductor layers with a thin insulator barrier in
between, and the inductors are simply the parasitic inductors of the superconductor metal lines. The cell is
Figure 1.9: A short voltage pulse representing logic ’1’ and the 2π phase leap when a junction produces
the pulse.
named the Josephson transmission line (JTL) as it uses active components (Josephson junctions) to transmit
pulses, in contrast to the passive transmission line (PTL) that we will encounter in later chapters. During
the design phase, the bias current is set to be slightly less than the critical current of the junctions to enable
a pulse from the input to increase the current flowing through J1, causing the transient junction current
to exceed the critical current. The junction then temporarily exits the superconducting state, producing a
pulse that propagates to the next junction. In addition to the JTL cell, the RSFQ logic family also includes
logic and storage blocks such as AND gates, OR gates, D flip-flops (DFFs), and non-destructive readouts
(NDROs) that are necessary to implement a functional system. These blocks perform operations on the
SFQ pulses transmitted by the JTL cells to realize the desired logical or storage functions. However, a
detailed description of these blocks is beyond the scope of this chapter. Section 2.1 of Chapter 2
discusses the design and physical implementation of a standard cell library in RSFQ that includes
all the essential blocks mentioned here.
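The switching mechanism described above can be sketched with a very simple model. The code below integrates a single over-damped junction using the resistively shunted junction (RSJ) equation with the capacitance neglected, biased at 70% of its critical current and hit by a short rectangular current kick that stands in for an incoming SFQ pulse. The critical current, shunt resistance, kick amplitude and kick duration are all illustrative assumptions, and the forward-Euler integrator is chosen for brevity, not accuracy; this is not the JTL of Fig. 1.10.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def simulate_junction(ic=100e-6, r=3.0, bias_frac=0.7, kick_frac=0.6,
                      kick_tau=6.0, total_tau=60.0, steps_per_tau=1000):
    """Integrate I = Ic*sin(phi) + V/R with V = (PHI0 / 2pi) * dphi/dt.

    Returns (phase change in rad, time integral of the junction voltage in Wb).
    Times are expressed in units of tau = PHI0 / (2*pi*Ic*R).
    """
    tau = PHI0 / (2.0 * math.pi * ic * r)
    dt = tau / steps_per_tau
    phi = math.asin(bias_frac)          # static phase under DC bias alone
    phi_start = phi
    flux = 0.0
    kick_steps = int(kick_tau * steps_per_tau)
    for k in range(int(total_tau * steps_per_tau)):
        extra = kick_frac if k < kick_steps else 0.0
        current = ic * (bias_frac + extra)
        voltage = r * (current - ic * math.sin(phi))
        phi += 2.0 * math.pi / PHI0 * voltage * dt
        flux += voltage * dt
    return phi - phi_start, flux


if __name__ == "__main__":
    dphi, flux = simulate_junction()
    print(f"phase leap = {dphi / math.pi:.3f} pi, pulse area = {flux / PHI0:.3f} PHI0")
```

With the kick applied, the phase settles one full turn (2π) away from where it started and the integrated voltage equals one flux quantum. With `kick_frac=0.0` the junction stays at rest, which is the bias-below-critical-current condition described in the text.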
Figure 1.10: Schematic of RSFQ Josephson transmission line (JTL).
1.4 Nb Based Josephson Junction Process
The fabrication of superconducting devices based on niobium has become widely adopted by modern
designers. The Institute of Advanced Industrial Science and Technology (AIST) in Japan is a primary provider
of foundry services, with close collaborations with Nagoya University, Yokohama National University, and
Kyoto University. The AIST standard process allows for up to four niobium layers and Josephson junctions
with a critical current density of 2.5 kA/cm² [14]. They also offer an advanced option with six niobium
metal layers and a higher critical current density for Josephson junctions (10 kA/cm²) [15]. In 2014, they
introduced a new fabrication process with nine layers [16].
Moreover, the MIT Lincoln Laboratory has developed a fabrication process for Nb/AlO$_x$/Nb Josephson
junctions with a critical current density of 100 µA/µm² (10 kA/cm²) and sizes as small as 200 nm [17].
Specifically, the MITLL SFQ5ee process node has eight metal layers available for designers, a resistance
layer, a junction layer, a kinetic inductance layer and a pad layer, as shown in Fig. 1.11. Table 1.1 lists the
layers available to designers; the via layers used to connect the metal layers, resistance layer,
and junction layer are omitted for clarity. The MITLL process offers two options for the resistance
layer, and this dissertation uses the 2 Ω/sq version for better yield. All
the work presented in the following chapters uses the MITLL SFQ5ee process for design and fabrication.
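The process numbers quoted above translate directly into layout dimensions. The following sketch checks the unit conversion between µA/µm² and kA/cm², and estimates the junction area and the number of resistor squares for a given target. The circular junction shape, the 130 µA critical current and the 4 Ω resistor are assumptions used only as an example; the function names are hypothetical helpers and do not belong to any design kit.

```python
import math

JC_UA_PER_UM2 = 100.0  # critical current density quoted for SFQ5ee
RS_OHM_PER_SQ = 2.0    # sheet resistance option used in this dissertation


def ua_per_um2_to_ka_per_cm2(jc):
    # 1 uA/um^2 = 1e-6 A / 1e-8 cm^2 = 100 A/cm^2 = 0.1 kA/cm^2
    return jc * 0.1


def junction_area_um2(ic_ua, jc=JC_UA_PER_UM2):
    """Area needed for a junction with critical current ic_ua (in uA)."""
    return ic_ua / jc


def circular_diameter_um(area_um2):
    """Diameter of a circle with the given area (assumes a circular junction)."""
    return 2.0 * math.sqrt(area_um2 / math.pi)


def resistor_squares(r_ohm, rs=RS_OHM_PER_SQ):
    """Length-to-width ratio of a sheet resistor with resistance r_ohm."""
    return r_ohm / rs


if __name__ == "__main__":
    area = junction_area_um2(130.0)  # assumed 130 uA junction
    print(f"Jc = {ua_per_um2_to_ka_per_cm2(JC_UA_PER_UM2):.1f} kA/cm^2")
    print(f"area = {area:.2f} um^2, diameter = {circular_diameter_um(area):.2f} um")
    print(f"4 ohm resistor = {resistor_squares(4.0):.1f} squares")
```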
Figure 1.11: Cross section of a wafer fabricated by the SFQ5ee process.
1.5 Magnetic Josephson Junctions and 2-phi Junctions
In approximately 1999, V. V. Ryazanov published an experimental study on the temperature dependence
of critical current in Josephson junctions with a weakly ferromagnetic interlayer (Cu$_x$Ni$_{1-x}$) [18], [19].
The findings showed a sharp transition in critical current from a maximum to zero for specific thicknesses
of the ferromagnetic layer, and the study concluded that this behavior can only be explained by a 0-π phase state
change in the junctions. This unique Josephson junction, composed of a Superconductor-Ferromagnetic-Superconductor
(SFS) structure, was subsequently termed a magnetic JJ (MJJ). The MJJ has since been utilized
in various fields, including quantum computing [20] and cryogenic memory [21], among others. Furthermore,
this innovative structure has given rise to a variety of new devices, such as π-junctions,
$\phi_0$-junctions, and 2ϕ-junctions. Here we mainly focus on the 2ϕ-junction, which will be used as a key
component of a novel logic family that is introduced in a later chapter.
Recently, several studies have demonstrated that at the 0-π transition, the fundamental term of the
current-phase relationship (CPR) vanishes, causing high-order harmonic terms to become significant [22].
In another study [23], a single Superconductor-Ferromagnetic-Superconductor (SFS) junction utilizing a
Table 1.1: Design and Physical Layers in the MIT LL 100 µA/µm² SFQ5ee
Layer Name | Material | Usage
L0 | MoN$_x$ | High kinetic inductance (≈ 8 pH/sq)
M0 | Nb | Connection and inductance
M1 | Nb | Connection and inductance
M2 | Nb | Connection and inductance
M3 | Nb | Connection and inductance
M4 | Nb | Connection and inductance
M5 | Nb | Connection and inductance
J5 | AlO$_x$/Nb | Josephson junctions
R5 | Mo | Resistance ($R_s$ = 2 Ω/sq or 6 Ω/sq)
M6 | Nb | Connection and inductance
M7 | Nb | Connection and inductance
M8 | Au/Pt/Ti | Chip pad
Cu$_{47}$Ni$_{53}$ alloy barrier was implemented with two parallel superconducting inductors: a readout inductor
and a small shunt inductor. The readout inductor was coupled to a commercial dc superconducting quantum
interference device (SQUID) sensor, which detected the flux Φ in the readout loop. By measuring the CPR
at different barrier thicknesses and temperatures, [23] demonstrated a π-periodic behavior and eliminated
alternative explanations except for a second-order CPR. The overall CPR was then rewritten as Eq. (1.14),
leading to the discovery of a new device, named the 2ϕ-JJ, with a CPR as shown in Eq. (1.15).
\[ J_s(\phi) = J_{c1}\sin(\phi) + J_{c2}\sin(2\phi) \tag{1.14} \]
The resulting device, the 2ϕ-JJ, has the CPR:
\[ J_s(\phi) = J_{c2}\sin(2\phi) \tag{1.15} \]
Several interesting observations were made about the 2ϕ-JJ. First, the current-phase relation was found
to have a period of π rather than 2π. Second, the 2ϕ-JJ switches when there is a π phase jump and produces
a half flux quantum ($\frac{1}{2}\Phi_0 = 1.03 \times 10^{-15}$ Wb). Therefore, the phase shift of a 2ϕ-JJ for every switching event is
also π [24]. Table 1.2 presents a brief summary of the different types of Josephson junctions.
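The two observations above can be checked numerically. The sketch below evaluates the general CPR of Eq. (1.14) and confirms that it becomes π-periodic once the first-harmonic term vanishes, and it computes the half flux quantum quoted in the text. The unit amplitudes used for the harmonic coefficients are arbitrary illustrative values.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def cpr(phi, jc1, jc2):
    """Current-phase relation of Eq. (1.14)."""
    return jc1 * math.sin(phi) + jc2 * math.sin(2.0 * phi)


def is_periodic(period, jc1, jc2, samples=200, tol=1e-9):
    """Check cpr(phi + period) == cpr(phi) on a grid of phases."""
    for k in range(samples):
        phi = 2.0 * math.pi * k / samples
        if abs(cpr(phi + period, jc1, jc2) - cpr(phi, jc1, jc2)) > tol:
            return False
    return True


if __name__ == "__main__":
    print("2phi-JJ (jc1 = 0) pi-periodic:", is_periodic(math.pi, 0.0, 1.0))
    print("mixed CPR pi-periodic:        ", is_periodic(math.pi, 1.0, 1.0))
    print(f"half flux quantum = {0.5 * PHI0:.3e} Wb")
```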
Table 1.2: Summary of different types of JJs
JJ types | 0-junction (normal JJ) | π-junction | $\phi_0$-junction | 2ϕ-junction
CPRs | $I = I_c\sin(\phi)$ | $I = -I_c\sin(\phi)$ | $I \approx I_c\sin(\phi + \phi_0)$ | $I = I_c\sin(2\phi)$
Ground phases | 0 | π | $\phi_0$ | 0
JJ structure | SIS | SFS | SFS | SFS
Fab process example | Nb/Al−AlO$_x$/Nb | Nb-Cu$_{0.47}$Ni$_{0.53}$-Nb | YBa$_2$Cu$_3$O$_7$-Nb and other | Nb-CuNi-Nb
∗
1.6 Organization
Chapter 2 introduces two standard cell libraries. The first library, named qSportLib, is based on the normal
Josephson junction and implements several complex logic gates to reduce the logic depth, leading to improved
system performance in terms of power and area. The design flow of the library cells is presented,
along with the cell implementations, including schematic, circuit parameters, simulated margins, layout,
and measurement waveform for the cells that have been tested. The second library utilizes a newly discovered
device, the 2ϕ-junction, and aims for a more power-efficient and smaller design by using smaller
bias and avoiding inductors. The cell implementations are demonstrated, including schematic, circuit parameters,
simulated margins, and layout.
Chapter 3 presents three circuit designs that utilize the newly developed qSportLib. The first design is
an 8-bit multiplier that uses the single-stage carry gate and single-stage sum gate to implement a parallel
array multiplier. The structure of the multiplier, layout, and post-layout simulation are presented. The
second design is an all-digital phase-locked loop (PLL), demonstrating the first all-digital clock synthesizing
circuit. The function blocks used are explained, and Monte-Carlo simulations are performed to validate
the design. The layout of the PLL is presented, and it is shown to provide a 50 GHz clock signal from a
48.82 MHz reference. The third design is a multi-fluxon destructive readout (MFDRO) cell, and its design
and measurement results are presented. Finally, a high-capacity random access memory (RAM) design and
an error-resistant RAM design are illustrated to demonstrate the potential application of the MFDRO.
∗ Please note that both ϕ and φ are lowercase forms of the Greek letter phi, whose uppercase form is Φ. They may appear in different
forms in different articles.
Chapter 4 proposes a novel cell characterization scheme. First, a manually controlled demultiplexer
design is introduced. Then, prototypes of cell delay measurement, setup time measurement, and hold time
measurement are exhibited with the system schematic and layout. Monte-Carlo simulations are performed
on the post-layout design to validate the proposed idea.
Finally, Chapter 5 concludes the dissertation.
Chapter 2
Cell Design and Cell Libraries
2.1 Standard Cell Library Design in Rapid Single Flux Quantum Technology
In this section, a standard cell library in RSFQ technology named qSportLib is presented, including the cell
design flow, cell connection strategy, schematic design, structure of passive transmission lines, and cell
layouts. Library cells have been fabricated using the MITLL SFQ5ee process and their functionality verified.
In addition, performance and critical margin data for the qSportLib cells are reported. Synthesis results
are also provided to demonstrate the savings from using complex logic gates.
2.1.1 Background and Prior Work
Single flux quantum (SFQ) technology, as a promising candidate for the next generation of very-large-scale
integration (VLSI) circuits, has been drawing the attention of many researchers. More precisely, rapid single-flux
quantum (RSFQ) circuits have demonstrated very high operation rates and ultra-low power consumption.
The RSFQ logic cells comprise Josephson junctions (JJs) and inductors forming superconducting loops,
where each JJ exhibits an intrinsic switching time of 1 to 2 ps with a dynamic switching energy consumption
as low as $10^{-19}$ J/bit. There is a large number of publications related to the RSFQ circuit technology,
including circuit and system design [25]–[27], layout design [28], [29] and electronic design automation
(EDA) tools [30], [31].
Taking advantage of abundant logic and memory resources and powerful EDA tools, CMOS circuit and
system design (especially digital design) has become fully automated, which enables designers to focus on
providing more complex functionality using more advanced architectures. The conventional CMOS circuit
design flow (which starts with behavioral and/or RTL synthesis and ends with placement and routing) relies
heavily on the availability of a highly-optimized and well-characterized library of standard logic cells.
A standard cell library offers designers various logic cells (including some complex logic blocks for
datapath computation). The performance of these cells under different input waveforms and output loading
conditions is characterized by detailed circuit simulation. In addition, the critical margins are
calculated based on process spread under various operating conditions, e.g., biasing voltage levels. A good
cell library design should take several aspects into account, including timing characterization data (for
example clock-to-Q delay, setup and hold times, etc.), robustness to process and/or operating condition
variabilities, power dissipation (both static and dynamic), area footprint, and so on. In this section, we
introduce a methodology for designing an RSFQ standard cell library, named qSportLib. The standard cell
library targets the MIT Lincoln Lab’s SFQ5ee process [17] with eight routing metal layers, a pad layer, a
resistance layer, and a Josephson junction layer with 100 µA/µm² critical current density. Fig. 2.1 shows the
legend of the design layers of the MITLL process in the layout editing tool (Cadence LayoutEditor), where
’Mx’ denotes the metal layers and ’Ix’ the via layers. For the other layers: ’L0’ is the high kinetic
inductance layer; ’C0’ is the contact layer of ’L0’ with ’M1’; ’J5’ is the Josephson junction layer; ’R5’ is the
resistance layer; ’C5J’ is the contact layer of ’J5’ with ’M6’; ’C5R’ is the contact layer of ’R5’ with ’M6’; ’M8’
is the chip pad layer; ’TEM’ and ’PORT’ are virtual layers for parameter extraction purposes.
2.1.2 Standard Cell Library General Description
Fig. 2.2 shows the design flow for the qSportLib cell library. First, a circuit schematic for a target cell
is generated and its correct functionality is confirmed by simulation tools like JSIM [32], JoSIM [33] or
Figure 2.1: Legend of the design layers of the MITLL process.
WRspice [34]. At this stage, we only care about the circuit structure and the correct functionality without
optimizing specific circuit parameters such as the bias current value, inductance values, and junction sizes.
Next, we move to the stage where the circuit netlist will be tuned to have good margins with respect to
all circuit parameters. After this optimization, circuit functionality is again checked. Next, a layout is
generated that adheres to the optimized circuit parameter values while meeting the design rule constraints.
To accomplish this task, a parameter extraction tool such as InductEx [35] is used to extract values of the
drawn layout and again circuit simulation is performed. The designer must iterate between layout design,
extraction, and circuit simulation until the extracted circuit parameter values match those of the optimized
netlist within an acceptable error tolerance of 1–5%. The library cell design is thus completed with a
netlist file with final extracted parameter values and a cell layout that passes the design rule check (DRC).
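The layout-extraction loop described above terminates when every extracted value is close enough to its target. A minimal sketch of that acceptance check is given below. The target inductances are the JTL values listed in Table 2.2, while the "extracted" numbers are made up for illustration and do not come from InductEx; the function names are hypothetical.

```python
def within_tolerance(target, extracted, tol=0.05):
    """True if the extracted value is within a relative tolerance of the target."""
    return abs(extracted - target) <= tol * abs(target)


def mismatched_parameters(targets, extracted, tol=0.05):
    """Return {name: (target, extracted)} for parameters outside the tolerance."""
    return {
        name: (targets[name], extracted[name])
        for name in targets
        if not within_tolerance(targets[name], extracted[name], tol)
    }


if __name__ == "__main__":
    targets_ph = {"L1": 2.0, "L2": 2.0}      # optimized netlist values (Table 2.2)
    extracted_ph = {"L1": 2.08, "L2": 2.15}  # hypothetical extraction result
    bad = mismatched_parameters(targets_ph, extracted_ph)
    if bad:
        print("layout needs another iteration:", bad)
    else:
        print("layout matches the netlist within tolerance")
```

In this example L1 is 4% off and passes the 5% criterion, while L2 is 7.5% off and would send the designer back to the layout step.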
Josephson transmission lines (JTLs) and passive transmission lines (PTLs) are the two types of connections
used in an RSFQ circuit. Compared with the metal wire connections in a complementary metal–oxide–semiconductor
(CMOS) system, both of the RSFQ connection blocks require a well-defined physical environment and
components that cannot be placed freely. For example, a JTL connection consists of active JJs that can only
be placed in the same layer as the logic cells themselves. Moreover, the characteristic impedance of a PTL
connection should match those of the corresponding PTL driver and receiver. To facilitate a practical
way of doing the physical design of an RSFQ circuit, library cells must be designed to lie on a grid as in [36].
In our design flow, the size of a grid unit is 25 µm by 25 µm.
Figure 2.2: Design flow for a cell in qSportLib.
Fig. 2.3 shows our cell abutment strategy. There are reserved pin locations on the edges of the grid unit.
A bias guard ring (including the power and ground layers) surrounds the cell so that a power mesh and
ground plane are automatically created when placing the cells next to each other. Details of the layout strategy
will be provided in a later section. For cells with relatively simple structures, such as the DFF, JTL and PTL
driver and receiver, one grid unit is sufficient. However, for larger cells such as the NDRO, INV, AND, and
OR gates, multiple grid units are used. Fig. 2.3 demonstrates how a small circuit is built by abutting various
logic cells without the need for any external connections.
Figure 2.3: Diagram of a small system using abutment cells
2.1.3 Cells in qSportLib
Table 2.1 contains a list of the implemented cells and connection blocks including AND, OR, XOR, DFF,
etc., Josephson transmission blocks, and passive transmission blocks. More specifically, the Josephson
transmission blocks consist of a splitter that receives one signal from the input port and sends out two SFQ
pulses through the two output ports, a merger that combines two incoming signals so that the output will
produce an SFQ pulse whenever an input signal arrives on either of the two input ports, a buffer which only
allows a signal to pass through in one direction (input to output), and a Josephson transmission line (JTL)
that is symmetric with respect to its input and output ports so that it can pass the signal in both directions.
Passive transmission blocks consist of passive transmission line (PTL) as well as its transmitter/driver
(PTLTX) and receiver (PTLRX).
Table 2.1: Summary of the cells in qSportLib
cell | JJ count | current (µA) | delay (ps) | area (µm²) | setup time (ps) | hold time (ps)
JTL | 1 | 90 | 2.0 | 625 | / | /
Buffer | 2 | 90 | 4.0 | 625 | / | /
Splitter | 3 | 309 | 10.1 | 625 | / | /
Splitter-3 | 7 | 730 | 10.2 | 1250 | / | /
Merger | 6 | 375 | 4.0 | 625 | / | /
DFF | 4 | 212 | 2.0 | 625 | 1.0 | /
AND2 | 11 | 530 | 4.5 | 2500 | / | 4.11
OR2 | 9 | 475 | 3.5 | 2500 | 4.03 | /
XOR2 | 8 | 435 | 7.18 | 2500 | / | 2.7
INV | 9 | 546 | 10.24 | 2500 | 8.15 | /
TFF | 11 | 601 | 6.8 | 2500 | / | /
NDRO | 11 | 875 | 10 | 2500 | / | /
PTLTX | 2 | 265 | 3.5 | 625 | / | /
PTLRX | 2 | 252 | 3.5 | 625 | / | /
DC/SFQ | 3 | 450 | 10 | 1250 | / | /
SFQ/DC | 10 | 1025 | 10 | 2500 | / | /
PTL | 0 | 0 | 10.54/mm | 625 | / | /
AB+CD | 26 | 1507 | 12.57 | 5400 | / | 11.0
(A+B)(C+D) | 18 | 1157 | 7.0 | 3600 | 2.18 | /
ABCD | 29 | 1254 | 11.95 | 5400 | / | 5.5
A+(BC) | 20 | 1104 | 9.9 | 5400 | 11.9 | /
(A+B)C | 15 | 1008 | 6.9 | 3600 | 10.4 | /
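The numbers in Table 2.1 allow a quick estimate of what the four-input complex gates save compared with building the same function from two-input cells. The sketch below compares each complex gate against a hypothetical two-level decomposition into AND2 and OR2 cells. The decompositions are assumptions made for this comparison; clock splitters, path-balancing DFFs and wiring are deliberately ignored, so the result is a lower bound on the cost of the discrete version, not a synthesis result.

```python
# (JJ count, bias current in uA, area in um^2), copied from Table 2.1
CELLS = {
    "AND2": (11, 530, 2500),
    "OR2": (9, 475, 2500),
    "AB+CD": (26, 1507, 5400),
    "(A+B)(C+D)": (18, 1157, 3600),
    "ABCD": (29, 1254, 5400),
}

# Hypothetical decompositions using only two-input cells.
DECOMPOSITIONS = {
    "AB+CD": ["AND2", "AND2", "OR2"],
    "(A+B)(C+D)": ["OR2", "OR2", "AND2"],
    "ABCD": ["AND2", "AND2", "AND2"],
}


def total_cost(cell_names):
    """Sum JJ count, bias current and area over a list of cells."""
    return tuple(sum(CELLS[name][k] for name in cell_names) for k in range(3))


def savings(gate):
    """Discrete-implementation cost minus complex-gate cost, per metric."""
    discrete = total_cost(DECOMPOSITIONS[gate])
    return tuple(d - c for d, c in zip(discrete, CELLS[gate]))


if __name__ == "__main__":
    for gate in DECOMPOSITIONS:
        jj, current, area = savings(gate)
        print(f"{gate}: saves {jj} JJs, {current} uA, {area} um^2")
```

Each complex gate also occupies a single clocked stage, whereas the two-input decomposition needs two, which is the logic-depth reduction mentioned in Section 1.6.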
2.1.3.1 Josephson Transmission Line (JTL)
Fig. 2.4 is the schematic of the Josephson transmission line. The JTL consists of a series of repeated
inductor-and-junction structures, and the structure shown in Fig. 2.4 is the minimum unit. We
made this the smallest block in our cell library so that, as described in the previous section, abutment
connections can be used to integrate the system conveniently. Table 2.2 lists the circuit parameters
with their values and simulated margins. The inductors are implemented in the ’M6’ (metal 6) layer, and the
biasing current flows from the power supply frame surrounding the cell through a resistor to the junction.
Fig. 2.5 is a screenshot of the layout, where the ’M4’ and ’M7’ layers are used as ground floor and ceiling;
together with the rectangular via structure at the edges of the cell, they serve as electromagnetic (EM)
shielding. In addition, the cut-outs in the ’M4’ and ’M7’ layers are the so-called ’moats’, whose function is to
prevent flux from being trapped in the circuit components, where it would disturb the biasing and shift
the circuit away from normal operation or cause a malfunction. Finally, Fig. 2.6 and Fig. 2.7 are the waveform
plots from simulation and testing, respectively.
Figure 2.4: Schematic of JTL. Figure 2.5: Layout of JTL.
Table 2.2: Circuit parameter list of JTL
Circuit parameter Value Margin
L1 2.0pH − 50% 50%
J1 130µA − 46% 34%
L2 2.0pH − 50% 50%
I1 90µA − 40% 50%
Figure 2.6: Simulation waveform of JTL. Figure 2.7: Test waveform of JTL.
2.1.3.2 Buffer
Fig. 2.8 is the schematic of the buffer cell. In the buffer cell, the bias current flows from right to left
through J2 and from top to bottom through J1. When a pulse arrives at the input on the left side of the cell, J1 is
triggered first and produces a pulse propagating toward the output. Because the direction of this
pulse is opposite to the bias current of J2, J2 remains quiet and the pulse comes out at the ’OUT’ port.
When a pulse instead arrives from the output and propagates toward the left, the bias current of J2 is in the
same direction as the incoming pulse, so J2 is triggered and cancels the pulse coming from the output
port. In this way, the buffer cell serves as a one-directional valve. Fig. 2.9 shows the layout of the buffer,
where the input port is at the left side of the block and the output port is at the bottom. This is also an example
of how we configure a corner or curve cell, as shown in the abutment connection section. Fig. 2.10 and
Fig. 2.11 are the plots of the simulation and testing waveforms, in which we can clearly see that transmission
in one direction is allowed while transmission in the opposite direction is blocked.
Figure 2.8: Schematic of buffer. Figure 2.9: Layout of buffer.
Table 2.3: Circuit parameter list of buffer
Circuit parameter Value Margin
L1 3.6pH − 50% 50%
J1 139µA − 32% 44%
J2 106µA − 35% 50%
L2 5.8pH − 50% 50%
I1 90µA − 50% 43%
Figure 2.10: Simulation waveform of buffer. Figure 2.11: Test waveform of buffer.
2.1.3.3 Splitter
Fig. 2.12 is the schematic of the splitter cell. The splitter cell receives an input pulse, which makes J1 flip.
The induced current is then divided into two branches through L2 and L3. The bias currents of
J2 and J3 are slightly higher than the normal bias so that they can be triggered by a smaller current. Thus, one
input pulse is reproduced at two output ports. This cell solves the fan-out issue of RSFQ technology.
Whenever a signal needs to drive multiple blocks, for example in clock distribution, a splitter network is
used. Fig. 2.13 shows the layout of the splitter. Fig. 2.14 is the plot of the simulation waveform. Because our
testing partner only reported the bias margin, the testing waveform is omitted here. The splitter shows a
bias margin of 13%.
Figure 2.12: Schematic of splitter. Figure 2.13: Layout of splitter.
Table 2.4: Circuit parameter list of splitter
Circuit parameter Value Margin
L1 2.0pH − 50% 50%
J1 129µA − 50% 50%
L2,L3 2.5pH − 50% 50%
J2,J3 140µA − 35% 50%
L4,L5 2.5pH − 50% 50%
I1 90µA − 50% 43%
I2,I3 119µA − 45% 43%
Figure 2.14: Simulation waveform of splitter.
2.1.3.4 Splitter-3
Fig. 2.15 is the schematic of the splitter-3 cell, where only one of each set of symmetric components is
labeled to keep the figure clean. The splitter-3 cell works exactly like the splitter cell above except that it
replicates the input pulse at three output ports. In terms of fan-out capacity, splitter-3 is therefore more efficient
than the splitter. Fig. 2.16 shows the layout of the splitter-3. Table 2.5 lists the circuit parameters. Fig.
2.17 and Fig. 2.18 are the plots of the simulation and testing waveforms. For the test waveform, only one of the
outputs is tapped out to a pad due to the limited number of pads. The splitter-3 shows a bias margin of 17.2%.
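The fan-out trade-off between the two splitter cells can be estimated with a short calculation. The sketch below sizes a balanced distribution tree for a given number of sinks, using the JJ counts and delays of the Splitter and Splitter-3 cells from Table 2.1. It assumes an idealized tree in which unused outputs cost nothing and interconnect delay is ignored, so the figures are rough estimates and say nothing about any actual clock network in this work.

```python
def splitter_tree(n_sinks, fanout, jj_per_cell, delay_ps):
    """Estimate cells, levels, JJ count and delay of a balanced splitter tree."""
    if n_sinks < 1 or fanout < 2:
        raise ValueError("need at least one sink and a fan-out of two or more")
    # Number of levels: smallest d with fanout**d >= n_sinks (integer arithmetic).
    levels, reach = 0, 1
    while reach < n_sinks:
        reach *= fanout
        levels += 1
    # Every cell adds (fanout - 1) endpoints to the tree.
    cells = -(-(n_sinks - 1) // (fanout - 1))  # ceiling division
    return {
        "cells": cells,
        "levels": levels,
        "jj": cells * jj_per_cell,
        "delay_ps": levels * delay_ps,
    }


if __name__ == "__main__":
    n = 8  # assumed number of sinks
    print("Splitter:  ", splitter_tree(n, 2, jj_per_cell=3, delay_ps=10.1))
    print("Splitter-3:", splitter_tree(n, 3, jj_per_cell=7, delay_ps=10.2))
```

For eight sinks the three-output cell needs fewer levels and therefore less delay, at the cost of a somewhat higher junction count in this idealized estimate.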
Figure 2.15: Schematic of splitter-3. Figure 2.16: Layout of splitter-3.
Table 2.5: Circuit parameter list of splitter-3
Circuit parameter Value Margin
L1 2.0pH − 50% 50%
J1 150µA − 50% 50%
L2 0.8pH − 50% 50%
L3 4.2pH − 50% 50%
J2 150µA − 50% 50%
L4 4.5pH − 50% 50%
J3 130µA − 50% 50%
L5 3.0pH − 50% 50%
I1 153µA − 50% 43%
I2 125µA − 45% 43%
I3 62.5µA − 45% 43%
Figure 2.17: Simulation waveform of splitter-3. Figure 2.18: Test waveform of splitter-3.
2.1.3.5 Merger
Fig. 2.19 is the schematic of the merger cell. As its name suggests, the merger cell merges two input branches
into one output. The cell gives out a pulse when either of the inputs receives a signal. Meanwhile, the serial
junction J2 at the merging branch serves as a buffer junction, which cancels the back-propagating pulse
coming from the other input branch. For example, when a signal pulse arrives at ’B’, without
J2 there would be a pulse coming out of port ’A’ as well. In addition, a unique property of the merger cell
is that, when the input pulses of the two branches (’A’ and ’B’) arrive at the ports within a certain short time
interval, only one pulse is given out by the cell. Designers should handle this carefully during
implementation and can sometimes use it to fulfill special functions. Fig. 2.20 shows the layout
of the merger cell. Fig. 2.21 is the plot of the simulation waveform. Because our testing partner only reported
the bias margin for the merger, the testing waveform is omitted here. The merger shows a bias margin of 8%
in testing.
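The pulse-merging behavior, including the case where two nearly coincident inputs yield a single output, can be captured in an event-level model. The sketch below treats pulses as arrival times and suppresses any pulse that follows the previous output within a dead time. The 5 ps dead time is a placeholder assumption; the actual interval depends on the circuit parameters and is not stated in the text.

```python
def merge_pulses(a_times_ps, b_times_ps, dead_time_ps=5.0):
    """Event-level merger model: returns the output pulse times in ps.

    A pulse arriving within dead_time_ps of the previous output pulse is
    absorbed, so two nearly simultaneous inputs produce a single output.
    """
    outputs = []
    for t in sorted(list(a_times_ps) + list(b_times_ps)):
        if not outputs or t - outputs[-1] > dead_time_ps:
            outputs.append(t)
    return outputs


if __name__ == "__main__":
    a = [0.0, 100.0]
    b = [2.0, 50.0]   # the pulse at 2 ps nearly coincides with the one at 0 ps
    print(merge_pulses(a, b))
```

Here the pulses at 0 ps and 2 ps collapse into one output, while the pulses at 50 ps and 100 ps pass through individually.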
Figure 2.19: Schematic of merger. Figure 2.20: Layout of merger.
Table 2.6: Circuit parameter list of merger
Circuit parameter Value Margin
L1 2.0pH − 50% 50%
J1 129µA − 50% 50%
L2,L3 2.5pH − 50% 50%
J2,J3 140µA − 35% 50%
L4,L5 2.5pH − 50% 50%
I1 90µA − 50% 43%
I2,I3 119µA − 45% 43%
2.1.3.6 D Flip-flop (DFF)
Fig. 2.22 is the schematic of the D flip-flop (DFF) cell. In Section 1.3.2, we derived the
relationship between the loop inductance L and the possible quantum state number of the SQUID, from which we
know that more than one state is allowed if the loop inductance is made larger. Thus,
for the DFF storage block, L2 is designed to be relatively large (compared with the inductor in the JTL) so that
Figure 2.21: Simulation waveform of merger.
the loop J1-L2-J2 can store one fluxon. When an input pulse arrives, it is stored in the loop and
increases the loop current in the clockwise direction (J1->L2->J2->J1). After that, a clock signal
coming from the ’CLK’ port triggers the J2 junction to flip, producing a pulse to the next junction
J4 and cancelling the stored flux. The DFF cell then returns to the initial state, waiting for
the next input. The other case is when the clock signal arrives without a preceding input pulse.
In this circumstance, the current of J2 is designed so that it cannot be triggered by the clock pulse alone.
Instead, a smaller junction J3 flips and cancels the pulse from the ’CLK’ port, and the DFF cell stays in the
initial state. Fig. 2.23 shows the layout of the DFF. Table 2.7 lists the circuit parameters. Fig.
2.24 and Fig. 2.25 are the plots of the simulation and testing waveforms. The DFF shows a bias margin of
81.3%.
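To make the "relatively large" storage inductor concrete, the sketch below compares the product of inductance and critical current, normalized to the flux quantum, for the JTL inductor and the DFF storage inductor. The values are taken from Table 2.2 (L2 = 2.0 pH, J1 = 130 µA) and Table 2.7 (L2 = 8.0 pH, J2 = 129 µA). Pairing each inductor with one neighbouring junction is a simplification made for this example, and the ratio is only a rough figure of merit, not the full multi-state condition of Eq. 1.13.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def normalized_inductance(l_henry, ic_amp):
    """L * Ic expressed in units of the flux quantum."""
    return l_henry * ic_amp / PHI0


if __name__ == "__main__":
    jtl = normalized_inductance(2.0e-12, 130e-6)   # JTL: L2 with J1 (Table 2.2)
    dff = normalized_inductance(8.0e-12, 129e-6)   # DFF: L2 with J2 (Table 2.7)
    print(f"JTL: L*Ic/PHI0 = {jtl:.3f}")
    print(f"DFF: L*Ic/PHI0 = {dff:.3f}")
    print(f"ratio DFF/JTL  = {dff / jtl:.1f}")
```

The storage loop of the DFF comes out roughly four times larger than the JTL inductor by this measure, consistent with its role of holding a fluxon until the clock arrives.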
2.1.3.7 AND2 Gate
With the merger and DFF designs in place, we can now move to the first logic gate in RSFQ technology,
the AND gate. The cell is named AND2 for two-input AND gate. It is possible to implement an
AND gate with three inputs, but this library does not need one. Fig. 2.26 is the schematic of the AND2 cell.
On close inspection, one can see that the AND gate is made up of two DFFs on the input
Figure 2.22: Schematic of DFF. Figure 2.23: Layout of DFF.
Table 2.7: Circuit parameter list of DFF
Circuit parameter Value Margin
L1 4.1pH − 50% 50%
J1 114µA − 50% 50%
L2 8.0pH − 22% 50%
J2 129µA − 26% 44%
L3 3.8pH − 50% 50%
J3 80µA − 50% 50%
L4 6.0pH − 50% 50%
J4 148µA − 31% 50%
L5 4.2pH − 50% 50%
I1 103µA − 49% 43%
I2 109µA − 29% 43%
branches, one splitter for the clock branch and a merger which completes the function. The mechanism is
as follows: when a clock signal arrives at the port, each input branch either gives out a fluxon or remains quiet,
depending on the input signals received before the clock, as in the DFF. The flux-induced currents then add
up at junction J5, whose bias is set such that only two fluxons together can trigger it. The AND
logic is thereby implemented. Fig. 2.27 shows the layout of the AND2, and the circuit parameters are shown
in Table 2.8. Fig. 2.28 is the plot of the simulation waveform. We did not test the AND2 gate individually but
Figure 2.24: Simulation waveform of DFF. Figure 2.25: Test waveform of DFF.
instead tested a full adder mini system. The bias margin of the AND2 gate cannot be smaller than that
of the full adder, which is 27%.
Figure 2.26: Schematic of AND2. Figure 2.27: Layout of AND2.
2.1.3.8 OR2 Gate
The OR2 gate shares the same idea as the AND2 gate with a slight change. As Fig. 2.29 shows, it also
contains a merger which merges the currents, while the clock branch is moved to a later stage after the
merger, where it forms a DFF. By doing so, we save the splitter used previously in the AND2 gate and the two
Table 2.8: Circuit parameter list of AND2
Circuit parameter Value Margin Circuit parameter Value Margin
L1 3.4pH − 50% 50% J1 155µA − 50% 50%
L2 6.4pH − 43% 44% J2 221µA − 38% 38%
L3 3.1pH − 50% 50% J3 123µA − 46% 35%
L4 3.06pH − 50% 50% J4 83µA − 50% 50%
L5 4.9pH − 50% 50% L6 3.0pH − 50% 50%
J5 102µA − 50% 50% L6 4.5pH − 50% 50%
J6 93µA − 50% 50% L7 5.1pH − 50% 50%
I1 113µA − 50% 50% I2 62.5µA − 50% 50%
I3 62.5µA − 50% 50% I4 92.5µA − 50% 50%
Figure 2.28: Simulation waveform of AND2.
DFFs at the input branches. However, we now need to deal with the dual input pulse issue: when the
two input signals arrive at different times but within one clock period, without J3, J4 would be triggered
twice and inject two fluxons into the J4-L3-J5 loop. The cell would then give out an output pulse without
waiting for the clock signal, which is not what we want. J3 is therefore included to cancel the second input pulse,
which solves this issue. Fig. 2.30 shows the layout of the OR2, and the circuit parameters are shown
in Table 2.9. Fig. 2.31 is the plot of the simulation waveform. Again, we did not test the OR2 gate individually
but instead tested a full adder mini system. The bias margin of the OR2 gate cannot be smaller than
that of the full adder, which is 27%.
Figure 2.29: Schematic of OR2. Figure 2.30: Layout of OR2.
Table 2.9: Circuit parameter list of OR2
Circuit parameter Value Margin Circuit parameter Value Margin
L1 1.25pH − 50% 50% J1 179µA − 32% 31%
L2 3.4pH − 43% 44% J2 105µA − 32% 50%
L3 12pH − 31% 50% J3 97µA − 41% 50%
L4 2.7pH − 50% 50% J4 105µA − 38% 50%
L5 4.9pH − 50% 50% J5 106µA − 41% 50%
L6 4.0pH − 50% 50% J6 93µA − 50% 35%
J7 121µA − 28% 50% I1 125µA − 50% 50%
I2 92µA − 38% 50% I3 54µA − 50% 50%
Figure 2.31: Simulation waveform of OR2.
2.1.3.9 XOR2 Gate
The key component in the XOR2 cell is the J3 junction, as shown in Fig. 2.32. When an input pulse arrives
at ’A’, it first triggers the input junction (J1) and is stored as a loop current through the input branch
J1, L2 to J3, J5. Again, this current cannot trigger J5 by itself; it needs to wait for the clock signal to
release a fluxon to the next stage. When another input pulse arrives at ’B’, another loop current
is added along the path symmetric to that of the signal from ’A’. The key junction J3 is then
triggered before J5 and resets the circuit. Therefore, the ’11’ input combination produces no output, just
like the ’00’ combination. Only ’01’ or ’10’ provides enough current for J5 to flip at the arrival of the
clock while avoiding the triggering of J3, which fulfills the XOR function. Fig. 2.33 shows the layout of the
XOR2, and the circuit parameters are shown in Table 2.10. Fig. 2.34 and Fig. 2.35 are the plots of the simulated
and tested waveforms. The tested bias margin is 37.4%.
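The AND2, OR2 and XOR2 cells share the same clocked behavior: input pulses that arrive between two clock pulses are remembered, and the clock then either releases an output pulse or not before the cell returns to its initial state. A behavioral sketch of that protocol is given below. The class and method names are hypothetical, and the model works purely at the logic level, ignoring timing, setup and hold constraints and all analog effects.

```python
class ClockedGate:
    """Logic-level model of a clocked two-input RSFQ gate."""

    def __init__(self, func):
        self.func = func   # Boolean function of the two stored inputs
        self.a = False
        self.b = False

    def pulse_a(self):
        # A repeated pulse within one clock period has no further effect.
        self.a = True

    def pulse_b(self):
        self.b = True

    def clock(self):
        """Return True if an output pulse is produced, then reset the state."""
        out = self.func(self.a, self.b)
        self.a = False
        self.b = False
        return out


def make_and2():
    return ClockedGate(lambda a, b: a and b)


def make_or2():
    return ClockedGate(lambda a, b: a or b)


def make_xor2():
    return ClockedGate(lambda a, b: a != b)


if __name__ == "__main__":
    gate = make_xor2()
    gate.pulse_a()
    print("XOR2 with A only:", gate.clock())
    gate.pulse_a()
    gate.pulse_b()
    print("XOR2 with A and B:", gate.clock())
    print("XOR2 with no input:", gate.clock())
```

Because the stored inputs are cleared by every clock pulse, an absent pulse in a clock period is read as logic '0', which is the convention the cells above rely on.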
Figure 2.32: Schematic of XOR2. Figure 2.33: Layout of XOR2.
Table 2.10: Circuit parameter list of XOR2
Circuit parameter Value Margin Circuit parameter Value Margin
L1 3.5pH − 50% 50% J1 116µA − 50% 50%
L2 9.5pH − 50% 50% J2 108µA − 29% 34%
L3 9.2pH − 31% 50% J3 120µA − 32% 26%
L4 2.0pH − 50% 50% J4 102µA − 28% 25%
L5 4.7pH − 50% 50% J5 107µA − 32% 26%
L6 4.0pH − 50% 50% J6 153µA − 38% 38%
J7 121µA − 38% 50% I1 78µA − 34% 50%
I2 131µA − 34% 50% I3 54µA − 50% 50%
Figure 2.34: Simulation waveform of XOR2. Figure 2.35: Test waveform of XOR2.
2.1.3.10 Inverter
Fig. 2.36 is the schematic of the inverter cell. The key structure here is the loop formed by L3-J6-L4-J7.
When an input pulse arrives, the fluxon produced by J1 is stored in the loop as a counter-clockwise loop
current, which raises the bias level of J7 so that the arriving clock signal triggers J7 instead of J5.
J7 then releases a fluxon that cancels the stored fluxon and resets the loop current. On the other hand, a
clock signal without the loop current induced by the input directly triggers J8 and propagates through L5
to the output port. Fig. 2.38 is the plot of the simulation waveform with the circuit parameters in Table 2.11.
Fig. 2.37 shows the layout of the inverter. A potential risk in this design is that the initial state of the
loop is sensitive to process variation and flux trapping. In practice, therefore, the inverting function is
commonly implemented by an XOR gate with a splitter tapping the clock signal to one of the input ports.
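The XOR-based alternative mentioned above can be illustrated at the logic level. This is a minimal sketch with hypothetical function names; it assumes an ideal XOR2 and assumes that the pulse tapped from the clock reaches the XOR input in every period, and it ignores the timing needed for that pulse to arrive before the gate is clocked.

```python
def xor2(a, b):
    """Ideal clocked XOR2 result for one clock period (inputs are 0 or 1)."""
    return a ^ b


def inverter_from_xor(data):
    """Inverter built from an XOR2 whose second input is fed from the clock via a splitter."""
    clock_tap = 1  # assumption: one clock-derived pulse is present in every period
    return xor2(data, clock_tap)


inverted = [inverter_from_xor(bit) for bit in (0, 1)]
```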
Figure 2.36: Schematic of inverter. Figure 2.37: Layout of inverter.
Table 2.11: Circuit parameter list of inverter
Circuit parameter Value Margin Circuit parameter Value Margin
L1 5.9pH − 50% 50% J1 175µA − 31% 50%
L2 2.2pH − 50% 50% J2 144µA − 50% 50%
L3 9.9pH − 31% 50% J3 83µA − 50% 50%
L4 4.9pH − 50% 50% J4 70µA − 50% 50%
L5 4.0pH − 50% 50% J5 100µA − 26% 50%
J6 67µA − 50% 46% J7 130µA − 50% 50%
J8 173µA − 48% 50% J9 159µA − 50% 50%
I1 83µA − 50% 41% I2 100µA − 50% 50%
I3 125µA − 38% 50% I4 74µA − 50% 50%
Figure 2.38: Simulation waveform of inverter.
2.1.3.11 Toggle Flip-flop (TFF)
The state of a TFF toggles between '0' and '1' every time an input signal arrives. The TFF gives an output
pulse when its internal state returns from '1' to '0', as shown in Fig. 2.39. The structure of the TFF is
similar to that of the inverter in that it has an internal superconducting loop which can be set and reset.
Fig. 2.40 is the schematic of the TFF, where this internal superconducting loop is formed by
J5-L6-J2-L3-J7-L9-L8-L7. When an input pulse arrives at 'IN', it induces a clockwise current in the loop.
The next input pulse triggers the J3 junction, producing the output pulse and resetting the loop. The loop
can also be reset by a pulse arriving at the 'RESET' port. Fig. 2.41 shows the layout of the TFF, and the
circuit parameters are listed in Table 2.12, with some of the non-critical parameters omitted. Fig. 2.42
and Fig. 2.43 are the plots of the simulated and tested waveforms. The tested bias margin is 45.2%.
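The toggling behaviour described above can be sketched as a two-state machine. The class and method names are hypothetical and the model is purely logical: it only captures that every second input pulse produces an output and that 'RESET' clears the loop.

```python
class ToggleFlipFlop:
    """Logic-level sketch of the TFF (hypothetical model)."""

    def __init__(self):
        self.state = 0  # 1 while the internal loop holds a circulating current

    def pulse_in(self):
        """Apply one pulse at 'IN'; return 1 when the state falls from '1' to '0'."""
        output = 1 if self.state == 1 else 0
        self.state ^= 1
        return output

    def pulse_reset(self):
        """Apply one pulse at 'RESET'; the loop is cleared without an output."""
        self.state = 0


tff = ToggleFlipFlop()
outputs = [tff.pulse_in() for _ in range(4)]
```

Under this model the cell acts as a divide-by-two element: four input pulses yield two output pulses.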
Figure 2.39: Diagram of the logic of TFF.
Table 2.12: Circuit parameter list of TFF
Circuit parameter Value Margin Circuit parameter Value Margin
L1 2.5pH − 50% 50% J1 181µA − 50% 44%
L2 2.5pH − 50% 50% J2 104µA − 50% 32%
L3 2.2pH − 50% 50% J3 97µA − 50% 50%
L4 2.0pH − 50% 50% J4 200µA − 47% 46%
L5 2.7pH − 50% 50% J5 106µA − 49% 44%
L6 2.0pH − 50% 50% J7 149µA − 41% 26%
J8 100µA − 29% 37% I1 147µA − 49% 50%
I2 174µA − 35% 50% I3 92.5µA − 50% 50%
I2 178µA − 50% 34%
Figure 2.40: Schematic of TFF. Figure 2.41: Layout of TFF.
Figure 2.42: Simulation waveform of TFF. Figure 2.43: Simulation waveform of TFF.
2.1.3.12 Non-destructive Readout (NDRO)
Fig. 2.44 is the schematic of the NDRO cell. Its operation can be explained as follows. The top
branches in the figure ('SET' and 'RESET') control the bias current flowing through L3-J5-L4 to L5-J7-J6.
When the cell is set, meaning that a pulse has previously arrived at the 'SET' port, the fluxon forms a loop
current from J2-L1 through the path just mentioned and therefore adds a right-to-left bias current to J7.
In this case, a 'READOUT' signal cannot trigger the J7 junction; instead, it directly makes J8 flip and
produce an output. A 'RESET' pulse, however, flips the J5 junction and resets the bias control branch.
The counter-direction bias of J7 is then not strong enough to resist the flipping caused by a 'READOUT'
pulse. J7 is therefore triggered and cancels the 'READOUT' signal, preventing J8 from producing an output.
Fig. 2.46 is the plot of the simulation waveform with the circuit parameters in Table 2.13, with some of the
non-critical parameters omitted. Fig. 2.45 shows the layout of the NDRO. Because our testing partner only
reported the bias margin, the tested waveform is not included here. The splitter shows a bias margin of 10.7%.
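At the logic level, the set, reset, and readout behaviour described above reduces to a one-bit memory whose content is not altered by reading. The sketch below uses hypothetical names and ignores all analog detail, including which junction absorbs the 'READOUT' pulse.

```python
class Ndro:
    """Logic-level sketch of the NDRO cell (hypothetical model)."""

    def __init__(self):
        self.is_set = False  # True while the bias-control loop current is present

    def pulse_set(self):
        self.is_set = True

    def pulse_reset(self):
        self.is_set = False

    def pulse_readout(self):
        """Return 1 if the cell is set (J8 flips), else 0 (J7 cancels the pulse)."""
        return int(self.is_set)  # reading does not change the stored state


cell = Ndro()
cell.pulse_set()
reads_while_set = [cell.pulse_readout() for _ in range(3)]
cell.pulse_reset()
read_after_reset = cell.pulse_readout()
```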
Figure 2.44: Schematic of NDRO. Figure 2.45: Layout of NDRO.
Table 2.13: Circuit parameter list of NDRO
Circuit parameter Value Margin Circuit parameter Value Margin
L1 2.3pH − 50% 50% J1 267µA − 50% 43%
L2 2.1pH − 50% 50% J2 237µA − 41% 50%
L3 3.3pH − 31% 50% J4 283µA − 50% 29%
L4 2.5pH − 50% 50% J6 297µA − 41% 50%
L5 2.0pH − 50% 50% J7 78µA − 50% 35%
J6 67µA − 50% 46% J9 133µA − 41% 29%
I1 156µA − 50% 46% I2 87µA − 50% 38%
Figure 2.46: Simulation waveform of NDRO.
2.1.3.13 Use Passive Transmission Line (PTL)
The robust JTL can be used to communicate signals between logic gates. However, it has several
disadvantages. First, the JTL cell uses Josephson junctions to regenerate the signal pulse during
transmission, which consumes a large portion of the power at chip scale. Second, even though one can
extend the reach of a JTL cell by using wider metal lines and larger junctions, the distance covered by a
two-junction JTL unit structure is still limited to about 50 µm. Third, because it uses active components,
a JTL cell can never be placed so that it overlaps logic gates or memory cells, which badly undermines the
integration level. The JTL is therefore only suitable for short-distance cell-to-cell connections, and a
more efficient method is needed for long-reach connections (> 200 µm). The passive transmission line is the
solution adopted by researchers. Each PTL connection consists of three parts: the PTL driver or transmitter
(PTLTX), the PTL itself, and the PTL receiver (PTLRX). We introduce the design and physical implementation
of these three parts in this section.
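The distance figures quoted above suggest a simple rule of thumb for choosing an interconnect. The helper below is only a sketch under stated assumptions: the function name is hypothetical, the 50 µm reach per JTL unit and the 200 µm threshold are taken from the text, and treating every connection up to 200 µm as a JTL chain is an assumption rather than a rule stated by the library.

```python
import math

JTL_UNIT_REACH_UM = 50.0     # approximate reach of one two-junction JTL unit (from the text)
PTL_MIN_DISTANCE_UM = 200.0  # connections longer than this are treated as far-reach


def plan_interconnect(distance_um):
    """Return (style, unit_count) for a connection of the given length in micrometers."""
    if distance_um <= 0:
        raise ValueError("distance must be positive")
    if distance_um > PTL_MIN_DISTANCE_UM:
        # One PTL link: PTLTX + passive line + PTLRX.
        return ("PTL", 1)
    # Otherwise chain enough JTL units to cover the distance.
    return ("JTL", math.ceil(distance_um / JTL_UNIT_REACH_UM))


short_link = plan_interconnect(120)
long_link = plan_interconnect(500)
```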
2.1.3.14 Passive Transmission Line Transmitter/Driver (PTLTX)
The PTLTX is nothing but a JTL-like structure with a series resistor at the output port, as shown in Fig.
2.47. There are two main reasons for the series resistor: one is to match the impedance of the PTL; the
other is to dissipate the residual pulse caused by impedance mismatch. The parameters are listed in
Table 2.14. Please note that the McCumber parameter of J2 is also modified by changing the shunt
resistance (RJ2) for better impedance matching.
Figure 2.47: Schematic of PTLTX. Figure 2.48: Layout of PTLTX.
Table 2.14: Circuit parameter list of PTLTX
Circuit parameter Value Margin
L1 1.6pH − 50% 50%
J1 133µA − 44% 50%
L2 1.5pH − 50% 50%
J2 141µA − 38% 50%
RJ2 6.8Ω − 50% 50%
L3 2.0pH − 50% 50%
R 1.27Ω − 50% 50%
I1 167µA − 50% 26%
I2 80µA − 50% 41%
2.1.3.15 Passive Transmission Line Receiver (PTLRX)
Similar to the PTLTX, the PTL receiver also uses a JTL-like structure, and the input junction (J1) is
adjusted to match the impedance.
Figure 2.49: Schematic of PTLRX. Figure 2.50: Layout of PTLRX.
Table 2.15: Circuit parameter list of PTLRX
Circuit parameter Value Margin
L1 5.3pH − 50% 50%
J1 135µA − 28% 23%
L2 5.75pH − 50% 50%
J2 139µA − 50% 50%
L3 5.08pH − 50% 50%
RJ1 7.18Ω − 50% 50%
I1 147µA − 31% 20%
I2 71µA − 50% 50%
2.1.3.16 Passive Transmission Line (PTL)
PTLs have been thoroughly studied in the literature; see, for example, [37], [38], where the authors examine
several PTLs built on different layers with different characteristic impedance values ($Z_0$), structures,
and widths. In qSportLib, we have implemented three kinds of PTLs with the signal paths realized on the M1,
M3, or M5 layers. These PTLs use a sandwich structure as shown in Fig. 2.51. Taking the M3-based
PTL as an example, the layer below (M2) and the layer above (M4) are used as ground shields, which help
isolate the signal from external interference and ensure the desired characteristic impedance value.
Stitches every 25 µm connect the two ground shields together. The M1 and M5 PTLs have a similar sandwich
structure: for the M1 PTL, M0 and M2 serve as the ground shields, whereas for the M5 PTL, M4 and
M6 serve as the ground shields. The layout examples are shown in Fig. 2.53 and Fig. 2.54. Since the various
metal layers in the MIT Lincoln Lab process have different dielectric characteristics, the PTLs on the M1,
M3, and M5 layers use different widths to achieve the target impedance value (which is set to 8 ohms in
qSportLib), as shown in Table 2.16, with $W_g$ and $W_s$ denoting the widths of the ground shield metals
and the signal path, respectively.
Figure 2.51: Diagram of a PTL with stitched ground shields above and below
Table 2.16: Parameter values for the M1, M3, and M5-based PTLs
PTL layer $W_g$ $W_s$ $Z_0$ (Ω)
M1 7.85µm 2.85µm 8.0
M3 6.5µm 2.9µm 8.0
M5 9.7µm 3.7µm 8.0
The PTL contact cell is another critical component when using PTL connections in a circuit. The
contact cell is needed to change the routing layer used as the main signal path. In qSportLib,
we have designed three PTL contact cells: M1-M3, M3-M5, and M1-M5. As with the PTL design itself, the
key consideration in designing a PTL contact cell is to maintain impedance matching while continuing
to shield the signal path. Fig. 2.52 shows an example cross-sectional view of a PTL contact from the M5
PTL to the M3 PTL. The signal path goes from M5 to M4 through the I4 via and continues to M3 through
the I3 via. Notice that the ground shield layers must keep guarding the signal path: the top
M6 ground shield thus goes to the top M4 ground shield layer, whereas the bottom M4 ground shield goes
to the bottom M2 ground shield layer. Considering that in many practical routing scenarios adjacent PTL
layers will be used for two perpendicular routing directions (i.e., horizontal and vertical), we have also
implemented two PTL contact cells where the upper and lower PTLs are oriented at a 90-degree angle
with respect to each other, as shown in Fig. 2.55.
Figure 2.52: Diagram of the passive transmission line (PTL) contact
Figure 2.53: Layout of Metal 1 PTL (bottom) and the corner (right upper).
Figure 2.54: Layout of the Metal 3 (bottom) and Metal 5 (upper) PTL.
Figure 2.55: Layout of the M3-M5 PTL contact-and-corner (left bottom) and M1-M3 PTL contact-and-corner (right upper).
2.1.3.17 DC/SFQ Converter
An SFQ pulse can be as short as two picoseconds. Such a rapid transient signal obviously
cannot be used directly for off-chip communication. In practice, special I/O cells are used to convert
a slowly changing DC signal to an SFQ signal (DC/SFQ converter) and an SFQ signal to a DC signal (SFQ/DC
converter). Fig. 2.56 is the schematic of the DC/SFQ converter. The block takes a current input and produces
one SFQ pulse per rising edge. The 50 Ω resistor in the input path provides the desired current from
a voltage input source connected to the chip pad. When the input current starts to rise, due to the dynamic
behavior of inductor L2, most of the transient current flows to the right through J2 to ground, and
the J2 junction is therefore triggered. When the input current settles, L2 presents a low impedance (in fact
zero impedance, because it is made of superconducting material) and the current is led to ground
through L2 instead of J2. Table 2.17 lists the circuit parameters with their values and simulated
margins. Fig. 2.57 is a screenshot of the layout, and Fig. 2.58 is the waveform plot from simulation.
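The "one SFQ pulse per rising edge" behaviour can be illustrated with a sampled-signal sketch. The function name, the sample values, and the threshold are assumptions introduced for illustration; the real cell responds to the rate of change of the input current rather than to a digital threshold.

```python
def dc_to_sfq(samples, threshold):
    """Return a list with 1 at every sample where the input rises through the threshold."""
    pulses = []
    previous_high = False
    for value in samples:
        high = value >= threshold
        # A pulse is emitted only on a low-to-high transition, never on a falling edge.
        pulses.append(1 if high and not previous_high else 0)
        previous_high = high
    return pulses


# Two rising edges in the (hypothetical, normalized) input give two SFQ pulses.
pulse_train = dc_to_sfq([0, 0, 1, 1, 0, 1, 0], threshold=0.5)
```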
Figure 2.56: Schematic of DC/SFQ. Figure 2.57: Layout of DC/SFQ.
Table 2.17: Circuit parameter list of DC/SFQ
Circuit parameter Value Margin
L1 2.5pH − 50% 50%
L2 7.8pH − 50% 50%
J1 132µA − 44% 16%
J2 100µA − 50% 50%
L3 2.2pH − 50% 50%
J3 149µA − 50% 50%
L4 0.7pH − 50% 50%
I1 156µA − 50% 19%
Figure 2.58: Simulation waveform of DC/SFQ.
2.1.3.18 SFQ/DC Converter
Fig. 2.59 is the schematic of the SFQ/DC converter. Although the schematic shown in the figure looks quite
complicated, the idea is simple. The junctions J1 to J7 form a TFF-like structure of the kind described
earlier, so the internal state of that structure toggles between two different states at every input.
We only need to monitor the internal state by tapping it out through the J8 path to the output. The
simulation waveform in Fig. 2.61 demonstrates the operation of the SFQ/DC cell: the output level is
constant and low initially, starts oscillating when an input pulse arrives, and returns to zero after
the arrival of the second input. In practice, we can use a low-pass filter, which could be as simple as a
50 Ω resistor connected after the output port, to filter out the high-frequency components. Table 2.18
lists the circuit parameters with their values and simulated margins, and Fig. 2.60 is the layout.
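The toggling output described above can be sketched as follows. This is a logic-level illustration with a hypothetical function name; the oscillating high state is represented simply as level 1, which assumes the low-pass filtering mentioned in the text.

```python
def sfq_to_dc(pulses, initial_level=0):
    """Return the filtered output level after each time slot; every input pulse toggles it."""
    level = initial_level
    levels = []
    for pulse in pulses:
        if pulse:
            level ^= 1  # the TFF-like core changes state on every input pulse
        levels.append(level)
    return levels


# The first pulse raises the output; the second pulse returns it to the low level.
output_levels = sfq_to_dc([0, 1, 0, 0, 1, 0])
```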
Figure 2.59: Schematic of SFQ/DC. Figure 2.60: Layout of SFQ/DC.
Table 2.18: Circuit parameter list of SFQ/DC
Circuit parameter Value Margin Circuit parameter Value Margin
L1 3.8pH − 50% 50% J3 203µA − 25% 26%
L2 1.8pH − 50% 50% J4 188µA − 32% 50%
L3 1.4pH − 50% 50% J5 161µA − 50% 32%
L4 1.5pH − 50% 50% J6 325µA − 13% 50%
L7 1.9pH − 50% 50% J7 235µA − 50% 35%
L9 1.6pH − 44% 35% J8 110µA − 50% 23%
J1 266µA − 38% 20% J9 228µA − 26% 43%
J2 216µA − 50% 26% J10 230µA − 19% 31%
R1 4.7Ω − 50% 50% I1 324µA − 25% 26%
I2 156µA − 50% 43% I3 227µA − 50% 34%
I4 312µA − 50% 50%
Figure 2.61: Simulation waveform of SFQ/DC.
2.1.3.19 AB+CD
The utilization of a single-stage gate to execute complex logic is a beneficial approach. The schematic of
the (AB)+(CD) gate is presented in Fig. 2.62. The initial portion of the gate encompasses two structures
resembling AND gates. Upon arrival of an SFQ pulse at either input port, such as port A, the J16 junction is
triggered, resulting in the production of a looping current at J16-L18-J18. At the arrival of the clock signal,
J18 is triggered, and the current merges with its counterpart from branch B at J21. J21 is designed to require
current from both branches to be triggered, thereby completing the first-level AND logic. Thereafter,
another structure resembling an OR gate is cascaded. At this point, junction J25 is set to require current
from either branch to trigger it, thereby completing the (AB)+(CD) logic. Fig. 2.63 illustrates the layout of
the (AB)+(CD) gate, with key circuit parameters provided in Table 2.19.
Alternatively, the logic can be implemented using a clock-follow-data scheme as depicted in Fig. 2.64
and Fig. 2.65, whereby standard AND and OR gates are utilized, and the second-stage gate is driven by a
delayed clock signal which arrives after the corresponding signal. The clock delay can be achieved using
a splitter where one of the output ports is terminated with a five-ohm resistor. Data is processed
by the first- and second-stage gates sequentially within the same clock period. Fig. 2.66 illustrates the
simulation waveform, while Table 2.20 summarizes and compares the different methods for fulfilling the
same (AB)+(CD) logic. The two-stage implementation represents the minimum cost, utilizing standard
cells comprising two AND gates, one OR gate, three splitters, and necessary JTLs. As indicated in the
table, both single-stage methods improve performance in terms of JJ count, area, and bias current, but
exhibit longer delays, set-up times, or hold times.
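To make the comparison concrete, the relative savings of each single-stage method over the two-stage baseline can be computed from the JJ count, area, and bias values reported in Table 2.20. The sketch below copies those three rows from the table; the dictionary layout and function name are illustrative choices, not part of the library.

```python
# JJ count, area, and bias current per implementation, copied from Table 2.20.
TABLE_2_20 = {
    "Complex gate": {"jj_count": 26, "area_um2": 5400, "bias_uA": 1507},
    "Clock-follow-data scheme": {"jj_count": 44, "area_um2": 9000, "bias_uA": 2883},
    "Two-stage implementation": {"jj_count": 44, "area_um2": 18900, "bias_uA": 2805},
}


def savings_vs_baseline(table, baseline="Two-stage implementation"):
    """Percentage reduction of every metric relative to the baseline (negative means worse)."""
    base = table[baseline]
    result = {}
    for name, row in table.items():
        if name == baseline:
            continue
        result[name] = {
            metric: round(100.0 * (base[metric] - value) / base[metric], 1)
            for metric, value in row.items()
        }
    return result


savings = savings_vs_baseline(TABLE_2_20)
for name, row in savings.items():
    print(name, row)
```

With the tabulated values, the complex gate saves roughly 41% of the junctions and 71% of the area, whereas the clock-follow-data scheme saves area only: its JJ count equals the baseline and its bias current is slightly higher.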
Figure 2.62: Schematic of (AB)+(CD). Figure 2.63: Layout of (AB)+(CD).
Table 2.19: Circuit parameter list of (AB)+(CD)
Circuit parameter Value Margin Circuit parameter Value Margin
L2, L6, L14, L18 7.3pH − 29% 50% J2, J7, J13, J18 180µA − 29% 31%
J3, J6, J14, J17 110µA − 44% 41% L3, L7, L15, L19 2.5pH − 50% 50%
J4, J8, J15, J19 95µA − 50% 50% J10, J21 168µA − 50% 49%
J23, J24 116µA − 50% 50% L25 0.70pH − 50% 50%
J25 109µA − 50% 50% I1, I3, I6, I8 131µA − 35% 50%
I4, I9 92.5µA − 50% 50% I11 178µA − 29% 50%
Figure 2.64: Schematic of (AB)+(CD) using clock-follow-data scheme.
Figure 2.65: Layout of (AB)+(CD) using clock-follow-data scheme.
Figure 2.66: Simulation waveform of (AB)+(CD).
Table 2.20: Summary of (AB)+(CD)
Parameter Complex gate Clock-follow-data scheme Two-stage implementation
JJ count 26 44 44
Area (µm²) 5400 9000 18900
Bias (µA) 1507 2883 2805
Worst stage delay (ps) 12.57 26.75 4.53
Set-up time (ps) -11.0 -11.42 4.03
Hold time (ps) 11.0 11.42 4.11
2.1.3.20 (A+B)(C+D)
Employing a similar idea, the (A+B)(C+D) logic can be implemented by two OR-like structures and an
AND-like structure, as shown in Fig. 2.67. The circuit mechanism is similar to that explained in the previous
(AB)+(CD) section. Fig. 2.68 illustrates the layout of the (A+B)(C+D) gate, with key circuit parameters
provided in Table 2.21. Alternatively, the logic can be implemented using a clock-follow-data scheme
as depicted in Fig. 2.69 and Fig. 2.70. Fig. 2.71 illustrates the simulation waveform, while Table 2.22
summarizes and compares the different methods for fulfilling the (A+B)(C+D) logic.
Figure 2.67: Schematic of (A+B)(C+D). Figure 2.68: Layout of (A+B)(C+D).
Table 2.21: Circuit parameter list of (A+B)(C+D)
Circuit parameter Value Margin Circuit parameter Value Margin
J1, J3, J11, J13 218µA − 31% 31% J6, J15 162µA − 32% 29%
J5, J16 96µA − 32% 29% L5, L15 8.6pH − 32% 50%
J8, J18 92µA − 50% 50% J9 168µA − 50% 50%
I1, I2, I3, I4 89.2µA − 40% 50% I5, I8 100µA − 43% 50%
I6 117µA − 35% 50%
Figure 2.69: Schematic of (A+B)(C+D) using clock-follow-data scheme.
Figure 2.70: Layout of (A+B)(C+D) using clock-follow-data scheme.
Figure 2.71: Simulation waveform of (A+B)(C+D).
2.1.3.21 ABCD
Employing a similar idea, the ABCD logic can be implemented by three AND-like structures, as shown
in Fig. 2.72. The circuit mechanism is similar to that explained in the previous (AB)+(CD) section. Fig. 2.73
illustrates the layout of the ABCD gate, with key circuit parameters provided in Table 2.23. The ABCD logic
implemented using a clock-follow-data scheme is depicted in Fig. 2.74 and Fig. 2.75. Fig. 2.76 illustrates
the simulation waveform, while Table 2.24 summarizes and compares the different methods for fulfilling
the ABCD logic.
Table 2.22: Summary of (A+B)(C+D)
Parameter Complex gate Clock-follow-data scheme Two-stage implementation
JJ count 18 35 38
Area (µm²) 3600 7200 18000
Bias (µA) 1157 2265 2750
Worst stage delay (ps) 7.0 22.45 4.53
Set-up time (ps) 2.18 5.46 4.03
Hold time (ps) -2.18 -5.46 4.11
Figure 2.72: Schematic of ABCD. Figure 2.73: Layout of ABCD.
Figure 2.74: Schematic of ABCD using clock-follow-data scheme.
Figure 2.75: Layout of ABCD using clock-follow-data scheme.
Table 2.23: Circuit parameter list of ABCD
Circuit parameter Value Margin Circuit parameter Value Margin
J1, J5, J12, J16 119µA − 50% 50% J2, J7, J13, J18 152µA − 46% 31%
J10, J21 154µA − 50% 46% L2, L6, L14, L18 8.6pH − 38% 50%
J23, J24 106µA − 50% 50% J25 185µA − 50% 47%
I1, I3, I6, I8 103µA − 49% 50% I4, I9 77.9µA − 48% 50%
I11 109µA − 46% 50%
Figure 2.76: Simulation waveform of ABCD.
2.1.3.22 A+(BC)
A three-input complex gate can be implemented by replacing one of the logic structures with a DFF structure,
as shown in Fig. 2.77, where the A+(BC) logic is demonstrated. Fig. 2.78 illustrates the layout of the A+(BC)
gate, with key circuit parameters provided in Table 2.25. Alternatively, the logic can be implemented
using a clock-follow-data scheme as depicted in Fig. 2.79 and Fig. 2.80. Fig. 2.81 illustrates the simulation
waveform, while Table 2.26 summarizes and compares the different methods for fulfilling the A+(BC) logic.
Table 2.24: Summary of ABCD
Parameter Complex gate Clock-follow-data scheme Two-stage implementation
JJ count 26 43 46
Area (µm²) 5400 9000 18900
Bias (µA) 1254 2629 2860
Worst stage delay (ps) 11.95 26.21 4.53
Set-up time (ps) -5.5 -14.4 -4.11
Hold time (ps) 5.5 14.4 4.11
Figure 2.77: Schematic of A+(BC). Figure 2.78: Layout of A+(BC).
Figure 2.79: Schematic of A+(BC) using clock-follow-data scheme.
Figure 2.80: Layout of A+(BC) using clock-follow-data scheme.
2.1.3.23 (A+B)C
The schematic of the (A+B)C gate is shown in Fig. 2.82. Fig. 2.83 illustrates the layout of the (A+B)C
gate, with key circuit parameters provided in Table 2.27. Alternatively, the logic can be implemented
using a clock-follow-data scheme as depicted in Fig. 2.84 and Fig. 2.85. Fig. 2.86 illustrates the
simulation waveform, while Table 2.28 summarizes and compares the different methods for fulfilling the
(A+B)C logic.
Table 2.25: Circuit parameter list of A+(BC)
Circuit parameter Value Margin Circuit parameter Value Margin
J1, J5 124µA − 50% 50% J2, J6 180µA − 38% 50%
J4, J8 103µA − 50% 50% J11, J17 170µA − 29% 38%
L3, L7 4.8pH − 50% 50% I4 83µA − 50% 50%
I10 178µA − 29% 50%
Figure 2.81: Simulation waveform of A+(BC).
Figure 2.82: Schematic of (A+B)C. Figure 2.83: Layout of (A+B)C.
Table 2.26: Summary of A+(BC)
Parameter Complex gate Clock-follow-data scheme Two-stage implementation
JJ count 20 37 37
Area (µm²) 5400 9000 15300
Bias (µA) 1104 2565 2487
Worst stage delay (ps) 9.9 25.9 4.53
Set-up time (ps) 11.9 -12.56 4.03
Hold time (ps) -11.9 12.56 4.11
Table 2.27: Circuit parameter list of (A+B)C
Circuit parameter Value Margin Circuit parameter Value Margin
J1, J3 179µA − 50% 50% J2, J4 105µA − 41% 45%
J6 163µA − 50% 50% J9 160µA − 39% 38%
L2, L4 3.8pH − 50% 50% L5 12pH − 50% 50%
I1, I2 158µA − 49% 50% I6 83µA − 50% 50%
Figure 2.84: Schematic of (A+B)C using clock-follow-data scheme.
Figure 2.85: Layout of (A+B)C using clock-follow-data scheme.
2.1.4 Cell Characterization
To validate and characterize the designed library cells, three chips were fabricated using MIT Lincoln Lab's
SFQ5ee process. The chip photos are shown in Fig. 2.87 and Fig. 2.88. Besides the individual cell tests, a
1-bit full adder was implemented using the designed library cells (two XORs, two ANDs, one OR, JTLs, and
splitters). An example of a post-manufacturing measured waveform for the XOR gate is shown in Fig. 2.89.
The top waveform is the XOR output, whereas the bottom is the clock signal. For the output waveform,
level changes represent '1's, whereas for the clock signal, the rising edges represent '1's. The input pattern
is 00-01-10-11, resulting in an output pattern of 0-1-1-0. The measured bias margins are listed in Table 2.29.
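The rule that level changes in the measured output represent '1's can be expressed as a short decoder. The sketch below is illustrative only: the function name is hypothetical, and the sampled levels are an idealized stand-in for the measured waveform, with one sample taken after each clock edge.

```python
def decode_level_changes(levels, initial_level=0):
    """Return one bit per clock period: 1 if the output level changed, else 0."""
    bits = []
    previous = initial_level
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


# Idealized output levels for the input pattern 00-01-10-11:
# no change, rise, fall, no change.
decoded = decode_level_changes([0, 1, 0, 0])
expected_xor = [a ^ b for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```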
Figure 2.86: Simulation waveform of (A+B)C.
Table 2.28: Summary of (A+B)C
Parameter Complex gate Clock-follow-data scheme Two-stage implementation
JJ count 15 37 37
Area (µm²) 3600 7200 15300
Bias (µA) 1008 1947 2487
Worst stage delay (ps) 6.9 20.9 4.53
Set-up time (ps) 10.4 -11.72 4.03
Hold time (ps) -10.4 11.72 4.11
Figure 2.87: Chip photo 1
Figure 2.88: Chip photos 2 and 3
2.1.5 Synthesis Result Using the qSportLib
To investigate the impact of integrating complex logic gates into the standard cell library, several syntheses
were conducted on various benchmarks, such as Kogge-Stone adders, combinational logic benchmarks,
dividers, and array multipliers. The findings, presented in Table 2.30, provide insight into the effect of
including complex cells. For each benchmark, the results are presented in three columns: 'W/O'
gives the outcome without complex cells, 'W Comp' gives the outcome with complex cells, and 'Savings'
gives the resulting savings. The data indicate that integrating complex cells results in an average
reduction of more than 30% in logic depth, a 27% decrease in the number of gates, a 44% decrease in the
number of path-balancing DFFs, and a 27% decrease in the number of splitters. This leads to a 32% reduction
in total area and approximately 20% savings in bias current. Moreover, the worst stage delay remains
relatively stable, but there is a 28.5% reduction in latency from primary inputs to outputs due to the
decrease in logic depth.
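The 'Savings' column and its average can be reproduced from the 'W/O' and 'W Comp' values. The sketch below does this for the logic depth row of Table 2.30; the function and variable names are illustrative, and averaging the unrounded per-benchmark savings is an assumption about how the average was formed.

```python
def savings_percent(without, with_complex):
    """Relative reduction, in percent, achieved by adding complex cells."""
    return 100.0 * (without - with_complex) / without


# Logic depth as (W/O, W Comp) per benchmark, copied from Table 2.30.
LOGIC_DEPTH = {
    "KSA16": (10, 7),
    "KSA32": (12, 9),
    "c3540": (32, 23),
    "ID4": (27, 16),
    "arrMult16": (88, 47),
}

per_benchmark = {
    name: savings_percent(without, with_complex)
    for name, (without, with_complex) in LOGIC_DEPTH.items()
}
average_savings = sum(per_benchmark.values()) / len(per_benchmark)
print(f"average logic depth savings: {average_savings:.1f}%")
```

The computed average agrees with the 34.1% reported in the table.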
2.1.6 Conclusion
The design methodology for an RSFQ standard cell library, qSportLib, using the MIT Lincoln Lab's SFQ5ee
process was reported in this chapter. The goal was to enable better use of a cell abutment strategy while
supporting both JTL and PTL connections. The qSportLib cells were fabricated at the MIT Lincoln Lab and
verified post-fabrication. The authors would like to thank Pete Hopkins and Adam Sirois of the National
Institute of Standards and Technology (NIST) for testing the fabricated chips.
Figure 2.89: A measured waveform for the XOR gate in qSportLib
Table 2.29: Summary of the cell measurement
Cells Critical margins (simulated) Bias margins (tested) Cells Critical margins (simulated) Bias margins (tested)
JTL 80% 132% Buffer 76% 22.2%
Splitter 85% 26.4% Merger 85% 16%
DFF 72% 82.6% OR2 63% 60%
AND2 76% 16.4% XOR2 53% 37.4%
TFF 67% 45.2% NDRO 70% 20%
DC/SFQ 60% 60% SFQ/DC 50% 60%
Splitter3 88% 60% INV 81% N/A
PTLTX-RX (M3) 51% 12% PTLTX-RX (M5) 51% 22%
MFDRO 40% 5% Majority 78% N/A
XOR3 50% 5% A+(BC) 67% N/A
(AB)+(CD) 61% N/A (A+B)C 77% N/A
(A+B)(C+D) 60% N/A ABCD 77% N/A
*The MFDRO, majority (carry) and XOR3 (sum) cells are explained in chapter 3 along with the circuit designs
demonstrating their applications.
Table 2.30: Summary of the cells in qSportLib
KSA16 KSA32 c3540
W/O W Comp Savings W/O W Comp Savings W/O W Comp Savings
Logic depth 10 7 30.0% 12 9 25.0% 32 23 28.1%
gate count 192 143 25.5% 466 342 26.6% 1220 900 26.2%
DFF count 200 95 52.5% 532 258 51.5% 1214 864 28.8%
splitter count 567 416 26.6% 1431 1037 27.5% 3444 2869 16.7%
gate area (mm²) 0.691 0.515 25.5% 1.678 1.231 26.6% 4.392 3.469 21.0%
total area (mm²) 2.084 1.387 33.4% 5.270 3.468 34.2% 12.848 10.088 21.5%
Total bias (A) 0.313 0.254 18.9% 0.786 0.631 19.7% 1.93 1.74 9.7%
Worst stage delay (ps) 21.10 21.10 0.0% 21.10 21.10 0.0% 34.30 35.70 -4.1%
Latency (ps) 211.00 168.80 20.0% 253.20 189.90 25.0% 1097.00 856.80 21.9%
ID4(divider) arrMult16 Average
W/O W Comp Savings W/O W Comp Savings Savings
Logic depth 27 16 40.7% 88 47 46.6% 34.1%
gate count 128 98 23.4% 1992 1329 33.3% 27.0%
DFF count 337 169 49.9% 3575 2137 40.2% 44.6%
splitter count 569 368 35.3% 7251 5057 30.3% 27.3%
gate area (mm²) 0.461 0.362 21.5% 7.171 4.815 32.9% 25.5%
total area (mm²) 2.251 1.378 38.8% 27.909 18.252 34.6% 32.5%
Total bias (A) 0.311 0.219 29.5% 3.99 3.14 21.3% 19.8%
Worst stage delay (ps) 25.50 29.70 -16.5% 21.10 22.40 -6.2% -5.3%
Latency (ps) 688.50 475.20 31.0% 1856.80 1030.40 44.5% 28.5%
2.2 Standard Cell Library Design in Half Flux Quantum Technology using 2-phi Junctions
This section provides a detailed account of a standard cell library that utilizes the 2ϕ-junction. The library
encompasses a range of components that include four distinct types of logic cells, namely the inverter, AND
gate, OR gate, and XOR gate. Furthermore, it contains five transmission blocks: the JTL, splitter,
merger, passive transmission line (PTL) transmitter/driver, and PTL receiver. Additionally, the library
comprises one storage cell, the DFF, and two I/O interface blocks in the form of DC/SFQ and SFQ/DC
converters. These cells have been specifically designed to fulfill the fundamental requirements of
any general-purpose system.
To confirm their functionality, the circuits have been simulated using the JoSIM [33] simulator. Moreover,
the circuits have been optimized using qCS, which has yielded satisfactory margins. The critical circuit
parameters are presented along with their respective margins. Additionally, the estimated cell
layout area is provided, calculated using the available MITLL SFQ5ee process with
a simulated 2ϕ-junction device.
2.2.1 Background and Prior Work
As the demand for faster, more efficient, and more compact technology increases, reducing the size of
inductors in SFQ circuits has become a pressing issue. The exponential increase in mutual inductance and
cross-talk impedes the reduction in the width and spacing of metal lines. Although the kinetic inductor may
serve as a viable solution, it introduces further complications. In light of this, the study described in [39]
aims to eliminate the need for inductors to improve the scalability of SFQ circuits. This has been achieved
by implementing several logic cells (NDRO, DRO, and half adder) with the 2ϕ-junction. Concurrently, efforts
are being made to employ the 2ϕ-junction to reduce the dynamic power of SFQ systems. In [24], three
new cells, namely a Josephson transmission line (JTL), an inverter, and an OR gate, have been introduced
utilizing half flux quantum (HFQ) pulses. Compared to traditional RSFQ cells, these cells have smaller
latency and switching power. Unfortunately, no circuit parameters are given in the paper, and
three cells are not enough for a standard cell library to meet basic functional requirements. Furthermore,
a recent study has proposed an interface between SFQ circuits and HFQ circuits [40], which also helps to
broaden the application of the potential HFQ technology.
2.2.2 Standard Cell Library Using 2-phi Junction
In this design, the majority of the cells adopt conventional RSFQ cell structures, while the inductors
are replaced by the normal Josephson junction (0-JJ) and the 2ϕ-junction is used as the
switching component. As a result, the logic '1' in this logic family is represented by a half flux quantum
pulse with a voltage-time area equal to half of a full flux quantum
($\frac{1}{2}\Phi_0 = 1.03 \times 10^{-15}$ Wb). The elimination of
the inductor in the cell design enables us to avoid the drawbacks and inconvenience associated with large
inductance. It should be noted that, although designed inductors are avoided, there are still parasitic
inductors on every connection, which are assumed to be 0.5 pH during the design process. However,
according to the simulation results, the tolerance to parasitic inductance is greater than 100% (0 pH to
> 1 pH). Therefore, all parasitic inductance has been omitted from the following schematics for clarity.
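The half flux quantum value quoted above follows from the standard definition of the magnetic flux quantum, $\Phi_0 = h/(2e)$. The short calculation below is a sanity check using the exact SI values of the Planck constant and the elementary charge; the variable names are illustrative.

```python
PLANCK_H = 6.62607015e-34            # Planck constant in J*s (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge in C (exact SI value)

flux_quantum = PLANCK_H / (2 * ELEMENTARY_CHARGE)  # Phi_0 in Wb, about 2.07e-15
half_flux_quantum = flux_quantum / 2               # about 1.03e-15 Wb

# 1 Wb equals 1 V*s, so the HFQ pulse area can also be expressed in mV*ps.
hfq_area_mV_ps = half_flux_quantum / (1e-3 * 1e-12)

print(f"Phi_0   = {flux_quantum:.4e} Wb")
print(f"Phi_0/2 = {half_flux_quantum:.4e} Wb = {hfq_area_mV_ps:.3f} mV*ps")
```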
A comparison simulation of the normal Josephson junction (0-JJ) and the 2ϕ-junction using JoSIM is presented
in Figure 2.90. In the simulation, the same constant voltage is applied to both junctions (0-JJ and 2ϕ-JJ),
and each junction switches whenever the voltage-time integral exceeds its flux quantum requirement. The
simulation clearly shows that the 2ϕ-junction switches twice while the 0-JJ switches only once in the same
time (the same amount of flux is received by the two junctions). The simulation confirms the statements
made previously as well as in [24]. It should be noted that, although the amplitude of the 2ϕ-junction
voltage in the figure is larger than that of the 0-JJ, the time duration is smaller, so that the pulse area
is half that of the 0-JJ pulse.
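The switching counts in this comparison follow from dividing the accumulated voltage-time integral by the flux released per switching event. The sketch below is an idealized calculation, not a JoSIM simulation: the function name is hypothetical, and the 1 mV drive and the chosen duration are assumptions picked so that slightly more than one full flux quantum is accumulated.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def switch_count(voltage_v, duration_s, flux_per_switch_wb):
    """Number of switching events under a constant voltage (idealized)."""
    return math.floor(voltage_v * duration_s / flux_per_switch_wb)


voltage = 1e-3                      # assumed constant drive voltage in volts
duration = 1.05 * PHI0 / voltage    # long enough to accumulate 1.05 flux quanta

switches_0jj = switch_count(voltage, duration, PHI0)       # full quantum per switch
switches_2phi = switch_count(voltage, duration, PHI0 / 2)  # half quantum per switch
```

Under these assumptions the 0-JJ switches once and the 2ϕ-junction switches twice over the same interval.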
Figure 2.90: Simulation waveform of a comparison on 0-JJ and 2ϕ-JJ.
The following sections describe each cell in detail. First, a schematic is presented, followed by a table
that lists the value of each parameter along with its margin. A simulation waveform is then provided to
validate correct functionality. An estimated layout is also devised for every cell to evaluate the
potential area savings. However, because no currently available technology provides the 2ϕ-JJ, we rely on
the MITLL SFQ5ee process and a stand-in 2ϕ-JJ device for estimation purposes. The layouts therefore cannot
be manufactured, and, to avoid distracting the reader, only an exemplary layout of the OR gate is presented
in the summary section to give an impression.
2.2.2.1 Josephson Transmission Line (JTL)
The schematic of a JTL block, as illustrated in Fig. 2.91, comprises two 2ϕ-JJs represented by the square
boxes (J1 and J3), and two 0-JJs (J2 and J4), indicated by devices without boxes. The structure is
symmetrical, with J1/J3, J2/J4, and I1/I2 being identical. Consequently, the JTL cell is repeatable:
connecting the OUT port to the IN port of a subsequent JTL forms a JTL chain without any additional
component. As shown, J1 substitutes for the inductor of the conventional RSFQ JTL in the J1-J2-J3 loop,
thereby establishing the phase equation in which the sum of the phase differences from the positive terminal
of J1, through J2 and J3, and back to J1 is an integer multiple of 2π. When a half-flux-quantum (HFQ) pulse
is received from the IN port, J1 switches and generates another HFQ pulse that propagates to the next
device, enabling the HFQ pulse to be transported along the JTL. The simulation waveform depicted in Fig. 2.92
and the component values listed in Table 2.31 confirm the JTL's correct operation. The JTL's critical
margin is 71% on J1/J3.
Figure 2.91: Schematic of the Josephson Transmission Line (JTL).
Table 2.31: Parameter values and margins of JTL
Components Values Margins
J1,J3 70µA 71%
J2,J4 80µA 97%
I1,I2 45µA 88%
Bias 1mV 88%
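The critical margin quoted for each cell is the smallest margin in its parameter table. A minimal sketch using the JTL margins of Table 2.31 (the dictionary layout and the function name are illustrative and not part of the thesis tool flow):

```python
# Margins (in percent) copied from Table 2.31 for the JTL cell.
jtl_margins = {"J1,J3": 71, "J2,J4": 97, "I1,I2": 88, "Bias": 88}

def critical_margin(margins):
    """Return (component, margin) for the smallest margin in the table."""
    name = min(margins, key=margins.get)
    return name, margins[name]

component, margin = critical_margin(jtl_margins)
print(f"critical margin: {margin}% on {component}")
```

The same helper can be applied to the margin tables of the other cells.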
Figure 2.92: Simulation waveform of the Josephson Transmission Line (JTL).
2.2.2.2 D Flip-flop
Fig. 2.93 illustrates the schematic of the DFF. When a half-flux-quantum (HFQ) pulse is received from the
D port, the HFQ will be stored in the J1-J2-J3 loop as a clockwise looping current, subsequently increasing
the bias current of J3. As a result, the incoming HFQ pulse from the CLK port will trigger J3, generating an
HFQ output at the Q port. If no HFQ pulse is stored, the incoming pulse from CLK will trigger J4 instead,
and no pulse will be produced at Q. The simulation waveform presented in Fig. 2.94 and the component
values listed in Table 2.32 confirm the DFF’s correct operation. The DFF’s critical margin is 68% on J1.
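The store-and-release behavior described above can be summarized with a small behavioral model. This is only a logic-level sketch: the class and method names are hypothetical, and it assumes, as in conventional SFQ DFFs, that the readout is destructive (the stored HFQ is cleared by the clock).

```python
class HfqDff:
    """Logic-level model of the DFF: a D pulse is held until the next CLK pulse."""

    def __init__(self):
        self.stored = False  # True while an HFQ circulates in the J1-J2-J3 loop

    def d(self):
        """An HFQ pulse arrives at the D port and is stored in the loop."""
        self.stored = True

    def clk(self):
        """A CLK pulse arrives; return True if a pulse is produced at Q."""
        out = self.stored     # J3 switches only if an HFQ was stored
        self.stored = False   # assumed destructive readout (J4 switches otherwise)
        return out

dff = HfqDff()
dff.d()
outputs = [dff.clk(), dff.clk()]  # first clock reads the stored pulse, second finds none
print(outputs)
```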
Figure 2.93: Schematic of the D Flip-flop.
Figure 2.94: Simulation waveform of the D Flip-flop.
Table 2.32: Parameter values and margins of DFF
Components Values Margins
J1 63µA 68%
J2 65µA 90%
J3 86µA 75%
J4 74µA 85%
J5 117µA 97%
I1 31µA 100%
I2 32µA 100%
Bias 1mV 88%
2.2.2.3 Merger
Fig. 2.95 depicts the schematic of the merger cell, also known as the confluence buffer. When a half-flux-
quantum (HFQ) pulse is received from one of the input ports, such as IN1, the HFQ triggers the receiving
junction (J1), subsequently increasing the current of the corresponding branch (J1-J2-J3-J7), and causing
J7 to flip, producing an output pulse. At the same time, the buffering junction on the other branch (J6
for the case of an input coming from IN1) is triggered to cancel the flux flowing backward toward the
other input port (IN2). Specifically, J3 and J6 prevent the back-propagating pulses. If two input pulses
arrive at the same time or within a small time window (several picoseconds), only one output pulse will
be generated at the OUT port. The simulation waveform presented in Fig. 2.96 includes not only the input
and output waveforms but also the phases of J3 and J6 to illustrate how they prevent back propagation.
The component values are listed in Table 2.33. The merger’s critical margin is 72% on the overall bias
voltage.
Figure 2.95: Schematic of the merger cell.
Figure 2.96: Simulation waveform of the merger cell.
Table 2.33: Parameter values and margins of merger cell
Components Values Margins
J1,J4 83µA 75%
J2,J5 151µA 100%
J3,J6 62µA 91%
J7 40µA 100%
J8 93µA 82%
I1,I2 27µA 100%
I3 120µA 81%
Bias 1mV 72%
2.2.2.4 Splitter
The fan-out of a half-flux-quantum (HFQ) logic cell is limited to one, just like in single flux quantum (SFQ)
cells. Therefore, in order to duplicate a pulse, a splitter is required. The schematic of the splitter cell is
presented in Fig. 2.97. The incoming HFQ pulse from the IN port is received by J1 and divided into two,
triggering J5 and J4 separately. This produces an HFQ pulse at each output port. The simulation waveform
is displayed in Fig. 2.98, while the values of the components used in the simulation are listed in Table 2.34.
The critical margin of the splitter is 70% on the current bias I2/I3.
Figure 2.97: Schematic of the splitter cell.
Figure 2.98: Simulation waveform of the splitter cell.
Table 2.34: Parameter values and margins of splitter cell
Components Values Margins
J1 65µA 98%
J2,J3 73µA 99%
J4,J5 80µA 88%
J6,J7 80µA 90%
I1 70µA 100%
I2,I3 45µA 70%
Bias 1mV 88%
2.2.2.5 OR2 Gate
The OR gate can be understood as a combination of two DFF cells that are driven by the same clock,
followed by a merger cell, as depicted in Fig. 2.99. The DFF cell has been introduced previously and
is therefore not elaborated again. The merging part that follows has the same structure as the previously
introduced merger cell. However, the component values differ as they have been automatically optimized
by the tool. The optimized values are presented in Table 2.35 and the critical margin is 37% on J3/J6. The
simulation waveform is presented in Fig. 2.100.
Figure 2.99: Schematic of the OR gate.
Figure 2.100: Simulation waveform of the OR gate.
2.2.2.6 AND2 Gate
The structure of the AND gate is identical to that of the OR gate, as illustrated in Fig. 2.101. However, the
critical components of the structure, namely those of the merging part, are modified so that the output junction J7
requires at least two HFQ pulses to generate a pulse. The simulation waveform is presented in Fig. 2.102,
and the component values are recorded in Table 2.36. The merging part’s critical margin is 85% on J8.
Table 2.35: Parameter values and margins of OR gate
Components Values Margins
J1,J4 95µA 40%
J2,J5 100µA 67%
J3,J6 83µA 37%
J7 68µA 73%
J8 74µA 69%
I1,I2 25µA 93%
I3 83µA 99%
Bias 1mV 44%
Figure 2.101: Schematic of the AND gate.
2.2.2.7 XOR2 Gate
The schematic of the XOR gate is presented in Fig. 2.103. In contrast to the AND or OR gate, an additional
junction (J7) is introduced after the merging point of the two input branches. Due to the presence of
J7, when pulses arrive from both branches (corresponding to the case A=1 and B=1), J7 flips and J8
remains inactive, so that no output pulse is produced. However, when only one pulse arrives from either of the
branches (corresponding to the cases A=1, B=0 or A=0, B=1), J8 is triggered to generate the output pulse.
The simulation waveform is depicted in Fig. 2.104, while the component values are provided in Table 2.37.
The critical margin of the XOR gate is 20% on J7.
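At the logic level, the OR, AND and XOR gates described above differ only in how many released input pulses are needed to switch the output junction. The sketch below captures that idea; it is a behavioral abstraction with a hypothetical function name, not a circuit model.

```python
def clocked_gate(kind, a, b):
    """Return True if an output pulse is produced at the clock for stored inputs a, b."""
    n = int(a) + int(b)      # number of input branches that release a pulse
    if kind == "OR":         # merger-like output: a single pulse is sufficient
        return n >= 1
    if kind == "AND":        # output junction needs two pulses to switch
        return n >= 2
    if kind == "XOR":        # two pulses flip J7 instead of the output junction J8
        return n == 1
    raise ValueError(f"unknown gate kind: {kind}")

truth = {kind: [clocked_gate(kind, a, b) for a in (0, 1) for b in (0, 1)]
         for kind in ("OR", "AND", "XOR")}
for kind, column in truth.items():
    print(kind, column)
```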
Figure 2.102: Simulation waveform of the AND gate.
Table 2.36: Parameter values and margins of AND gate
Components Values Margins
J1,J4 70µA 97%
J2,J5 140µA 99%
J3,J6 72µA 94%
J7 99µA 98%
J8 74µA 85%
I1,I2 31µA 100%
I3 32µA 100%
Bias 1mV 92%
2.2.2.8 Inverter
The schematic diagram of an inverter is depicted in Fig. 2.105. The inverter consists of two main parts.
The upper part of the cell has a splitter-like structure in which one branch is combined with the clock
branch of the DFF-like structure located at the bottom. Thus, upon arrival of the clock signal, the first part
of the inverter generates either ’11’ or ’10’ depending on whether an input pulse was previously stored.
Subsequently, the following XOR-like structure completes the inverter function. The simulation waveform
of the inverter is illustrated in Fig. 2.106. The component values are presented in Table 2.38. The critical
margin of the inverter is 16% on J14 and J15.
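The two-part operation of the inverter can be checked with a short logic-level sketch (the function and variable names are hypothetical): the splitter branch always fires with the clock, the DFF-like branch fires only if an input was stored, and the XOR-like stage outputs a pulse when exactly one of them fires.

```python
def inverter(stored_input):
    """Logic-level model: returns True if the inverter emits an output pulse."""
    clock_copy = True              # splitter branch: always fires with the clock
    dff_out = bool(stored_input)   # DFF-like branch: fires only if a pulse was stored
    # The first part thus produces '11' (input stored) or '10' (no input stored).
    return clock_copy != dff_out   # XOR-like output stage

print(inverter(0), inverter(1))
```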
Figure 2.103: Schematic of the XOR gate.
Figure 2.104: Simulation waveform of the XOR gate.
2.2.2.9 PTL Driver and Receiver
The schematic of a PTL transmitter/driver (TX) and receiver (RX) is presented in Fig. 2.107. The PTL
structures of both the TX and RX are similar to JTLs, but with a serial resistor added to the output port of the
PTL driver to ensure impedance matching. For the presented design, the characteristic impedance is set to
four ohms, but the circuit parameters can be adjusted to suit any practical values for the technology being
used. The simulation waveforms of the PTL TX and RX are shown in Fig. 2.108, and the component values
are listed in Table 2.39. The critical margin of the PTL driver is 81% on J1, and the critical margin of the
PTL receiver is 81% on J7.
Table 2.37: Parameter values and margins of XOR gate
Components Values Margins
J1,J4 87µA 46%
J2,J5 95µA 75%
J3,J6 72µA 40%
J7 70µA 20%
J8 72µA 22%
J9 80µA 89%
I1,I2 31µA 79%
I3 63µA 88%
I4 39µA 39%
Bias 1mV 26%
Figure 2.105: Schematic of the inverter.
2.2.2.10 DC/SFQ Converter
The schematic diagram of a DC/SFQ converter is presented in Fig. 2.109. The circuit comprises a serial
input resistor R1 that converts the input voltage to current. R1 can be implemented either on-chip, as in
this design, or off-chip. A large inductor L1 follows the input resistor. At the rising edge of the input signal,
L1 exhibits high impedance, allowing most of the current to flow through J2, which triggers an HFQ pulse
at the output port. When the input voltage becomes steady, L1 behaves as a short connection, and the
input current flows through L1 to ground, leaving J2 inactive. The simulation waveform is displayed in
Fig. 2.110, and the component values are listed in Table 2.40. The critical margin of the DC/SFQ converter
is 98% on J1.
Figure 2.106: Simulation waveform of the inverter.
Table 2.38: Parameter values and margins of the inverter
Components Values Margins Components Values Margins
J1 85µA 43% J2 77µA 69%
J3 95µA 52% J4 70µA 48%
J5 82µA 62% J6 80µA 81%
J7 80µA 61% J8,J11 96µA 76%
J9,J12 101µA 85% J10,J13 78µA 64%
J14 80µA 16% J15 82µA 16%
J16 90µA 67% I1 43µA 63%
I2 40µA 79% I3 57µA 65%
I4,I5 37µA 79% I6 70µA 73%
I7 40µA 36% Bias 1mV 22%
2.2.2.11 SFQ/DC Converter
The schematic of the SFQ/DC converter is illustrated in Fig. 2.111. Upon the arrival of an input pulse, the cell is
disturbed from its quiescent state, causing the output junction J5 to oscillate. A subsequent input pulse then
restores the cell to its initial state. During the active mode, the SFQ/DC converter continually generates
current through L1. A serial resistor $R_{out}$ is employed to form a low-pass filter with the off-chip wire
inductance and the equivalent load of the oscilloscope used to observe the output signal. Fig. 2.112 displays
the simulation waveform, where the input signal is shown at the top, the observed voltage after L1 in the
middle, and the output signal at the oscilloscope at the bottom. The output state changes with each input
Figure 2.107: Schematic of the PTL driver and receiver.
Figure 2.108: Simulation waveform of the PTL driver and receiver.
pulse. The component values are presented in Table 2.41. The critical margin of the SFQ/DC converter is 87% for
the overall bias voltage.
2.2.2.12 Random Pattern Generator
In order to analyze the library cells, a random pattern generator is implemented in the circuit shown in
Fig. 2.113. The generator comprises two DFF chains, whose signals are tapped at different locations using
splitters and then fed to XOR gates. The outputs of these XOR gates are merged with the initial signals to
form a feedback loop, thereby generating two random data series which are used as inputs for the following
stages. These stages include an AND gate, an inverter, and an OR gate, all of which are synchronized with
a clock signal, although the clock signal is hidden in the diagram for clarity. The simulation waveform
of this micro system is shown in Fig. 2.114, where the top waveform represents the clock signal and the
’Series 1’ and ’Series 2’ signals are the bit series at the output of the two DFF chains, while the ’OUT’ signal
Table 2.39: Parameter values and margins of PTL driver and receiver
Components Values Margins Components Values Margins
J1 80µA 81% J5 75µA 88%
J2 80µA 84% J6 61µA 87%
J3 70µA 100% J7 80µA 81%
J4 83µA 100% J8 80µA 84%
R1 0.5ohms 100% I3 62µA 81%
I1 45µA 88% I4 45µA 88%
I1 49µA 94% RX bias 1mV 88%
TX bias 1mV 100%
Figure 2.109: Schematic of the DC/SFQ converter.
is at the output of the last stage OR gate. This micro system demonstrates the correct functioning of the
utilized cells and the system integration capability of the half flux quantum standard cells.
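A DFF chain with tapped XOR feedback behaves like a linear-feedback shift register. The sketch below illustrates the principle only: the chain length (7 stages) and the tap positions are assumptions chosen for illustration, since the actual values used in Fig. 2.113 are not listed in the text.

```python
def lfsr_bits(seed, n_bits):
    """Return n_bits from a shift register with XOR feedback (hypothetical taps)."""
    state = list(seed)
    out = []
    for _ in range(n_bits):
        out.append(state[-1])              # bit leaving the end of the DFF chain
        feedback = state[-1] ^ state[-2]   # XOR of two tapped positions
        state = [feedback] + state[:-1]    # every DFF shifts by one clock cycle
    return out

# Any non-zero seed works; with these taps the 7-stage register repeats every 127 bits.
series1 = lfsr_bits([1, 0, 0, 0, 0, 0, 0], 254)
print("".join(str(bit) for bit in series1[:32]))
```

A second series can be obtained in the same way from a different seed or different tap positions.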
2.2.2.13 Library Summary
The present study outlines the design of the Half Flux Quantum (HFQ) standard cell library employing
the 2ϕ-junction. Furthermore, prototype layouts of the individual cells were created using an existing
fabrication process. The technology is based on the SFQ5ee process published by MITLL, with an additional
layer for emulating the 2ϕ-JJ device. Fig. 2.115 depicts the layout of an OR gate as an example, consisting of four
metal layers (M4 - M7), three metal vias, a resistor layer, two junction layers for the 0-JJ and 2ϕ-JJ, a resistor
contact layer, and a junction contact layer. The compactness of the layout is attributed to the elimination
of inductors, resulting in reduced area requirements, which can be further scaled down with technological
advancement. Table 2.42 presents the list of cells implemented in the library, along with the number of
Figure 2.110: Simulation waveform of the DC/SFQ converter.
Table 2.40: Parameter values and margins of DC/SFQ converter
Components Values Margins
R1 50ohms 100%
J1 83µA 98%
J2 71µA 100%
J3 100µA 100%
I1 48µA 100%
L1 6.7pH 100%
Bias 1mV 100%
junctions (0-JJ and 2ϕ-JJ) and critical margins. The estimated layout areas and the bias currents are
compared with a conventional RSFQ library developed in-house. On average, the HFQ cells require 53% less
area and 59% less bias current (Table 2.42), making them highly attractive for practical implementations.
2.2.3 Conclusion
A library of HFQ standard cells utilizing the 2ϕ-junction has been developed in this study. The design
methodology has been elaborated and the available cells in the standard cell library have been presented along with
their respective schematics, component values and margins. The library comprises the following cells:
inverter, AND gate, OR gate, XOR gate, JTL, splitter, merger, PTL driver, PTL receiver, DFF, DC/HFQ
Figure 2.111: Schematic of the SFQ/DC converter.
Figure 2.112: Simulation waveform of the SFQ/DC converter.
converter and HFQ/DC converter. Compared to conventional RSFQ cells, the newly developed cells re-
quire less bias current, which reduces the dependence on inductors and offers improved scalability and
smaller area. The HFQ logic family has demonstrated its potential to emerge as a strong contender for
next generation VLSI circuits.
Table 2.41: Parameter values and margins of SFQ/DC converter
Components Values Margins
J1 112µA 91%
J2 104µA 100%
J3 70µA 100%
J4 78µA 100%
J5 70µA 97%
J6 80µA 100%
J7 80µA 100%
J8 80µA 100%
L1 0.5pH 100%
L2 3pH 93%
I1 45µA 100%
I2 26µA 100%
I3 48µA 100%
I4 22µA 100%
Rout 50ohms 100%
Bias 1mV 87%
Figure 2.113: Schematic of the random pattern generator.
Figure 2.114: Simulation waveform of the random pattern generator.
Figure 2.115: An example layout of the OR gate.
Table 2.42: Summary of the cell measurements
Cell name | No. of 2ϕ-JJs | No. of saved Ls | Critical margin | Area 2ϕ-JJ (µm²) | Area RSFQ (µm²) | Area saving | Bias 2ϕ-JJ (µA) | Bias RSFQ (µA) | Bias saving
JTL | 2 | 2 | 71% | 225 | 900 | 75% | 90 | 180 | 50%
DFF | 3 | 4 | 68% | 375 | 900 | 58% | 63 | 212 | 70%
Merger | 5 | 6 | 72% | 550 | 900 | 39% | 174 | 375 | 54%
Splitter | 3 | 5 | 70% | 494 | 900 | 45% | 160 | 309 | 48%
OR2 | 11 | 11 | 37% | 1263 | 1800 | 30% | 259 | 475 | 45%
AND2 | 11 | 11 | 85% | 1263 | 3600 | 65% | 220 | 530 | 58%
XOR2 | 12 | 11 | 20% | 1428 | 3600 | 60% | 290 | 435 | 33%
PTL driver | 2 | 2 | 81% | 429 | 900 | 52% | 93 | 265 | 65%
PTL receiver | 2 | 2 | 81% | 369 | 900 | 59% | 107 | 252 | 58%
DC/SFQ | 1 | 2 | 98% | 641 | 900 | 29% | 96 | 450 | 79%
SFQ/DC | 4 | 5 | 87% | 876 | 3600 | 76% | 141 | 1025 | 86%
Average | | | | | | 53% | | | 59%
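The saving columns and the Average row of Table 2.42 can be reproduced from the area and bias entries. The sketch below assumes that each saving is computed as 1 - (2ϕ-JJ value / RSFQ value) and that the average is the unweighted mean over the eleven cells; the variable names are illustrative.

```python
# (cell, 2phi-JJ area, RSFQ area, 2phi-JJ bias, RSFQ bias) from Table 2.42
# Areas are in square micrometers, bias currents in microamperes.
cells = [
    ("JTL", 225, 900, 90, 180),
    ("DFF", 375, 900, 63, 212),
    ("Merger", 550, 900, 174, 375),
    ("Splitter", 494, 900, 160, 309),
    ("OR2", 1263, 1800, 259, 475),
    ("AND2", 1263, 3600, 220, 530),
    ("XOR2", 1428, 3600, 290, 435),
    ("PTL driver", 429, 900, 93, 265),
    ("PTL receiver", 369, 900, 107, 252),
    ("DC/SFQ", 641, 900, 96, 450),
    ("SFQ/DC", 876, 3600, 141, 1025),
]

def saving(new, old):
    """Relative saving in percent of the new value with respect to the old one."""
    return 100.0 * (1.0 - new / old)

area_savings = [saving(a, ra) for _, a, ra, _, _ in cells]
bias_savings = [saving(b, rb) for _, _, _, b, rb in cells]
area_avg = sum(area_savings) / len(cells)
bias_avg = sum(bias_savings) / len(cells)
print(f"average area saving: {area_avg:.1f}%, average bias saving: {bias_avg:.1f}%")
```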
Chapter 3
Circuit Designs
In this chapter, three circuit designs using the mentioned qSportLib are presented. The utilization of com-
plex cells, such as carry and sum cells, is showcased by the design of an 8-bit multiplier. In addition, an
all digital PLL is employed to implement a clock synthesizing scheme for the RSFQ system. Finally, the
memory performance of the RSFQ technology is enhanced through the implementation of random access
memory designs, which incorporate a multi-fluxon destructive readout cell.
3.1 8-bit Multiplier Using the Complex Cells
3.1.1 Background and Prior Work
When compared to the conventional CMOS logic, the SFQ technology has some drawbacks in spite of its
many advantages. The drawbacks in terms of circuit synthesis are the requirements to employ splitter
cells to drive multiple fanouts and insert an appropriate number of D-flipflops at appropriate locations for
gate-level path balancing. These additional requirements typically result in a large increase in the gate
count (by a factor of two or more). Using complex SFQ logic cells (that can produce the desired Boolean
function with smaller circuit depth and initial gate count) is therefore highly desirable for the SFQ logic family.
This is because, with a smaller circuit depth, the number of path-balancing DFFs, JTLs, and PTLs used
in the circuit will be greatly decreased along with the accompanying reduction in the clock distribution
network resource usage [41].
The one-bit full adder cell serves as an important functional block in digital circuit systems since it is widely
used in arithmetic-logic units (ALUs) [42], integer and floating-point data path units of processors [43],
and custom multipliers [44], [45]. The conventional one-bit full adder using standard logic gates consists
of two XOR gates, two AND gates and an OR gate, as shown in Fig. 3.1 (a). Along with the required logic
gates, many DFF cells and splitters are also required to satisfy the path-balancing and fan-out requirements
of an SFQ circuit, which in turn increases the area and power consumption of any circuit that uses the one-bit
full adder as a circuit block.
Figure 3.1: (a) Circuit diagram of conventional one-bit full adder; (b) Schematic of a simpler one-bit full
adder using confluence buffer (CB) and TFF.
There is also a design of a one-bit ALU using a single-stage full adder [46]. As shown in Fig. 3.2, a 3-input
XOR gate and a 3-input majority gate are used to create the sum and carry-out outputs of the full
adder cell, respectively. In that work, the potential timing issue of the CB-based FA discussed above is
solved and the inputs can arrive freely during one operating period as long as they meet the set-up and
hold time constraints. However, looking at the insertion location of the clock signal in Fig. 3.2 (a) and (b),
it can be seen that the clock signal is injected at the back end of the 3-XOR gate (closer to OUT) whereas
it is injected at the front end of the majority gate (closer to inputs A, B, and C as part of the DFF gates).
This results in a potential drawback for this design, which is the unbalanced set-up times and clk-2-out
propagation delays for the sum and carry outputs. In [46], it is shown that the set-up times for these XOR
and majority cells are 15ps and 1ps, respectively, whereas the clk-2-out delays for these two gates are 5ps
and 18ps, respectively. Although the setup+propagation times for these two gates are similar (20ps and
19ps), the unbalanced timing creates extra difficulties for the chip designer.
Figure 3.2: (a) 3-XOR gate; (b) Majority gate taken from [46].
The goal of the present work is to provide a new SFQ one-bit adder design that improves the results of
[46] by eliminating the said imbalance between timing behaviors of the sub-circuits producing the sum and
carry outputs of the adder. An 8-bit multiplier using the newly designed cells is demonstrated to show the
savings on power and area. The synthesis results for the multiplier before and after adding the single-stage
full adder into the library are compared and contrasted as well.
3.1.2 Carry Signal Generated by Single-Stage Gate
The carry-out signal of the one-bit full adder can be generated as a majority function. The carry cell should
produce an output pulse when two or more of the inputs (the two bits to be added and the carry-in bit) receive
an SFQ pulse (logic ONE). A majority gate is implemented as shown in Fig. 3.3 for the carry
functionality. The inputs for the carry cell are received and stored in inductive loops just as in a DFF
circuit. These stored pulses are kept in these loops as persistent currents until the cell is clocked. For
example, Josephson Junction J2 switches when a pulse is received at input A and the SFQ pulse is stored in
the superconducting loop consisting of J2, L2 and J4. When an input SFQ pulse arrives at the CLK input, J4
Figure 3.3: Schematic of the single-stage RSFQ carry cell.
switches, releasing the flux stored in that loop, which flows as current through J13 and L4. This current is then
merged at L13 with the currents from the other two similar circuit branches (J14, L8 on one hand
and J15, L12 on the other hand). In the case of no input pulse at A (or B, C), the CLK signal cannot make
J4 (or J8, J12 for the respective input branch) switch, and therefore no additional current flows through the branch
J13, L4 (or J14, L8 and J15, L12). The current flowing through L13 to J16 is the sum of the currents from
all three branches. The critical current of J16 is set to a value such that the additional current from at least
two input branches is required to make J16 switch. Compared to the design of [46], which is shown in
Fig. 3.2 (b), our proposed design removes the junctions after the DFF in order to reduce the
cell area and power consumption. This modification also helps minimize the clock-to-Q delay. Table 3.1
shows the component values for the carry cell schematic of Fig. 3.3, and Fig. 3.4 shows the margins of the key
components of the carry cell.
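The current-summing mechanism of the carry cell amounts to a threshold (majority) function. The following logic-level sketch uses normalized, hypothetical values for the branch current and the switching threshold of J16; it is not derived from the component values of Table 3.1.

```python
from itertools import product

def carry_cell(a, b, c, unit_current=1.0, threshold=1.5):
    """Return True if the output junction J16 switches at the clock.

    Each stored input contributes one (normalized) unit of current at L13;
    J16 is assumed to switch once the summed current exceeds the threshold,
    which is placed between one and two units.
    """
    summed = unit_current * (int(a) + int(b) + int(c))
    return summed > threshold

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", int(carry_cell(a, b, c)))
```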
The layout of the carry cell is shown in Fig. 3.5. A 50 µm by 50 µm bias frame using metal 7 is placed at
the boundary of the carry cell. Metal 6 and metal 5 are used to complete the cell connections and for the strip
line inductances of the cells. A metal 4 ground plane with moats lies beneath the JJs and inductors.
Table 3.1: Key components values of the carry cell
Component Value Component Value
J1, J5, J9 90µA L1, L5, L9 3.37pH
J2, J6, J10 118µA L2, L6, L10 8.02pH
J3, J7, J11 87µA L3, L7, L11 4.07pH
J4, J8 ,J12 135µA L4, L8, L12 7.15pH
J13, J14, J15 120µA L13 2.14pH
J16 167µA Bias 2.5mV
Figure 3.4: Margins of critical components of the carry cell.
Current biases are provided by the metal 7 bias frame, through the R5 layer which forms the bias resistors.
Circuit parameter values such as inductance and area of Josephson Junction are extracted from the layout
by InductEx [35] and simulated by JSIM [32]. Extraction and JSIM simulations were iterated to get a circuit
netlist with good margins.
Fig. 3.6 shows the post-layout simulation waveform of the carry cell, demonstrating correct functionality
at 40GHz with a bias margin of 32%, a set-up time of 1.7 ps, and a hold time of 2.2 ps. The bias-margin-vs-clock-
period plot in Fig. 3.6 (b) shows that the bias margin is reduced by about 25% (relative to the bias margin
at low frequency) when the clock frequency is increased from 30GHz to 50GHz. One of the explanations
for this phenomenon is that the SFQ pulse usually has a main peak pulse lasting for about 3ps and a pulse
Figure 3.5: Layout of the carry cell.
tail with some ripples on it which lasts for about 10ps. At very high frequency, for example 50GHz, the
clock period (which is 20ps) has a value close to the duration of an SFQ pulse. In this situation, the bias
margin (as well as other components of the operating margin of a circuit) is adversely affected.
Also, operating at higher frequency requires faster switching of the junctions, which
raises the lower boundary of the bias. For the carry/majority cell, the clock-to-Q delay depends on the
applied input pattern, more precisely, on the number of ’1’s in the input pattern. Specifically, the clock-
to-Q delay is about 3.5ps if there are three ’1’s in the input pattern, whereas it is 7.8ps if there are only two
’1’s in the input pattern. This behavior can be explained as follows. Although for both cases of two and
three input ’1’s, the final output junction J16 is triggered to release an output SFQ pulse, more branches are
turned on and more current is fed into J16 for the case of three ’1’s, resulting in a faster generation of an
SFQ pulse at the output. The input pattern dependency of the carry/majority cell propagation
delay must therefore be considered at the full-chip level, for example, when checking for potential setup or hold
time violations. This is easy to do: when checking for setup time violations, we can use the longer
cell delay value (corresponding to two input ‘1’s) whereas during the hold time violation check, we use
the shorter cell delay value (corresponding to three input ‘1’s).
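The use of the two pattern-dependent delays in timing checks can be sketched as follows. The slack formulas assume zero clock skew between the launching and capturing gates, and the interconnect delay in the example is a hypothetical value; only the two clock-to-Q delays and the clock frequencies come from the text.

```python
C2Q_MAX_PS = 7.8  # carry cell clock-to-Q delay with two '1's (slow case)
C2Q_MIN_PS = 3.5  # carry cell clock-to-Q delay with three '1's (fast case)

def clock_period_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz

def setup_slack_ps(period_ps, path_ps, setup_ps):
    """Setup check uses the longer cell delay; negative slack means a violation."""
    return period_ps - (C2Q_MAX_PS + path_ps + setup_ps)

def hold_slack_ps(path_ps, hold_ps):
    """Hold check uses the shorter cell delay; negative slack means a violation."""
    return (C2Q_MIN_PS + path_ps) - hold_ps

period = clock_period_ps(40)  # 25 ps at 40GHz
path = 10.0                   # hypothetical interconnect delay in ps
print(setup_slack_ps(period, path, setup_ps=2.0), hold_slack_ps(path, hold_ps=2.0))
```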
Figure 3.6: Simulation waveform and the bias margin of the carry cell.
3.1.3 Sum Signal Generated by Single-Stage Gate
The functionality of the sum output of the one-bit full adder can be represented as (A ⊕ B) ⊕ C in
Boolean logic. The RSFQ sum cell gives an output pulse whenever there is an odd number of input pulses
(logic 1) before arrival of the clock signal. A pair of two-input XOR gates and one DFF cell are required to
generate this signal. Fig. 3.7 gives the schematic of the sum cell. The top left part of the schematic forms
a clocked two-input XOR gate. J7 switches after the arrival of the clock only if inputs A and B are different
(if both inputs are ones, J5 switches and no pulse is transmitted to the latter part of
the cell). The input part of the circuit where input C is received is simply a DFF-like cell which replicates
the input pulse from C and transmits it to the latter part of the cell after the clock pulse arrives. The latter
part of the cell is in fact a clockless XOR gate. Compared to the 3-XOR design of [46], which is shown in
Fig. 3.2 (a), our proposed 3-input XOR function is divided into two stages: the first stage computes A ⊕ B
and replicates C for later use; the second stage computes the XOR of the outputs of the first stage.
Moreover, the clock signal is injected between the two stages. Instead of putting all the workload on the
parts of the cell before the clock branches, our new design balances the load of the cell before and after the
clock branches, which results in the 3-input XOR functioning correctly at a 40GHz clock frequency with
a bias margin of 20%, a set-up time of 2 ps, a hold time of 2 ps, and a clock-to-Q delay of 5.8 ps. As we
can see, benefiting from this balanced design, the set-up time has been reduced to 2ps (from 15ps in [46]) with a very small
increase in propagation delay (5.8ps versus 5ps). Similar to the carry cell, the bias margin is reduced by about 20% (relative
to the bias margin at low frequency) as the frequency is increased from 25GHz to 40GHz. The bias
margin quoted above is the value at 40GHz, which is lower than the margin at low operating frequency.
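The two-stage decomposition of the 3-input XOR can be verified at the logic level. The sketch below (hypothetical function name) mirrors the structure described above: a clocked XOR of A and B plus a replicated C in the first stage, followed by a clockless XOR.

```python
from itertools import product

def sum_cell(a, b, c):
    """Logic-level model of the sum cell: returns True for an output pulse."""
    stage1_xor = bool(a) != bool(b)   # first stage: clocked XOR of A and B
    stage1_c = bool(c)                # first stage: DFF-like branch replicating C
    return stage1_xor != stage1_c     # second stage: clockless XOR

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", int(sum_cell(a, b, c)))
```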
Figure 3.7: Schematic of the single-stage RSFQ sum cell.
Table 3.2 shows the component values for the schematic of Fig. 3.7. Fig. 3.8 shows the margins of the key
components of the sum cell. Fig. 3.9 shows the layout of the sum cell, which has the same dimensions as the
carry cell. The post-layout simulation, performed with the same method as described above, confirms correct
operation; the simulation waveform is shown in Fig. 3.10.
Table 3.2: Key components values of the sum cell
Component Value Component Value
J1, J2 187µA L3, L4 8.8pH
J3, J4 109µA L5 1.57pH
J5 138µA L10, L11 2.81pH
J7, J10 141µA L12 3.0pH
J14 80µA L13 4.01pH
J15 85µA Bias 2.5mV
Figure 3.8: Margins of key components of sum cell.
3.1.4 One-bit Full Adder Design
A one-bit full adder can be implemented using one carry cell and one sum cell, with no extra logic gates.
As shown in Fig. 3.11, the three signals (A, B and C) from the input ports each go through a splitter cell
and feed the carry cell and the sum cell. JTLs and splitter cells are used to route the input signals,
including the clock, to the sum and carry cells as required. Layouts for two different full adder realizations are
shown in Fig. 3.12. In this figure, the top layout corresponds to the conventional full adder using standard
logic gates (AND, OR, XOR, and DFF) whereas the bottom layout corresponds to the single-stage complex
gates introduced in this work. Only JTL connections are used for the full adder integration to minimize
the area cost and JJ count. The JTL consists of two JJs with 130µA critical current and consumes 180µA
Figure 3.9: Layout of the sum cell.
bias current. If desired, the repeated JTLs could be merged further for the two full adders. Fig. 3.13 gives
the simulation result of the full adder integration using single-stage complex gates, showing correct
functionality at 40GHz with a clock-to-Q delay of about 30 ps including JTL connections and splitters. Note
that the logic depth of the single-stage full adder design is one whereas it is three for the conventional full
adder.
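That one carry (majority) cell and one sum (odd-parity) cell together form a complete full adder can be checked exhaustively. This is a logic-level sketch with a hypothetical function name; it ignores timing, splitters and JTLs.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) computed by the single-stage sum and carry functions."""
    carry = (a + b + c) >= 2        # majority function realized by the carry cell
    total = (a + b + c) % 2 == 1    # odd parity realized by the sum cell
    return int(total), int(carry)

# Check against ordinary binary addition for all eight input patterns.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert a + b + c == s + 2 * cout
print("all 8 input patterns match binary addition")
```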
Table 3.3 shows the comparison between the full adder in this work, the full adder using the standard
library (STDL) and the design in [46]. Compared with the STDL, it can be seen that with the single-stage
complex gate design, approximately 50% savings are achieved in terms of JJ count, layout area, and
total bias current. Moreover, the timing performance of the single-stage complex gate design also outperforms
that of the standard logic gate design. The theoretical minimum clock period for the single-stage complex
gate design is 7.8ps because this value is the larger of the clock-to-Q delays of the sum and carry cells, and
Figure 3.10: Simulation waveform for the sum cell and the bias margin.
this value is larger than the sum of the maximum set-up time (2ps) and the maximum hold time
(2ps) of the two cells:
\[ \max(T_{c2Q,sum}, T_{c2Q,carry}) > \max(T_{se,sum}, T_{se,carry}) + \max(T_{ho,sum}, T_{ho,carry}) \tag{3.1} \]
where $T_{c2Q}$ is the clock-to-Q delay, $T_{se}$ denotes the set-up time, and $T_{ho}$ is the hold time. For the
conventional (standard gate) design, path delays are different for different signal paths, especially from
the outputs of the first stage to the inputs of second stage. To process the data of the second stage, one has
to wait until all signals reach the second stage from the first stage. The theoretical minimum clock period
for this design is calculated as the summation of the largest clock-to-Q delay of the first stage gates (which
is 7.18ps for the XOR2 gate), the largest path delay between the first stage to the second stage (which is
Figure 3.11: Block-level diagram of the novel one-bit full adder design employing single-stage complex
RSFQ gates.
18ps for the connection comprising 4 JTLs and 1 splitter) and the largest set-up time of the second stage
gates (which is 1ps for the DFF gate), that is, the result is 26.18ps.
\[ T_{min} = \max(T_{c2Q,st1}) + \max(T_{path}) + \max(T_{se,st2}) \tag{3.2} \]
where $T_{min}$ is the minimum clock period, $T_{c2Q,st1}$ denotes the clock-to-Q delays of the first stage gates,
$T_{path}$ is the path delays from the output of the first stage to the input of the second stage gate, and $T_{se,st2}$ denotes
the set-up times of the second stage gates. The design using single-stage complex gates outperforms
the standard gates design by about three times in terms of the maximum clock frequency. Furthermore,
the standard gate adder design has a cycle latency of 3 whereas our design (using complex single-stage
gate) has a cycle latency of 1. In other words, the proposed design is roughly 9 times better in terms of its
end-to-end latency compared to the conventional standard gate design. To compare our results with those
of the full adder of [46], we ran into the issue that their full adder was done using a different process and
additionally the paper did not provide the design parameters for the cells. Therefore, we ended up setting
the parameter values and doing the layout for their full adder in the MIT Lincoln lab’s SFQ5ee process.
Results reported herein are therefore comparison of their design generated as described above with our
own design. Note that the full adder of [46] does not have balanced set-up and hold times for 3-Majority
Figure 3.12: Comparison of the layouts using standard RSFQ gates and single-stage complex RSFQ gates.
and 3-XOR cells. If this is required, then delay cells must be inserted before the 3-Majority cell of [46],
which will further increase their layout area cost.
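The clock-period estimates of Eqs. (3.1) and (3.2) and the resulting speed and latency comparison can be reproduced with a few lines of arithmetic. The timing values are those quoted in the text; the dictionary layout and variable names are illustrative.

```python
# Timing values quoted in the text, in picoseconds.
c2q = {"sum": 5.8, "carry": 7.8}    # worst-case clock-to-Q delays
setup = {"sum": 2.0, "carry": 1.7}  # set-up times
hold = {"sum": 2.0, "carry": 2.2}   # hold times

# Eq. (3.1): the clock-to-Q delay dominates, so it sets the single-stage period.
t_single = max(c2q.values())
assert t_single > max(setup.values()) + max(hold.values())

# Eq. (3.2): standard-gate design = XOR2 clock-to-Q + longest path
# (4 JTLs and 1 splitter) + DFF set-up time.
t_std = 7.18 + 18.0 + 1.0

freq_ratio = t_std / t_single                  # ratio of maximum clock frequencies
latency_ratio = (3 * t_std) / (1 * t_single)   # 3 cycles versus 1 cycle
print(f"T_single = {t_single} ps, T_std = {t_std:.2f} ps")
print(f"frequency ratio = {freq_ratio:.2f}, latency ratio = {latency_ratio:.2f}")
```

With these unrounded periods the frequency ratio is somewhat above three and the latency ratio is about ten, which is of the same order as the rounded factors of three and nine quoted above.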
3.1.5 8-bit Multiplier Design
The implemented multiplier is an 8-bit parallel array multiplier as shown in Fig. 3.14. All the RSFQ (logic)
gates used in the multiplier are synchronized, which results in a gate-level pipelined structure. For an 8-bit
design, there are 18 stages (logic levels), where the stage with most gates consists of three full adders, one
half adder, and three AND gates. The connections between stages follow a similar pattern: the (full or
half) adders in the current stage receive data from the adders and AND gates of previous stages, whereas
98
Figure 3.13: Simulation waveform of one-bit full adder employing single-stage complex RSFQ gates.
Table 3.3: COMPARISON TABLE OF FULL ADDER USING STANDARD GATES WITH SINGLE-STAGE
ADDER DESIGN
FA using STDL [46] This design
JTL 27 15 15
Splitter 9 4 4
DFF 4 0 0
XOR 2 0 0
AND 2 0 0
OR 1 0 0
carry/3-Majority cell 0 1 1
sum/3-XOR cell 0 1 1
Total JJs 157 84 80
Area (mm
2
) 0.055 0.0235 0.0218
Total Current (mA) 12.5 7.09 6.15
the AND gates in the current stage receive pairs of data bits (X[i] and Y[i] fori∈ [0...7]). Notice that, to
avoid crowding the figure, path balancing DFFs that ensure that the path length from any circuit inputs to
any circuit outputs is the same (i.e., 18 stages) are not shown in Fig. 3.14. Those DFFs need to be inserted
on the input ( X[0:7] and Y[0:7]) to output paths (P[0:14]) as needed. For example, a chain of 17 DFF’s will
be inserted after the AND gate that produces P0 whereas a chain of 16 DFF’s will be inserted after the the
half-adder (HA) gate that produces P1, and so on. Fig. 3.15 shows the floor-plan of the multiplier. Each
stage of the layout consists of two path balancing DFF arrays at top and bottom, which carry the input
99
data (X[0:7] and Y[0:7]) propagating along the whole chip and do the path balancing. The required bits
of the input data are taken from the DFF paths, feeding into the AND gates. The AND gates are in a gate
group with the full adders and half adders sitting in the middle of each layout stage. On side of the data
path, there are clock networks going through every stage. Data therefore is evaluated from left to right,
stage by stage, driven by the clocks.
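The path-balancing rule described above (17 DFFs behind P0, 16 behind P1, and so on) amounts to padding every output up to the full pipeline depth. A minimal sketch follows; it assumes that P0 is produced in stage 1 and P1 in stage 2, and the function name is illustrative rather than taken from any tool.

```python
TOTAL_STAGES = 18  # pipeline depth of the 8-bit array multiplier (from the text)


def balancing_dffs(producing_stage, total_stages=TOTAL_STAGES):
    """Number of DFFs to append after a gate so its output leaves at the last stage."""
    if not 1 <= producing_stage <= total_stages:
        raise ValueError("producing_stage must lie within the pipeline")
    return total_stages - producing_stage


# Assumption: P0 comes from an AND gate in stage 1, P1 from the half adder in stage 2.
dffs_after_p0 = balancing_dffs(1)
dffs_after_p1 = balancing_dffs(2)
print(dffs_after_p0, dffs_after_p1)
```

A signal produced in the final stage needs no padding, which is why the DFF chains shrink by one per stage.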
Figure 3.14: Block-level diagram of the 8-bit multiplier.
Figure 3.15: Floorplan of the 8-bit multiplier.
There are several commonly used clocking methods to choose from, e.g., concurrent flow clocking,
counter-flow clocking, clock-follow-data clocking, and so on [47]. This multiplier utilizes a clock-follow-data
timing scheme in which a single clock pulse carries the data through the whole system from input
to output. Similar to concurrent flow clocking, the clock flows in the same direction as the data, but in the
clock-follow-data scheme, the clock signals released by the previous cells arrive at the current cells later
than the data. As shown in Fig. 3.16, the timing satisfies $T_{clock} > T_i + T_{data}$, where $T_{clock}$ denotes the clock
delay from the $i$-th stage to the $(i+1)$-st stage (including the splitter delay, JTL, PTL and any intentionally
inserted delay), $T_i$ denotes the logic delay of the $i$-th stage, and $T_{data}$ denotes the data path delay between
adjacent stages. The clock-follow-data scheme has the potential to achieve the same minimum clock period
as concurrent clocking, though a more precise timing check is required.
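The clock-follow-data condition can be expressed as a simple per-stage check. The sketch below is only an illustration of the inequality $T_{clock} > T_i + T_{data}$; the function names and the delay values in picoseconds are hypothetical.

```python
def timing_slack(t_clock, t_logic, t_data):
    """Slack (ps) of the clock-follow-data condition T_clock > T_i + T_data."""
    return t_clock - (t_logic + t_data)


def clock_follows_data(t_clock, t_logic, t_data):
    """True when the clock pulse reaches the next stage after the data pulse."""
    return timing_slack(t_clock, t_logic, t_data) > 0


# Hypothetical per-stage delays in picoseconds: (clock delay, logic delay, data path delay).
stages = [(12.0, 6.0, 4.0), (11.0, 7.0, 3.5), (9.0, 6.5, 3.0)]
violations = [i for i, s in enumerate(stages) if not clock_follows_data(*s)]
print("violating stages:", violations)
```

A stage with non-positive slack would need additional clock delay, for example through the delay cells discussed in this section.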
Designs employing the clock-follow-data and concurrent clocking schemes tend to be more sensitive to
fabrication process variations. If a manufactured chip malfunctions due to improper timing, a counter-flow
design has a good chance of working correctly when the operating clock frequency is lowered, whereas the
clock-follow-data and concurrent clocking schemes cannot be "repaired" by dropping the clock frequency.
A potential solution to mitigate this issue is to separate the biasing circuitry of the clock distribution
network from that of all other gates. Note that, to achieve proper timing, the clock distribution network
typically contains several delay cells (e.g., a splitter with one output port grounded with a resistor as
shown in Fig. 3.17). By changing the bias of these delay cells, one can tune the clock versus data signal
timing relationships after chip fabrication, and thereby "repair" some of the chips that employ the
clock-follow-data or concurrent clocking schemes but fail to meet their timing requirements. Note that
a "repaired" circuit will typically have a lower maximum operating clock frequency. Fig. 3.18 is a plot
showing the propagation delay of the delay cell as a function of the bias current. The delay increases when
the bias current is reduced.
Figure 3.16: Clock-follow-data clocking scheme.
Figure 3.17: Schematic of the delay cells.
Figure 3.18: Input-to-Output delay of the employed delay cell for different bias current conditions.
3.1.6 Design Using Automation Tools
There has been significant progress in the development of design automation tools for SFQ circuits [48], [49],
including synthesis [50], placement [51], routing [52], and timing analysis [53]. It is instructive to examine
the synthesis results of some larger SFQ circuits using the new carry and sum cells and compare them with
those of the conventional full adder design. Note that most of the current automation tools assume the use of
passive transmission lines (PTLs) for the connections among different cells, implying the use of additional
cells (PTL transmitter and PTL receiver) attached to each output and input of every gate in the circuit.
With PTL interconnections, however, the cost of using simple logic gates tends to be higher
than it appears. For example, three PTL receivers and one PTL transmitter need to be added to a simple
2-input AND gate before it can be used in a circuit. Thus, using complex gates that combine the
functionality of multiple simple gates becomes even more beneficial. Table 3.4 shows the implementation
results of the array multiplier for different input widths (8-bit, 16-bit and 32-bit). Each entry lists three
results in the order given in the table header: using a cell library with the newly designed single-stage adder
(STDL+SA), using the standard library with the gates designed in [46] (STDL+[46]), and using the standard
library alone (STDL). For the 8-bit, 16-bit
and 32-bit cases, the savings in cell number, JJ number, area and static power are about 40 percent to 50
percent compared with STDL and 20 percent compared with the 3-XOR/3-Majority design. For the 32-bit
array multiplier, due to limited computing resources, the design automation tools cannot handle
the design using the standard cell library alone, whereas the implementation is still feasible thanks to the
simplification from using the single-stage adder.
Table 3.4: Array multiplier design automation tool implementation results (each entry: STDL+SA / STDL+[46] / STDL)
| Metric | ArrMu8 | ArrMu16 | ArrMu32 |
| Depth | 21/30/40 | 45/63/116 | 93/160/NA |
| Logic Gate Number | 197/210/320 | 1,507/1,600/1,924 | 6,602/7,323/NA |
| DFF Number | 395/584/734 | 1,865/2,634/4,865 | 8,451/14,797/NA |
| Splitter Number | 876/1,217/1,373 | 5,373/6,499/8,663 | 23,872/26,486/NA |
| Total Cell Number | 1,468/2,011/2,472 | 8,745/10,733/15,452 | 38,925/48,606/NA |
| Total JJs | 12,475/17,294/21,544 | 73,521/91,767/136,079 | 327,473/413,151/NA |
| Area (mm^2) | 12.39/17.20/22.41 | 90.20/109.45/142.6 | 412.26/520.6/NA |
| Static Power (mW) | 1.193/1.51/1.973 | 7.109/9.23/12.56 | 31.65/41.19/NA |
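The savings quoted in the text can be recomputed directly from the Total JJs row of Table 3.4. The short sketch below does this for the 8-bit and 16-bit multipliers; the dictionary layout and the helper name `saving` are illustrative choices, while the numbers are those listed in the table.

```python
# Total JJ counts copied from Table 3.4.
TOTAL_JJS = {
    "ArrMu8": {"STDL+SA": 12_475, "STDL+[46]": 17_294, "STDL": 21_544},
    "ArrMu16": {"STDL+SA": 73_521, "STDL+[46]": 91_767, "STDL": 136_079},
}


def saving(new, baseline):
    """Fractional reduction of `new` relative to `baseline`."""
    return 1.0 - new / baseline


for design, jj in TOTAL_JJS.items():
    vs_stdl = saving(jj["STDL+SA"], jj["STDL"])
    vs_ref46 = saving(jj["STDL+SA"], jj["STDL+[46]"])
    print(f"{design}: {vs_stdl:.1%} fewer JJs than STDL, {vs_ref46:.1%} fewer than STDL+[46]")
```

The same helper can be applied to the cell count, area, and static power rows.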
3.1.7 Simulation Results
The final 8-bit multiplier chip layout using the structure described above is shown in Fig. 3.19. The layout
is folded after the 10th stage to maintain a square-like core layout. Fig. 3.20 depicts example full-chip
simulation waveforms demonstrating the correct circuit functionality at 40GHz. Five different input sets are
fed in sequentially, including an all-zero set. The multiplier processes integers in sign-magnitude
representation, where the MSB (X[7], Y[7] and OUT[14]) represents the sign and the remaining lower-order
bits represent the magnitude of the number. Table 3.5 is a performance summary of the 8-bit multiplier.
The multiplier consists of 3,674 gates, including 35 carry cells, 35 sum cells, DFFs, JTLs, AND gates, PTL
drivers, PTL receivers and so on. There are 12,120 JJs, occupying 11.1mm^2 and consuming about 2.9mW
of static (DC) power at a bias voltage of 2.5mV.
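The sign-magnitude format described above can be illustrated with a small behavioral model. This is a sketch only: it assumes the output sign is the XOR of the two input sign bits and the output magnitude is the 14-bit product of the 7-bit magnitudes, and the helper names are hypothetical.

```python
MAG_BITS = 7  # magnitude width of each 8-bit operand; bit 7 is the sign


def encode(value):
    """Encode a Python int as an 8-bit sign-magnitude word."""
    magnitude = abs(value)
    if magnitude >= 1 << MAG_BITS:
        raise ValueError("magnitude does not fit in 7 bits")
    sign = 1 if value < 0 else 0
    return (sign << MAG_BITS) | magnitude


def multiply(x, y):
    """Multiply two 8-bit sign-magnitude words into a 15-bit word (sign in bit 14)."""
    sign = (x >> MAG_BITS) ^ (y >> MAG_BITS)  # assumption: sign is the XOR of input signs
    magnitude = (x & 0x7F) * (y & 0x7F)       # at most 127 * 127 = 16129 < 2**14
    return (sign << 14) | magnitude


def decode_product(p):
    """Decode a 15-bit sign-magnitude product back to a Python int."""
    magnitude = p & 0x3FFF
    return -magnitude if (p >> 14) & 1 else magnitude


print(decode_product(multiply(encode(-3), encode(5))))
```

The largest magnitude product, 127 x 127 = 16129, fits in the 14 magnitude bits OUT[13:0].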
Table 3.5: Summary of the 8-bit parallel multiplier
| Process | MIT Lincoln Lab SFQ5ee |
| Input Bits | 8x8 |
| Cell Number | 3,674 |
| Total JJs | 12,120 |
| Clock rate | 40GHz |
| Area | 3.7mm × 3.0mm = 11.1mm^2 |
| Static Power | 2.929mW |
Figure 3.19: Chip layout of the 8-bit multiplier.
3.1.8 Conclusion
The 1-bit full adder function is implemented as two RSFQ single-stage complex gates, a carry gate and a
sum gate. Both schematics and layouts are given. The correct functionality of this design is verified by
post-layout JSIM simulations. The single-stage full adder design saves about half of the JJs compared with
the multi-stage full adder using standard gates, resulting in a saving of about half of the area and power
consumption. Compared with the previous complex adder design, this new adder achieves balanced timing
characteristics. An 8-bit multiplier is then designed using this full adder. Results for the multiplier
show that, by using the single-stage full adder, we can greatly reduce the depth of the
circuit, the gate count, and the number of PTL connections, ultimately resulting in area and power
savings compared to an 8-bit multiplier design utilizing (standard) simple logic gates. Finally,
the implementations of larger multipliers by the design automation tools are demonstrated, showing
reasonable savings in cell number, JJ number, area and static power. This work is published in [54], [55].
Figure 3.20: Simulation waveform of 8-bit multiplier.
3.2 All Digital Phase-Lock-Loop Design in Rapid Single Flux Quantum Technology
3.2.1 Background and Prior Work
Unlike complementary metal–oxide–semiconductor (CMOS) logic gates, most RSFQ logic
gates have to be clocked, which leads to challenges related to near-zero-skew on-chip clock distribution.
Although there are systems built with clock-less cells [56] or self-timed techniques such as [57], [58],
the majority of SFQ designs still rely on a clock for gate-level synchronization. As a result, this unique
characteristic of SFQ circuits puts more emphasis on the clock network, including clock generation or
synthesis, clock distribution and clocking strategy.
Realizing the significance of the clock, much work has been done on developing advanced clocking
strategies for general SFQ system design. For example, reference [59] focuses on algorithms that optimize
the construction of a specific type of clock network topology (called (HC)$^2$LC) that uses an asynchronous
clock distribution network to provide the timing for a fully synchronous circuit. There is also
work attempting to reduce the large overhead of path balancing D flip-flops in RSFQ circuits by using dual
clocks [60]. These works focus on finding an optimal clocking solution for the RSFQ circuit assuming that
a precise and noiseless on-chip clock source is available. Normally, this on-chip clock source is provided
by a phase locked loop (PLL).
A PLL is a negative feedback circuit that synchronizes an output signal (generated by an oscillator)
with a reference or input signal in terms of both frequency and phase [61]. As seen in Fig. 3.21, the basic
blocks in a PLL are (i) a voltage controlled oscillator (VCO) which generates the oscillating signal, (ii)
a phase detector (PD) which extracts the phase difference between the output signal and the reference, and
(iii) a low-pass filter (LPF) which removes the high frequency components in the output of the PD to
generate the VCO control voltage. More precisely, the phase information generated by the PD
goes through the LPF and controls the VCO so that it oscillates at a target frequency with a constant phase
shift from the reference signal. In most cases, the reference signal is generated off-chip at a relatively slow
rate. As a result, a frequency divider must be inserted between the VCO's output and the PD input so that
the PLL can produce an oscillating output at a target frequency much higher than that of the reference
signal, as shown in Fig. 3.21 (b) [62].
While the analog PLL (APLL) is still widely used, the digital PLL (DPLL) outperforms the APLL in several
aspects: it is smaller in size, consumes less power, scales better with the process technology
node, and is robust to process-voltage-temperature (PVT) variations [61]. Some variants of the DPLL
replace some blocks of an APLL with digital cells, leaving certain parts like the VCO as analog blocks.
In contrast, an all-digital PLL (ADPLL) is implemented with all-digital blocks, including the VCO, which
is then called a digital controlled oscillator (DCO).
Figure 3.21: (a) Basic idea of a PLL; (b) a PLL with frequency divider.
Few researchers have worked on PLL design using the SFQ technology. A bias controlled phase-lock-
loop (PLL) at 50GHz was presented by Hypres [63] in 2000. In this design the phase detector and frequency
divider are digital, whereas the loop filter and voltage-controlled oscillator are analog. In [63] a single
critically damped JJ and an RS flip-flop with DC output were used as the voltage-controlled oscillator
(VCO) and a phase detector (PD) for signal rates up to 10MHz, respectively. The output of the PD was
sent through an on-chip DC amplifier and a resistor-divider network, providing an extra voltage on the
VCO besides the pre-adjusted bias current. Reference [64] presented an on-chip oscillator using an under-
damped Josephson Junction array along with a frequency divider and phase detector to be used in a PLL.
3.2.2 Digital Phase Detector
A nondestructive readout (NDRO) cell with proper drivers can serve as a digital phase detector. To fulfill
the phase detection function, the off-chip reference clock $C_{ref}$ and the on-chip clock $C_0$ are
connected to the RESET and SET ports of the NDRO, respectively, as shown in Fig. 3.22 (a). A second clock
signal $C_{180}$, which has a 180-degree phase shift from $C_0$, is used to drive the READOUT port of the NDRO.
During one period of $C_{180}$, the state of the NDRO is set or reset depending on the relative arrival times of
the reference clock $C_{ref}$ and the on-chip clock $C_0$. Precisely, when $C_0$ arrives before $C_{ref}$, the NDRO is
first set to '1' and then reset to '0', leaving the final stored state of the NDRO as '0' (EARLY case). On the
other hand, when $C_0$ arrives after $C_{ref}$, the final stored state is '1' because the NDRO is first reset
and then set to '1' (LATE case). Table 3.6 shows the logic function of the PD. The phase detector then
generates an output pulse when a $C_{180}$ pulse arrives at the READOUT port of the NDRO. Fig. 3.22 (b) shows
input/output waveforms for the NDRO based digital PD.
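The EARLY/LATE decision described above depends only on which of the two pulses reaches the NDRO last. The following behavioral sketch (not a circuit model) applies the SET and RESET events in arrival order; the function name and the arrival times are hypothetical.

```python
def ndro_phase_detector(t_c0, t_ref, initial_state=0):
    """Return the stored NDRO state after one SET (C0) and one RESET (Cref) pulse.

    0 means EARLY (C0 arrived before the reference), 1 means LATE.
    Simultaneous arrival is not modeled physically; the tie order is arbitrary.
    """
    events = sorted([(t_c0, "SET"), (t_ref, "RESET")])
    state = initial_state
    for _, port in events:
        state = 1 if port == "SET" else 0
    return state


PHASE_NAME = {0: "EARLY", 1: "LATE"}

# Hypothetical arrival times in picoseconds within one readout period.
print(PHASE_NAME[ndro_phase_detector(t_c0=3.0, t_ref=5.0)])
print(PHASE_NAME[ndro_phase_detector(t_c0=7.0, t_ref=5.0)])
```

Because both ports are pulsed in every readout period, the result does not depend on the initial state.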
Figure 3.22: (a) An NDRO serving as a phase detector; (b) waveforms showing the function of the PD.
Table 3.6: Truth table of the phase detector logic.
| PD output | Phase |
| 0 | EARLY |
| 1 | LATE |
3.2.3 Digital Controlled Oscillator
There are different methods to generate an oscillating signal in the SFQ technology. For example, in [63]
a single critically damped JJ is used to provide the oscillating clock, whereas in [64] a JJ array is used as
the voltage-controlled oscillator. Note that in both references, the control signals were analog bias voltages.
In our proposed all-digital PLL design, an adjustable delay loop is used to achieve the digitally controlled
oscillation, as shown in Fig. 3.23. There are three main parts in this digital controlled oscillator (DCO)
design: a merger which is used to initiate the oscillation, an adjustable delay part (ADP) which is used to
tune the frequency of the DCO, and a passive transmission line (PTL) which is used to complete the loop.
The merger cell (or the confluence junction cell) produces an output pulse whenever a pulse arrives
at either of its input ports. An initial pulse can be injected into one of the merger inputs. This
pulse will then propagate along the ADP and PTL loop and back to the other input port of the merger cell,
where the pulse begins the next cycle of this iterative process. The output is tapped by a splitter inserted
in the loop. As one can see, the oscillation period is proportional to the loop delay of the DCO, which can
itself be divided into two parts: a fixed delay part which accounts for the merger delay, PTL delay, PTL
driver/receiver delay, and splitter delay, and an adjustable delay part (ADP) which is made of an array of
tunable delay Josephson Transmission Lines (JTLs).
Fig. 3.24 shows the schematic of a tunable delay JTL cell using a design similar to the one proposed
in [65], [66], where the re-configurable cells are implemented by adding a current adjustment branch on
top of a JJ. By doing so, we can switch between two different bias modes, and thus the function can be
re-configured to be that of a JTL/DFF, AND/OR, NOR/NAND, or to exhibit different delays as in this
design. When there is an SFQ pulse at the SET port, J1 is triggered and provides a flux through the
J1-L1-J2-L3-J4 loop, which in turn adds more current to junction J4. When the RESET signal comes, the current
adjustment branch is reset by the flux released from junction J3, and the bias current flowing in junction J4
is reset to its original value. In practice, the tunable delay JTL is used with the IN port and the OUT port
in the data path, and the SET and RESET ports are used as the configuration controls. Initially, the signal
propagates through the cell from IN to OUT with a large delay. When an SFQ pulse is fed into the SET
port, the cell is configured for the fast mode, where the propagation delay becomes smaller. Moreover, an
SFQ pulse received at the RESET port will reset the cell back to its initial mode (that is, the slow mode).
More details about how the current is adjusted are provided in [65], [66].
For our DCO design, instead of re-configuring the function of the cell, the current adjustment branch is
used to tune the delay of a JTL. Fig. 3.25 is a plot showing how the delay of a JTL cell changes with the bias
current. This plot was obtained by sweeping the bias current of a Josephson junction with a 130µA critical
current and recording the propagation delay of an SFQ pulse through the JJ. As expected, the higher the
bias current is, the shorter the delay will be. To achieve a good design (larger margins and better tunability),
the nominal bias current of J4 is set at around 80% of its critical current. Notice that if the nominal bias
is too low, the delay varies strongly with the bias, as shown in Fig. 3.25. If the nominal bias is set too close
to the critical current, there is a chance that the junction starts to oscillate after fabrication. To further
adjust the delay difference between the fast and slow modes, a designer can change the values of L1 or L3.
With a larger loop inductance, when the cell is set (a flux is stored in the J1-L1-J2-L3-J4 loop), the increase
in loop current will be smaller, and therefore the delay difference will be smaller. Eight tunable delay cells
constitute the ADP, which gives the DCO a 9-level delay tunability. This DCO takes its digital control bits
from a digital loop filter, as will be explained in the next section.
The tunable delay cell has large margins with respect to the bias and all cell parameters due to its simple
structure. However, one must examine the effect of process variations on the tunable delay cell since the
operation of the DCO depends on the SFQ pulse propagation delay, which is generated by the tunable delay
cell. Therefore, we ran Monte Carlo simulations on the entire DCO, recording the oscillation period of the
DCO for different control codes. For each control code, 1000 Monte Carlo simulations were executed with
process variations for the critical current of the JJs ($I_c$), the inductance of the metal inductors and the
resistance of all resistors. More specifically, the $I_c$ spread is set to 3%, the largest value mentioned in [17]
for JJ sizes from 1500nm to 500nm. For the inductance and resistance, we assumed spread values of 3% and
15%, respectively.
Fig. 3.26 shows the histogram of the MC simulated periods for different control codes. Table 3.7 shows
more details about the results, including the codes, the mean values of the clock period, and its standard
deviation (STD). Notice that, due to the process variations, there are some overlaps between the periods
corresponding to different codes. For any one specific MC sample, however, the period of the DCO varies
monotonically with the control code (the period decreases as more control bits are set), which is critical to
the proper operation of the DCO. The mean of the minimum period is 18.08ps, which corresponds to 55.3GHz,
whereas the mean of the maximum period is 22.57ps, which corresponds to 44.3GHz. Together, these define the
possible locking range (±5.3GHz or ±10%) for
the PLL. Note that the delay/period steps between smaller codes are larger than those between codes that
are close to the fastest rate. For example, the period difference between code '00000000' and '00000001'
is 1.12ps, whereas the differences for subsequent pairs of codes are around 0.5ps. This is because of the
bias current leakage effect, explained next. When the bias current of a cell increases, cells adjacent to it
also experience a small increase in their junction bias currents. This effect becomes weaker as the nominal
bias of a junction gets closer to its critical current level. Therefore, the first jump of the code will not only
increase the bias of the first tunable delay cell, but also affect the cells immediately before and after it. The
DCO period step is another important specification; as shown in Fig. 3.27, it has a mean value of 0.49ps and
a standard deviation of 0.08ps.
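The frequencies and period steps discussed above follow directly from the mean periods listed in Table 3.7. The sketch below converts each mean period to a frequency and computes the step between consecutive codes; only the variable and function names are introduced here, and the data are those of the table.

```python
# Mean DCO periods in picoseconds, copied from Table 3.7 (slowest code first).
MEAN_PERIOD_PS = {
    "00000000": 22.57, "00000001": 21.45, "00000011": 21.02,
    "00000111": 20.40, "00001111": 19.92, "00011111": 19.49,
    "00111111": 19.00, "01111111": 18.52, "11111111": 18.08,
}


def period_to_ghz(period_ps):
    """Convert a period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


frequency_ghz = {code: period_to_ghz(p) for code, p in MEAN_PERIOD_PS.items()}
periods = list(MEAN_PERIOD_PS.values())
# Period reduction between consecutive control codes.
steps_ps = [slow - fast for slow, fast in zip(periods, periods[1:])]

for code, f in frequency_ghz.items():
    print(f"{code}: {f:.1f} GHz")
print("steps (ps):", [round(s, 2) for s in steps_ps])
```

The first step stands out as the largest, which matches the bias current leakage explanation given in the text.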
Figure 3.23: Block diagram of the digital controlled oscillator.
Figure 3.24: Schematic of the tunable delay JTL cell: critical currents J1 = 131.7µA, J2 = 60µA, J3 = 271.6µA,
J4 = 130µA, J5 = 149.7µA, J6 = 100µA; inductances L1 = 3.3pH, L2 = 1.87pH, L3 = 4.0pH; bias currents
$I_1$ = 100µA, $I_2$ = 31.25µA, $I_3$ = 125µA.
Figure 3.25: The delay of a JTL as a function of the Josephson junction bias current for a JJ with a 130µA
critical current.
Table 3.7: Summary of the DCO Monte Carlo simulations.
| Code | Mean of period | Standard deviation (STD) |
| 00000000 | 22.57ps | 0.56ps |
| 00000001 | 21.45ps | 0.45ps |
| 00000011 | 21.02ps | 0.43ps |
| 00000111 | 20.4ps | 0.50ps |
| 00001111 | 19.92ps | 0.44ps |
| 00011111 | 19.49ps | 0.46ps |
| 00111111 | 19.00ps | 0.46ps |
| 01111111 | 18.52ps | 0.46ps |
| 11111111 | 18.08ps | 0.46ps |
3.2.4 Digital Loop Filter
In a phase locked loop, the digital loop filter (DLF) provides the low pass filtering function that
eliminates the in-band noise contributed by the PLL reference clock and the phase detector. In this RSFQ
all-digital phase locked loop, the DLF consists of a digital accumulator and a direct tapping of the phase
detector output, as shown in Fig. 3.28 (a). First, an inverse signal $\overline{PD}$ is generated from the phase
detector output PD by an inverter. A DFF is inserted to keep the paths balanced. Both the PD output and
its inverse are then used to control one of the tunable delay cells in the DCO. Meanwhile, the same signals feed
the digital accumulator, whose function is to add one to or subtract one from the previous value according to
the sign of the input. As shown in Fig. 3.28 (b), the z-transform of the DLF transfer function is:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{1}{1 - z^{-1}} + 1 \qquad (3.3)$$
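In the time domain, the transfer function (3.3) is a running sum (the accumulator) plus a direct path. The sketch below evaluates it two ways, as accumulator plus direct tap and as the equivalent single difference equation y[n] = y[n-1] + 2x[n] - x[n-1], and checks that they agree. Mapping a LATE decision to +1 and an EARLY decision to -1 is an assumption made for illustration.

```python
def dlf_parallel(samples):
    """Accumulator 1/(1 - z^-1) in parallel with a unity direct path."""
    acc = 0
    out = []
    for x in samples:
        acc += x             # running sum including the current sample
        out.append(acc + x)  # add the directly tapped PD sample
    return out


def dlf_recursive(samples):
    """Same filter written as y[n] = y[n-1] + 2*x[n] - x[n-1]."""
    y_prev, x_prev = 0, 0
    out = []
    for x in samples:
        y = y_prev + 2 * x - x_prev
        out.append(y)
        y_prev, x_prev = y, x
    return out


# Assumed encoding of PD decisions: LATE -> +1, EARLY -> -1.
pd_sequence = [1, 1, -1, -1, 1]
print(dlf_parallel(pd_sequence))
print(dlf_recursive(pd_sequence))
```

The recursive form follows from combining the two terms of (3.3) into (2 - z^-1)/(1 - z^-1).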
The digital accumulator is an array of accumulator units (AUs). As shown in Fig. 3.29, each AU
consists of two NDROs and two AND gates. The SET and RESET signals are connected opposite to one
another for the two NDROs, which ensures that whenever one of the NDROs is set, the other NDRO is reset.
By doing so, the NDRO pair provides a set of complementary signals ($C_{<n>}$ and $\overline{C_{<n>}}$). For the α NDRO
generating $C_{<n>}$, the SET port is tied to an AND gate whose two inputs are PD and the α NDRO output
of the previous AU ($C_{<n-1>}$). Thus, $C_{<n>}$ will only be set when both the phase detection result is '1' (late)
and the previous bit ($C_{<n-1>}$) is '1'. Similarly, the β NDRO generating $\overline{C_{<n>}}$ is set when the phase detection
result is '0' (early) and the next bit ($C_{<n+1>}$) is '0'. Here we construct a unary-coding (or thermometer-code)
accumulator in which the output bits accumulate '1's starting at $C_{<0>}$, each bit has the same weight,
and the magnitude of the output is represented by the number of '1's. For example, for a 7-bit unary-coding
accumulator, '1111111' is seven and '0000011' represents two. Finally, the 7-bit unary-coded
complementary outputs are routed to the control ports of the tunable delay cells in the DCO, where $C_{<0:6>}$
are connected to the SET ports and $\overline{C_{<0:6>}}$ are connected to the RESET ports (along with the PD and $\overline{PD}$
signals mentioned above). By doing so, the digital loop filter generates an 8-bit control code for the DCO
oscillating frequency.
Figure 3.26: Histograms of the Monte Carlo simulation as a function of the DCO oscillation period showing
the means and standard deviation (STD).
Figure 3.27: Histogram of the Monte Carlo simulation as a function of the DCO frequency step.
Figure 3.28: (a) Structure of the digital loop filter; (b) its mathematical equivalent.
Figure 3.29: Block diagram of the digital accumulator.
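The unary (thermometer-code) accumulator described in this section can be modeled behaviorally as follows. This is a sketch under stated assumptions: bit n is set when PD is '1' and the previous bit is '1', and reset when PD is '0' and the next bit is '0'; the first unit is assumed to see a constant '1' as its previous bit and the last unit a constant '0' as its next bit. The function names are hypothetical.

```python
def accumulator_step(bits, pd):
    """One update of the unary accumulator; bits[n] models the n-th accumulator bit."""
    n_bits = len(bits)
    updated = list(bits)
    for n in range(n_bits):
        prev_bit = bits[n - 1] if n > 0 else 1           # assumed boundary condition
        next_bit = bits[n + 1] if n < n_bits - 1 else 0  # assumed boundary condition
        if pd == 1 and prev_bit == 1:
            updated[n] = 1   # alpha NDRO set: LATE and previous bit already '1'
        elif pd == 0 and next_bit == 0:
            updated[n] = 0   # beta NDRO set: EARLY and next bit still '0'
    return updated


def control_code(bits, pd):
    """8-bit DCO control: seven accumulator bits plus the directly tapped PD bit."""
    return bits + [pd]


state = [0] * 7
for pd in [1, 1, 1, 0]:
    state = accumulator_step(state, pd)
    print(pd, state, sum(state))
```

Each LATE decision adds exactly one '1' and each EARLY decision removes exactly one, and the code saturates at all zeros or all ones.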
3.2.5 10-bit Frequency Divider
The frequency divider is realized by using ten cascaded toggle flip-flops (TFFs) as shown in Fig. 3.30. The
state of a TFF toggles between '0' and '1' any time an input signal arrives. The TFF gives an output pulse
when its internal state returns to '0' from '1'. Thus, the TFF acts as a divide-by-two
frequency divider. In an RSFQ system, since most of the logic cells are intrinsically clocked, the fastest
working rate is typically limited by the propagation delay of a logic cell plus the connection delays (both
JTL and PTL), which add up to several picoseconds but certainly less than twenty picoseconds. Therefore,
many well-designed RSFQ systems can operate at a clock rate close to 50GHz. Meanwhile, the off-chip
reference clock cannot be set to a very high rate because the long connections to access the RSFQ
chips and the bonding wires result in a limited-bandwidth communication channel. On the other hand, a
reference clock which is set to a low frequency increases the updating time for the phase detector and thus
adversely affects the stability of the PLL loop. We therefore chose 50GHz as the target output clock frequency
and 48.8MHz as the off-chip reference clock, which makes the ratio of those two clocks 1024 ($2^{10}$).
By cascading 10 stages of TFFs, the 50GHz high speed system clock is down-converted to the reference
frequency of 48.8MHz. Note that the phase detector discussed above requires two clock signals with
a 180-degree phase shift from one another. An extra XOR gate is therefore used to tap the output signals from
the TFFs of the last two stages, and the exclusive-OR function creates the desired signal, shifted by 180 degrees
relative to the last TFF output, as illustrated in Fig. 3.30.
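The divide-by-1024 behavior of the TFF chain can be reproduced with a small behavioral model in which each TFF emits a pulse only when its state returns from '1' to '0'. The class and function names below are illustrative.

```python
class ToggleFlipFlop:
    """Behavioral TFF: toggles on every input pulse, emits a pulse on the 1 -> 0 transition."""

    def __init__(self):
        self.state = 0

    def pulse(self):
        self.state ^= 1
        return self.state == 0  # True when the state just returned to '0'


def divided_pulses(n_input_pulses, stages=10):
    """Count the output pulses of a chain of cascaded TFFs."""
    chain = [ToggleFlipFlop() for _ in range(stages)]
    outputs = 0
    for _ in range(n_input_pulses):
        for tff in chain:
            if not tff.pulse():
                break        # no pulse propagates to the next stage
        else:
            outputs += 1     # the pulse made it through every stage
    return outputs


reference_hz = 50e9 / 2**10
print(divided_pulses(2048), f"{reference_hz / 1e6:.2f} MHz")
```

Dividing 50GHz by 1024 gives about 48.83MHz, consistent with the 48.8MHz reference quoted in the text.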
Figure 3.30: Block diagram of the frequency divider.
3.2.6 RSFQ All Digital PLL
Fig. 3.31 is the top-level block diagram of an RSFQ all-digital PLL. From the digital phase detector, early/late
information is extracted and sent through a digital loop filter to the DCO. As mentioned in section 3.2.3,
the DCO consists of eight tunable delay cells, where $C_{<0:6>}$ are connected to seven of the SET
ports and $\overline{C_{<0:6>}}$ are connected to the corresponding RESET ports. At the same time, PD and $\overline{PD}$ are connected
to the SET and RESET ports of the eighth tunable delay cell, respectively, as another pair of controlling
signals. The output frequency of the digital controlled oscillator (DCO) changes according to the control
code bits. The 50GHz clock from the DCO is then divided by a 10-bit FD. The down-converted low-rate clocks
are sent to the PD, where they are used along with an external reference clock to produce the early/late
information mentioned above. By doing so, the phases of the divided clocks, as well as that of the 50GHz clock
generated by the DCO, are locked to the reference. Fig. 3.32 is the layout of the PLL done in the MIT Lincoln
Lab's SFQ5ee process [17]. The PLL uses 1688 JJs, consumes about 0.45mW of total power and occupies 7.76mm^2
of area. Although the JJ count is not very large, it is challenging to greatly reduce the overall layout
area. This is because most of the area is consumed by connections, which are needed to ensure inter-cell
connectivity and to satisfy timing requirements (e.g., the loop filter blocks must capture the phase information
from the PD simultaneously, and likewise the DCO units must receive the controlling code simultaneously).
Fig. 3.33 is the mathematical equivalent of the RSFQ all-digital PLL: the phase detector is represented
by a phase adder with a minus sign on one of the inputs; the digital loop filter has a transfer function
as described in section 3.2.4; the DCO is equivalent to a phase integrator with a gain of $K_{DCO}$; and the
frequency divider is a divider in the phase domain. With the forward-path transfer function $G(z)$, the overall
transfer function $H(z)$ may be written as:
$$G(z) = \left(\frac{1}{1 - z^{-1}} + 1\right)\left(\frac{K_{DCO}}{1 - z^{-1}}\right) \qquad (3.4)$$
$$\left[\Phi_{in}(z) - \frac{\Phi_{out}(z)}{1024}\right] G(z) = \Phi_{out}(z) \qquad (3.5)$$
$$H(z) = \frac{\Phi_{out}(z)}{\Phi_{in}(z)} = \frac{1024 \cdot G(z)}{1024 + G(z)} \qquad (3.6)$$
The phase error function is:
$$\Phi_{err}(z) = \Phi_{in}(z) - \frac{\Phi_{out}(z)}{1024} = \frac{1024}{1024 + G(z)}\,\Phi_{in}(z) \qquad (3.7)$$
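Equations (3.4) to (3.7) can be checked numerically by evaluating them at a point on the unit circle. The sketch below confirms that the closed-loop expression (3.6) satisfies the loop equation (3.5) and that the phase error matches (3.7). The value of $K_{DCO}$ and the evaluation frequency are hypothetical.

```python
import cmath

N = 1024  # frequency division ratio


def forward_gain(z, k_dco):
    """G(z) of Eq. (3.4): loop filter times DCO phase integrator."""
    integrator = 1.0 / (1.0 - 1.0 / z)
    return (integrator + 1.0) * (k_dco * integrator)


def closed_loop(z, k_dco):
    """H(z) of Eq. (3.6)."""
    g = forward_gain(z, k_dco)
    return N * g / (N + g)


def phase_error_ratio(z, k_dco):
    """Phi_err / Phi_in of Eq. (3.7)."""
    return N / (N + forward_gain(z, k_dco))


z = cmath.exp(1j * 0.1)   # hypothetical normalized frequency of 0.1 rad/sample
k_dco = 0.01              # hypothetical DCO gain
g = forward_gain(z, k_dco)
h = closed_loop(z, k_dco)

# Eq. (3.5) with Phi_in = 1: (1 - H/N) * G should equal H.
loop_residual = abs((1.0 - h / N) * g - h)
# Eq. (3.7): Phi_in - Phi_out/N should equal N / (N + G) for Phi_in = 1.
error_residual = abs((1.0 - h / N) - phase_error_ratio(z, k_dco))
print(loop_residual, error_residual)
```

Both residuals are at the level of floating-point rounding, which confirms the algebra linking (3.5), (3.6) and (3.7).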
Because the reference clock is about 48.8MHz, a very long simulation time (several µs) is required to
properly observe the behavior of the PLL, which makes it extremely difficult to simulate the PLL using
circuit level simulators like JSIM [32] or JoSIM [33]. To examine the PLL design, we therefore built Verilog-A
models for every RSFQ cell, considering the effects of set-up time, hold time, propagation delay, digitally
controlled oscillations, and so on. Next, a system-level simulation is performed using those behavioral
models. During the simulation, an initial start-up pulse is applied to trigger the oscillation as described in
section 3.2.3 and the DCO is reset to 00000000 (or any other arbitrary code). After that, the PLL locks
the FD output signal to the reference clock as shown in Fig. 3.34. In this figure, the red pulses mark the
off-chip 48.8MHz reference pulses, whereas the blue pulses are from one of the frequency divider outputs
as mentioned in section 3.2.2 and section 3.2.5. As we can see, the phase of the FD output toggles around
the reference and is automatically pulled back as needed, which demonstrates the correct phase-locking
function.
Figure 3.31: Block diagram of the RSFQ all-digital PLL.
Figure 3.32: Layout of the RSFQ all-digital PLL.
Figure 3.33: Mathematical equivalent of the RSFQ all-digital PLL.
Furthermore, Fig. 3.35 shows the period jitter of the 50GHz DCO output. At the beginning, the PLL
takes about 0.2µs to reach the steady state. After that, the DCO oscillates at 50GHz with a 2.93ps
peak-to-peak period jitter, which corresponds to a 489fs rms jitter. It is also beneficial to study the effect of
the DCO oscillation period step on the final output. Fig. 3.36 gives the results for different DCO steps.
As we can see, the larger the step is, the larger the output jitter will be. This design uses a step value of
about 0.49ps. Note that although the step value can be reduced to 0.3ps, this would reduce the operating
margins of the tunable delay cell and undermine the robustness of the system. The steps of the DCO can
be changed at the design stage by changing the delay difference of the individual tunable delay cells, as
mentioned in section 3.2.3. To further improve the jitter performance, the loop filter may be optimized to
perform higher order filtering, the tunable delay cell can be optimized to achieve finer tunability, and the phase
detector may be replaced with a higher resolution PD.
Table 3.8 is a feature summary of the proposed SFQ all-digital PLL and the one reported in [63]. Compared
with [63], this new design implements the PLL with all-digital blocks, exhibits robust performance
in spite of process variations, and has a larger locking range (±10%). In terms of the 50GHz output
oscillating signal, reference [63] does not present information about the jitter or phase noise, so we can
only report our simulation results without any comparison.
Figure 3.34: Simulation waveform of the locked signal with reference.
Figure 3.35: Simulation waveform for the period jitter.
Figure 3.36: Plot of the output jitter vs. the DCO oscillating frequency step in ps.
Table 3.8: Summary of the features of the SFQ all-digital PLLs.
Feature                     This Design   [63]
Target frequency            50GHz         50GHz
Oscillating control mode    Digital       Analog
Locking range               ±10%          ±1.5%
Reference clock             48.82MHz      12MHz
Frequency divider           10-bit        12-bit
Jitter (rms)                489fs         N/A
Jitter (peak-to-peak)       2.93ps        N/A
Area                        7.76mm²       N/A
Power                       0.45mW        N/A
3.2.7 Conclusion
An all-digital PLL is implemented using the RSFQ circuit technology. The PLL comprises a digital phase
detector (PD), a digitally controlled oscillator (DCO), a digital loop filter (DLF), and a digital frequency
divider (FD). The digital phase detector uses an NDRO with two 180-degree phase-shifted signals to fulfill the
phase detection function. The DCO relies on a newly designed tunable delay cell to control the oscillation
frequency. The digital loop filter is made of a direct tapped branch from the PD and a digital
accumulator, which also uses only common digital cells such as the NDRO cell and the AND gate. Ten cascaded
toggle flip-flops compose the divide-by-1024 frequency divider. Monte Carlo simulations were performed on the
DCO with newly designed cells, demonstrating its robust operation. Simulation results confirm the correct
functionality of the digital PLL, which takes a 48.82MHz off-chip reference and produces a 50GHz on-chip
high speed clock signal with a 2.93ps peak-to-peak jitter and a 489fs rms jitter, consuming only 0.45mW of
total power and 7.76mm² of area. This work is published in [67].
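The divide-by-1024 function of the ten cascaded toggle flip-flops can be sketched behaviorally as follows. This is a minimal model, not the circuit implementation: it assumes that each toggle flip-flop emits one output pulse for every two input pulses, and the class and function names are illustrative.

```python
class ToggleFlipFlop:
    """Behavioral model: one output pulse for every two input pulses."""

    def __init__(self):
        self.state = 0

    def pulse(self):
        self.state ^= 1
        return self.state == 0  # output fires when the state returns to 0


def divide(n_input_pulses, n_stages=10):
    """Count output pulses of a chain of n_stages cascaded toggle flip-flops."""
    chain = [ToggleFlipFlop() for _ in range(n_stages)]
    n_output_pulses = 0
    for _ in range(n_input_pulses):
        carry = True
        for tff in chain:
            if not carry:
                break           # the pulse stopped at an earlier stage
            carry = tff.pulse()
        n_output_pulses += carry
    return n_output_pulses


f_out_hz = 50e9
f_ref_hz = f_out_hz / 2 ** 10   # frequency seen at the divider output
print(divide(1024), f_ref_hz / 1e6, "MHz")
```

With a 50GHz output, the divided frequency is 50GHz / 1024 = 48.828125MHz, which matches the reference clock listed in Table 3.8 after rounding.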
3.3 Memory Prototype Using Multi-Fluxon Storage Cell
3.3.1 Background and Prior Work
In both mature complementary metal-oxide semiconductor (CMOS) processes and superconducting quan-
tum computing systems, memory plays a crucial role. The success of a microprocessor heavily relies on
its ability to read and write data, which in turn is linked to the design of the memory block. The key
factors to consider when designing memory blocks include, but are not limited to, memory density, power
consumption, accessibility, and robustness. In recent years, numerous studies have focused on
designing memory blocks for superconducting quantum computing systems. For instance, [68] proposes a
Josephson-CMOS hybrid memory design that combines the benefits of both technologies. [69] introduces
the Vortex Transitional (VT) memory cell and constructs a random access memory (RAM) array using it.
[70] proposes a high-speed, low-power, cryogenic matrix memory that utilizes π-shifted ferromagnetic
Josephson junctions (π-junctions). Additionally, [71] explores the potential of a high-capacity memory
block that can store more than one fluxon in each register file.
3.3.2 Multi-fluxon Destructive Readout Design
Figure 3.37 illustrates the schematic of the multi-fluxon destructive readout (MFDRO) cell. The MFDRO
design is based on the commonly used SFQ D flip-flop cell. Since there are multiple fluxons entering at
the ’IN’ port, the input buffer junction, which is used to cancel the excess incoming fluxon in the DFF, is
removed. Additionally, to store more than one fluxon, the loop inductance L1 in J1-L1-J2 is increased. The
optimized circuit parameters are presented in Table 3.9, along with their simulated margins in Fig. 3.38.
The MFDRO operates as follows: Its initial state is ’0’. When a fluxon (pulse) is detected at the input port
’IN’, the stored fluxon count increases by one. If the cell reaches its maximum capacity of three fluxons,
the next input fluxon will simply pass through the cell and exit at the ’OUT’ port. This scenario should
be avoided by the system designer. When a pulse arrives at the ’CLK’ port, the cell will release one fluxon
per ’CLK’ pulse and generate an output pulse at the ’OUT’ port until the internal state becomes ’0’ again.
A state machine diagram is depicted in Fig. 3.39.
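The behavior described above can be summarized as a small behavioral model. This is a sketch of the state machine only, not of the circuit; the class and method names are illustrative.

```python
class MFDRO:
    """Behavioral sketch of the multi-fluxon destructive readout cell."""

    CAPACITY = 3  # maximum number of stored fluxons described in the text

    def __init__(self):
        self.stored = 0  # initial state '0'

    def pulse_in(self):
        """Pulse at 'IN'. Returns True if a pulse exits at 'OUT' (overflow)."""
        if self.stored < self.CAPACITY:
            self.stored += 1
            return False
        return True  # cell is full: the fluxon passes through to 'OUT'

    def pulse_clk(self):
        """Pulse at 'CLK'. Returns True if a stored fluxon is released to 'OUT'."""
        if self.stored > 0:
            self.stored -= 1
            return True
        return False


cell = MFDRO()
overflow = [cell.pulse_in() for _ in range(4)]   # the fourth input overflows
readout = [cell.pulse_clk() for _ in range(4)]   # three outputs, then none
print(overflow, readout)
```

The fourth input pulse illustrates the overflow case that the system designer should avoid, and the fourth clock pulse shows that an empty cell produces no output.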
Figure 3.37: Schematic of the MFDRO cell.
Table 3.9: Circuit parameters
Parameter   Value     Margins
J1          187µA     −35% / +20%
J2          75µA      −50% / +32%
J3          88µA      −20% / +19%
J4          184µA     −20% / +20%
L1          29.5pH    −50% / +50%
L2          2.2pH     −50% / +50%
I1          118µA     −35% / +50%
I2          118µA     −22% / +50%
Figure 3.38: Margins of the MFDRO cell.
The MFDRO cell was fabricated using the MITLL SFQ5ee process and its functionality was verified
through low-speed measurements. The photo of the chip is presented in Fig. 3.40, while the layout
of the cell is shown in Fig. 3.41, which is a screenshot from the CAD tool. The measured waveforms,
illustrated in Fig. 3.42, display four pairs of signals, each consisting of the output signal of the MFDRO cell
(upper) and the clock signal (bottom). The level changes of the output signal signify the value ’1’, whereas
the rising edge of the clock signal represents the input ’1’. The cases depicted in the figure correspond to
four different initial states of the cell, i.e., three, two, one, and zero fluxons, and a series of clock pulses was
used to read out the stored information. For instance, in case (a), the output signal changed its level three
times in succession, indicating that there were three fluxons stored in the MFDRO cell. The measured bias
margin is approximately 8.6%. Because this measured margin is not optimal, the MFDRO was
further optimized after measurement to improve the margins. The circuit parameters presented in Table
3.9 have been updated accordingly.
Figure 3.39: Diagram of the state machine for MFDRO.
Figure 3.40: Chip photo of a test structure for MFDRO.
Figure 3.41: Layout of the MFDRO cell.
Figure 3.42: Measured waveform of the MFDRO cell. (a) Case for three fluxons; (b) case for two fluxons;
(c) case for one fluxon; and (d) case for zero fluxons.
3.3.3 Multi-bit Random Access Memory Design
The MFDRO can be utilized to create a multi-bit random access memory (RAM) by exploiting its unique
feature, wherein two bits of data can be stored in one register file, thus increasing the memory density.
To improve the memory density further, the loop inductance of the MFDRO can be increased to enable
it to store more fluxons, for example, seven. However, this comes at the cost of reduced
robustness. Therefore, we have chosen to store three fluxons corresponding to two bits, with the possibility
of expanding the capacity upon achieving a better MFDRO design. Fig. 3.43 illustrates the register file’s
structure, consisting of an MFDRO cell, a high capacity writing block (HC-write), and two AND gates.
The boxes labeled with ’T’ in the figure denote the delay blocks, either intrinsic wiring delay or an
intentionally inserted delay block. By adding delays to the clock path of the two AND gates, the register file
is self-clocked. The two bits of data, in[k] and in[k+1], are masked by a writing address selection signal
(WI[i]) before the high capacity writing block converts the parallel signals into serial data on a single line.
The functionality and structure of the HC-write block are illustrated in Fig. 3.44. The MSB and LSB
branches of the signal are merged, followed by the merging of the resulting signal with the MSB signal
again. The HC-write block produces a series of pulses whose number is determined by the binary number
formed by the MSB and LSB.
\[ \Delta\tau_1 = \tau_1 - \tau_2 \tag{3.8} \]
\[ \Delta\tau_2 = \tau_0 + \tau_1 - \tau_2 + d_{\mathrm{merger}} \tag{3.9} \]
The output signal is connected to the ’IN’ port of the MFDRO cell. Meanwhile, the clock signal is con-
trolled by the read address selection signal (RO[i]). During the read operation, three sequential pulses are
generated from the ’RO[i],’ and the MFDRO provides the output to the bit line based on the stored fluxon
number.
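The write and read encoding of one register file can be sketched as follows. Because the MSB is merged into the serial line twice and the LSB once, the number of pulses written is taken here as 2·MSB + LSB. Which of in[k] and in[k+1] acts as the MSB is not stated above, so the argument names are assumptions, as are the function names.

```python
def hc_write_pulse_count(msb, lsb, wi=1):
    """Pulses sent to the MFDRO 'IN' port for one write.

    Both data bits are masked by the write address selection signal WI[i];
    the MSB then contributes two pulses and the LSB one.
    """
    msb = int(bool(msb) and bool(wi))
    lsb = int(bool(lsb) and bool(wi))
    return 2 * msb + lsb


def read_out(stored_fluxons):
    """Bit-line pulses produced by the three sequential read pulses."""
    return [i < stored_fluxons for i in range(3)]


counts = [hc_write_pulse_count(m, l) for m in (0, 1) for l in (0, 1)]
print(counts)          # pulse counts for data 00, 01, 10, 11
print(read_out(2))     # a cell holding two fluxons answers two of three reads
```

The two-bit value therefore maps onto zero to three stored fluxons, which is exactly the capacity of the MFDRO cell chosen in this design.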
Figure 3.45 illustrates the connection configuration of the memory bank, wherein the write and read
data signals are shared in one column, while the address selection controls are linked through the row.
Subsequently, the memory bank is regulated by additional control circuits as depicted in Figure 3.46. Firstly,
an address decoder is employed to mask the write/read signals using the decoder’s outputs. Given that
the memory cell can store up to three fluxons, the reading operation necessitates three sequential pulses
to entirely unload the two bits of information, which is accomplished by the "HC-clock" structure in Figure
3.47. This mechanism splits and merges the incoming "CLK_IN" signal twice, converting one input pulse
into three consecutive pulses with time intervals determined by the corresponding delay components.
\[ \Delta\tau_1 = \tau_1 + \tau_4 - (\tau_0 + \tau_3) \tag{3.10} \]
\[ \Delta\tau_2 = \tau_1 + \tau_2 + \tau_3 + d_{\mathrm{splitter}} - (\tau_1 + \tau_4) \tag{3.11} \]
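As a worked example of Eqs. (3.10) and (3.11), the two pulse intervals of the HC-clock block can be evaluated for a given set of delays. The delay values below are hypothetical and only serve to exercise the formulas; the function name is illustrative.

```python
def hc_clock_intervals(tau0, tau1, tau2, tau3, tau4, d_splitter):
    """Evaluate Eqs. (3.10) and (3.11); all arguments share one time unit (e.g. ps)."""
    d_tau1 = tau1 + tau4 - (tau0 + tau3)
    d_tau2 = tau1 + tau2 + tau3 + d_splitter - (tau1 + tau4)
    return d_tau1, d_tau2


# Hypothetical delays in ps, chosen so that both intervals come out equal.
intervals = hc_clock_intervals(tau0=2, tau1=12, tau2=10, tau3=3, tau4=5, d_splitter=4)
print(intervals)
```

Note that tau1 appears in both terms of Eq. (3.11) and cancels, so the second interval depends only on tau2, tau3, tau4, and the splitter delay.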
The output data of the memory is a serial signal on a single line, which can be transformed into a two-bit
binary format using two T1 flip-flops (T1FFs) that form the serial-to-parallel counter illustrated in Figure 3.48.
The T1FF consists of an input port that can toggle the T1FF’s internal state between ’0’ and ’1’. The ’C’
output is the asynchronous carry output that generates a pulse when the internal state toggles from ’1’
back to ’0.’ The ’S’ output is the synchronous sum output that, upon receiving the ’CLK’ signal, produces
a pulse at ’S’ if the internal state is ’1’ and resets its state to ’0.’ The circuit implementation can be found
in [72].
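The counting behavior described above can be sketched with a behavioral T1FF model. The sketch assumes that the carry output ’C’ of the first T1FF drives the input of the second one, which is one natural reading of the counter structure but is not spelled out in the text; the class and function names are illustrative.

```python
class T1FF:
    """Behavioral model of a T1 flip-flop."""

    def __init__(self):
        self.state = 0

    def toggle(self):
        """Input pulse: flip the state; return True if 'C' fires (1 -> 0)."""
        self.state ^= 1
        return self.state == 0

    def clock(self):
        """'CLK' pulse: return True if 'S' fires (state was 1), then reset to 0."""
        fired = self.state == 1
        self.state = 0
        return fired


def serial_to_parallel(n_pulses):
    """Convert a burst of n_pulses serial pulses into (MSB, LSB)."""
    lsb_ff, msb_ff = T1FF(), T1FF()
    for _ in range(n_pulses):
        if lsb_ff.toggle():      # assumed wiring: carry of the first T1FF
            msb_ff.toggle()      # toggles the second T1FF
    return int(msb_ff.clock()), int(lsb_ff.clock())


decoded = [serial_to_parallel(n) for n in range(4)]
print(decoded)
```

Under this assumption, zero to three serial pulses decode to the two-bit values 00, 01, 10, and 11, and both flip-flops are reset to ’0’ by the readout.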
Figure 3.43: Register file for the multi-bit random access memory.
3.3.4 Error-resist Memory Design
The MFDRO can be repurposed from a high capacity memory to an error-resistant memory. In practice,
external factors such as power line interference or routing complexity may cause memory states to be
falsified. To mitigate such issues, the MFDRO can be used as a one-bit storage cell instead of multi-bit
storage. In this case, states ’0’ and ’1’ are recognized as stored ’0’, while states ’2’ and ’3’ represent ’1’, as
shown in Table 3.10. This approach provides a buffer margin of one fluxon. The RAM structure
in Fig. 3.49 remains valid but requires modifications. The data write block ’HC-write’ must be replaced
with an error-resistant writing block (ER-write) using the same ’HC-clock’ block as shown in Fig. 3.47.
This results in a data ’1’ being stored as three fluxons in the register file. If the stored fluxons are tampered
with, or one pulse is missed during reading, the memory still treats it as a data ’1’. Fig. 3.50 shows the
error-resistant memory diagram, in which the ’ER-clock’ is identical to the ’HC-clock’. To read the stored
data, a threshold detecting circuit is needed instead of the T1FF-based counter. This circuit senses the
output pulse count and produces output according to Table 3.10. A soma circuit, described in [73], is an
efficient threshold detecting circuit, as shown in Fig. 3.51. The data path is a Josephson-transmission-line-like
structure with a resistor and a coupling inductor. When input pulses come from the ’IN’ port, the
loop current of J1-L_Loop-R_Loop-J2 accumulates until it exceeds the threshold set by the circuit parameters.
However, the resistor gradually depletes the loop current, ensuring that the current reading is not affected
by the residue of the previous cycle. This approach forms an error-resistant memory that can tolerate the
error of one fluxon.
Figure 3.44: Structure and waveform diagram of the HC-write block.
Figure 3.45: Connection of the register files.
Figure 3.46: Diagram of the multi-bit random access memory.
Figure 3.47: Structure and waveform diagram of the HC-clock block.
Figure 3.48: Design and waveform diagram of the serial-to-parallel counter using T1FF.
Table 3.10: Error-resist memory logic table
MFDRO state   Function
0             designed 0
1             buffered 0
2             buffered 1
3             designed 1
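The logic of Table 3.10 amounts to a threshold decision on the number of read-out pulses. The following sketch models only this decision, not the soma circuit itself; the function names are illustrative.

```python
THRESHOLD = 2  # states 2 and 3 decode to '1', states 0 and 1 to '0' (Table 3.10)


def er_write(bit):
    """Fluxons stored for one data bit: three for '1', none for '0'."""
    return 3 if bit else 0


def er_read(pulse_count):
    """Threshold decision on the number of pulses seen during readout."""
    return 1 if pulse_count >= THRESHOLD else 0


# A single-fluxon error in either direction does not change the decoded bit.
one_lost = er_read(er_write(1) - 1)
one_gained = er_read(er_write(0) + 1)
print(one_lost, one_gained)
```

A stored ’1’ that loses one fluxon still reads as ’1’, and a stored ’0’ that gains one spurious fluxon still reads as ’0’, which is the one-fluxon buffer margin described above.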
Figure 3.49: Register file structure for the error-resist memory.
Figure 3.50: Diagram of the error-resist memory.
3.3.5 Conclusion
This section presents the design and measurement verification of a multi-fluxon destructive readout cell
using the MITLL SFQ5ee process. The critical design parameters and their simulated margins are also
presented. In addition, two potential applications of the cell are proposed and discussed: a multi-bit random
access memory for achieving higher memory density, and an error-resistant memory for a more robust
design.
Figure 3.51: (a) Schematic of the soma; (b) Waveform of the soma circuit operating as a threshold detector.
Chapter 4
Cell Characterization Prototype
4.1 Background and Prior Work
The majority of Rapid Single Flux Quantum (RSFQ) cells exhibit synchronous behavior, which is a unique
feature of this technology. This property has led to an increased focus on the study of timing-related
topics in the literature [74]–[78]. In these works, cells are typically modeled using hardware description
languages (HDL) such as VHDL or Verilog, with basic timing parameters such as cell delay, setup time,
and hold time defined. These models are subsequently utilized to analyze system operation and optimize
the timing scheme. To further improve system performance, [74] introduced three finer parameters: soft
setup time, conventional setup time, and hard setup time, which replace the setup time parameter.
Given the significance of timing analysis in RSFQ technology, the ability to measure the real timing of
a cell is highly desirable for two primary reasons: first, to provide data for electronic design automation
(EDA) tools; and second, to serve as a reference and validation for circuit simulations. However, several
challenges must be addressed to achieve this goal. Firstly, the time scale of SFQ behavior is extremely
short, ranging from several to tens of picoseconds. Secondly, non-ideal chip input/output (I/O) interfaces
such as SFQ/DC, DC/SFQ, bonding wire, or probes make it difficult to measure SFQ behavior directly with
such a short time resolution. Thirdly, the conventional on-chip measurement techniques employed in
CMOS technology cannot be easily adapted to SFQ technology due to the lack of essential blocks such as
high-speed, high-resolution digital-to-analog converters (DACs) and analog-to-digital converters (ADCs).
Thus, alternative measurement methods are necessary to accurately and validly measure SFQ timing, as
evidenced by various examples from both CMOS and SFQ technologies.
In [79], a measurement system in a 65nm CMOS process is demonstrated using an on-chip sampling
oscilloscope consisting of a digital-to-analog converter (DAC) and a comparator to detect if the input
voltage crosses the reference level. This method can measure delay precisely, but it is not feasible for SFQ
technology due to the lack of necessary blocks. To address this limitation, [80], [81] present alternative
methods for measuring time intervals in CMOS and SFQ technology. However, for cell characterization,
smaller intervals are required. In [82], a ring oscillator is formed with and without the device under test
(DUT), and the DUT’s delay is obtained by measuring the difference in oscillating frequency. Similarly, [83]
applies the same approach with SFQ circuits and successfully demonstrates the capability of measuring
cell delay as small as several picoseconds.
This chapter presents a prototype for characterizing cells, which is based on the same principle as
the methods proposed in [82] and [83]. Apart from measuring cell delay, this prototype also allows for
the measurement of setup and hold times. The rest of this chapter is organized as follows. Section 4.2
elaborates on the design of a bias-controlled demultiplexer (DEMUX), which serves as a critical component.
Subsequently, section 4.3 describes the cell delay measurement prototype, while section 4.4 and section 4.5
explain the setup time and hold time measurement techniques, respectively.
4.2 DEMUX Design
In [84], a multiplexer design is presented where the selection function is achieved by decreasing the bias
current of the input branches. In a similar manner, the demultiplexer (DEMUX) used in this study adopts a
bias-controlled structure based on a splitter, as shown in Fig. 4.1. Unlike a typical splitter, the split branches
are biased independently. Specifically, Ia1 and Ia2 share the same power connection, while Ib1 and Ib2 are
connected to the same node through their bias resistors. Junctions J5 and J4 are utilized to eliminate the
reverse pulse that may originate from either output port and the input pulse when the subsequent output
branch is turned off. For example, in branch ’a’, when Ia1 and Ia2 are powered up with 2.5mV, the input
pulse triggers junctions J7 and J9, resulting in an output at port ’OUTa’. When the branch is turned off
(grounded), instead of triggering J7, junction J5 cancels the input flux. Thus, a manually controlled DEMUX
design is realized. The cell layout is depicted in Fig. 4.2, and the circuit parameters are presented in Table
4.1. The simulation results, shown in Fig. 4.3, reveal that as the controlled bias Vbias A/B switches, the
corresponding output branch is turned on and off.
Figure 4.1: Schematic of the DEMUX.
Table 4.1: Circuit parameter list of DEMUX
Circuit parameter   Value     Margin
L1                  1.7pH     −50% / +50%
L2                  1.5pH     −50% / +50%
L3, L4              2.9pH     −50% / +50%
L5, L6              3.8pH     −50% / +50%
L7, L8              2.6pH     −50% / +50%
L9, L10             4.0pH     −50% / +50%
J1                  108µA     −50% / +50%
J2, J3              117µA     −50% / +50%
J4, J5              74µA      −50% / +50%
J6, J7              112µA     −50% / +50%
J8, J9              82µA      −50% / +44%
I1                  80µA      −50% / +50%
Ia1, Ib1            83µA      −50% / +44%
Ia2, Ib2            88µA      −29% / +50%
Figure 4.2: Layout of the DEMUX.
Figure 4.3: Simulation waveform of the DEMUX.
4.3 Cell Delay Measurement
The present section describes a delay measurement prototype that utilizes a newly designed DEMUX, as
shown in Fig. 4.1. The prototype can be divided into three parts for clarity: the oscillating loop, the NDRO
or input storage part, and the output frequency divider. The core portion of the prototype is the oscillating
loop, which is divided into two paths controlled by the DEMUX. A starting pulse can be injected through
a merger from the ’START’ port, and the pulse is then guided along the loop path. The splitter ’S1’ taps
the looping signal to the output, and the splitter ’S2’ triggers the NDRO bank. The pulse then passes
through the DUT, incurring its clock-to-Q (C2Q) delay, and returns to the starting point
M2, forming the first loop (case 1). Similarly, a second loop (case 2) is formed, in which the pulse uses a
bypass path that is controlled by the DEMUX. The NDRO bank is carefully timed and pre-programmed
to provide the proper input combination to the DUT before the clock signal arrives, allowing the delay to
be measured with different input combinations. A series of cascaded toggle-flip-flops (TFF) is used as a
frequency divider to down convert the oscillation frequency to several MHz so that it can be sent off chip
and measured by lab equipment. The clock-to-Q delay of the DUT can be calculated from the frequency
difference of case 1 and case 2. A layout example is shown in Fig. 4.5.
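The delay extraction can be sketched as follows. The sketch assumes that the two loops differ only by the DUT, so that the DUT delay equals the difference between the two loop periods, and that each toggle flip-flop of the divider halves the frequency. The function name, the number of divider stages, and the example loop periods are illustrative.

```python
def dut_delay_ps(f_with_dut_hz, f_bypass_hz, n_tff_stages):
    """Clock-to-Q delay in ps from the two divided output frequencies.

    The measured period is 2**n_tff_stages times the loop period, so each
    loop period is recovered before taking the difference.
    """
    division = 2 ** n_tff_stages
    loop_period_dut = 1.0 / (f_with_dut_hz * division)
    loop_period_bypass = 1.0 / (f_bypass_hz * division)
    return (loop_period_dut - loop_period_bypass) * 1e12


# Hypothetical example: loop periods of 210 ps (case 1) and 200 ps (case 2),
# observed through a 10-stage divider.
f_case1 = 1.0 / (210e-12 * 2 ** 10)
f_case2 = 1.0 / (200e-12 * 2 ** 10)
delay = dut_delay_ps(f_case1, f_case2, n_tff_stages=10)
print(f"{delay:.3f} ps")
```

In this example both off-chip frequencies are a few MHz, while their difference still resolves a 10ps delay. Any mismatch between the loop portions that are not shared by the two cases adds directly to the result.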
Figure 4.4: Diagram of the cell delay measurement prototype.
Figure 4.5: Layout of the cell delay measurement prototype.
To avoid the effect of various mismatches, this design reuses the circuits for both cases as much as
possible. However, two small portions of the loops cannot be reused, namely the connection from the DEMUX
to the first merger ’M1’ and the connection from the DEMUX to the DUT and from the DUT to ’M1’. A
Monte-Carlo (MC) simulation is performed based on the variation distribution provided by [17]. A DFF is
used as the DUT
in the MC simulation, and the circuit parameters, including resistance, inductance, and critical current,
are sampled randomly around the designed value for all system blocks except the DUT. The measuring
procedure is performed 1000 times, and the histogram chart is reported as in Fig. 4.6. It should be noted
that the DFF is connected with passive transmission lines (PTL) so that the delays of attached PTL receiver
and driver are included as DFF delay in this test. The scatter caused by circuit mismatch is relatively small
compared with the cell delay.
Figure 4.6: Histogram of the Monte-Carlo simulation result of the DFF delay using the proposed prototype.
4.4 Setup Time Measurement
As illustrated in Figure 4.7, the conventional definition of setup time (T₂) refers to the interval between
the input and clock signals at which the cell delay increases by 10%. To improve the static timing analysis
(STA) performance, [74] proposes two new setup times: the soft setup time (T₁), at which the clock-to-Q
delay of the cell begins to increase, and the hard setup time (T₃), at which the cell fails. Therefore, it is
essential to design a setup measurement structure that can precisely control the input-to-clock interval
while monitoring changes in the cell delay.
Figure 4.7: Relationship between the cell delay and the input-to-clock time interval ∆t.
The proposed setup structure, as shown in Figure 4.8, consists
of three main parts: the middle part that monitors the clock-to-Q delay and detects the setup event, the
top loop that measures the clock path delay, and the bottom loop that measures the signal path delay. Each
part adopts the same structure as described in the cell delay measurement section.
Figure 4.8: Diagram of the setup/hold time measurement prototype.
However, unlike Figure 4.4, the middle loop includes tunable delay blocks on both the clock and signal
paths. These tunable delay blocks are constructed from a series of cascaded Josephson transmission lines
(JTLs) whose propagation delay depends on the bias level [67]. The block is divided into several banks with
different numbers of JTLs to enable coarse and fine tuning simultaneously.
Figure 4.9: The schematic of tunable delay block using cascaded JTLs.
During testing, the D1, D3, D5, and D6 detour branches are
disabled, as shown in Figure 4.10. The default cell delay is measured when the inputs arrive significantly
earlier than the clock. Then, the tunable delay blocks C (for clock) and S (for signal) are manually adjusted
to bring the input signals closer to the clock while monitoring the cell delay. Once the desired setup time
(conventional, soft, or hard) is reached, the tuned delays are fixed, and the D1 and D3 selection is switched
to form another two measuring loops with D5 and D6, as shown in Figure 4.11. Finally, the signal and clock
delays are measured separately to calculate the setup time as the difference between them. The layout of
the setup measurement is shown in Fig. 4.12, and Monte-Carlo simulations are also performed with an
AND gate as the DUT, as shown in Fig. 4.13.
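The three setup-time definitions can be made concrete with a sketch that extracts them from a sweep of the input-to-clock interval. The sweep data are hypothetical, and the 1% tolerance used to decide when the delay "begins to increase" is an assumption introduced here, since any measurement needs some detection threshold for the soft setup time.

```python
def setup_times(sweep, nominal_delay, soft_tol=0.01, conv_increase=0.10):
    """Extract (soft, conventional, hard) setup times from a sweep.

    sweep: list of (dt, delay) pairs sorted by decreasing dt, where dt is the
    input-to-clock interval and delay is the clock-to-Q delay, or None if the
    cell failed to produce a correct output.
    """
    soft = conventional = hard = None
    for dt, delay in sweep:
        if delay is None:
            hard = dt              # first interval at which the cell fails
            break
        if soft is None and delay > nominal_delay * (1 + soft_tol):
            soft = dt              # delay has started to increase
        if conventional is None and delay >= nominal_delay * (1 + conv_increase):
            conventional = dt      # delay has increased by 10%
    return soft, conventional, hard


# Hypothetical sweep: intervals and delays in ps, nominal delay 10 ps.
sweep = [(10, 10.0), (8, 10.0), (6, 10.3), (4, 11.2), (2, 13.0), (1, None)]
t1, t2, t3 = setup_times(sweep, nominal_delay=10.0)
print(t1, t2, t3)
```

In the measurement prototype the interval is not known directly while tuning; it is obtained afterwards as the difference between the separately measured signal path and clock path delays.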
4.5 Hold Time Measurement
The hold time is conventionally defined as the clock-to-data time interval when the cell fails to catch the
input from the previous clock cycle. For an SFQ cell, if an input misses the current clock due to the violation
of the hard setup time, it will be captured by the next clock, meeting the threshold for the hold time of
the next cycle. As a result, there is no clear distinction between the hard setup time for the current clock
cycle and the hold time for the next clock cycle, as stated in previous works [75], [77]. While the proposed
setup time measurement scheme theoretically can be used for hold time measurement, a simpler solution is
needed to facilitate measuring the hold time with different input patterns. Therefore, the structure shown
in Fig. 4.8 is adjusted to Fig. 4.14. The loop of the middle part and the frequency divider are removed,
since only the logic failure needs to be detected rather than the change in clock-to-Q delay. Meanwhile,
an input pattern generator is added on the input path, providing the prototype with the ability to program
two sequential input patterns. The input pattern generator block is shown in Fig. 4.15 as two DFF banks
driven by the same clock signal with mergers in between. The two patterns are programmed into the DFF
banks in parallel and sent out to the DUT serially.
Figure 4.10: Configuration to detect setup event.
Figure 4.11: Configuration to measure the clock path delay and signal path delay.
Figure 4.12: Histogram of the Monte-Carlo simulation result of the AND gate hold time.
Figure 4.13: Histogram of the Monte-Carlo simulation result of the AND gate setup time.
Figure 4.14: Diagram of the hold time measurement prototype.
Figure 4.15: Diagram of the input pattern generator block.
4.6 Conclusion
This chapter presents a prototype for cell timing characterization that employs a newly developed
demultiplexer cell. The proposed prototype is suitable for determining the clock-to-Q delay, setup
time, and hold time. Monte-Carlo simulations are conducted on sample devices to confirm the effectiveness
of the design and examine the impact of process variation. The results demonstrate that the prototype is
capable of accurately measuring the relevant timing properties in the presence of reasonable process
variation.
Table 4.2: Summary of the hold time simulation of the AND gate (ps)
P1 \ P2   00     01     10     11
00        /      /      /      -6.6
01        /      /      -3.3   -3.3
10        /      -3.2   /      -3.2
11        /      /      /      /
Chapter 5
Final Remarks and Looking Ahead
This dissertation proposes a methodology to address the challenges encountered in the design of SFQ cir-
cuits and enhance their performance. Specifically, the dissertation presents the development of a normal
Josephson junction-based standard cell library, referred to as qSportLib, using the MITLL SFQ5ee process.
The dissertation outlines the design flow for a cell in qSportLib and describes the abutment connection
strategy. The library cells are introduced, including their circuit schematics, parameter values, simulated
margins, physical layout design, and simulated/measured waveform. The dissertation also presents newly
designed cells, such as complex logic gates and multi-fluxon destructive readout cells. Some of the li-
brary cells are fabricated and confirmed to be functional through measurement. Moreover, the dissertation
demonstrates the efficacy of adding complex logic gates to the cell library through the synthesis results
of the EDA tool.
In addition, this dissertation presents the design of a standard cell library utilizing 2ϕ Josephson
junctions that can be triggered by half a flux quantum. This attribute ensures that the dynamic power
consumption is reduced by half compared to conventional RSFQ circuits. Moreover, the 2ϕ-junction-based cell
obviates the need for inductors, thereby improving scalability and conserving area. The circuit schematics
are illustrated, and the parameter values are enumerated along with their margins. The layouts based on
a virtual process are displayed, and simulation verifies the correct functioning of each block. Notably, this
marks the first time that a standard cell library utilizing 2ϕ-junctions has been proposed.
Moreover, the dissertation elucidates the implementation of three RSFQ circuits utilizing the newly
developed qSportLib: an 8-bit multiplier, an all-digital phase-locked loop (PLL) design, and a
multi-fluxon destructive readout (MFDRO) cell. Firstly, the 8-bit multiplier is implemented using a
single-stage full adder design with the complex gates SUM and CARRY. The dissertation demonstrates the
multiplier structure and layout floor plan, and confirms its functional operation at 40GHz through
post-layout simulation. The synthesis results of array multipliers with various bit widths are compared
between the standard cell library with and without the SUM and CARRY cells, highlighting the advantages of
utilizing complex cells.
Secondly, the all-digital PLL design serves as a clock synthesizing block for the RSFQ system, marking
the first all-digital PLL implemented in RSFQ technology. The PLL generates a 50GHz clock signal from
a 48.82MHz reference clock with a 2.93ps peak-to-peak jitter. Thirdly, the MFDRO is presented, including
its circuit schematic, parameter values, and layout. The cell is fabricated and tested for functionality. The
dissertation also showcases a high-capacity random access memory and an error-resistant random access
memory to demonstrate the application of the MFDRO.
Finally, this dissertation presents a novel cell characterization prototype that measures the clock-to-Q
delay, setup time, and hold time. This proposed design is the first of its kind to include setup time and
hold time measurement capabilities. A manually controlled demultiplexer is designed for the prototype,
and its characterization scheme is validated through Monte-Carlo simulation. Moreover, the dissertation
documents the implementation and fabrication of cell delay measurement and setup time measurement.
For future research, there are several areas of investigation that could be pursued. Firstly, it is necessary
to improve existing EDA (Electronic Design Automation) tools in order to meet the unique requirements
of superconducting circuits. As compared to CMOS circuits, superconductor electronics face a number of
challenges such as larger size, limited drive strength, need for clock signal, path-balance considerations,
expensive interconnections, and others. Therefore, the development of more efficient system integration
methods through improved EDA tools is necessary in order to generate feasible results for practical appli-
cations.
Secondly, researchers must improve the yield of SFQ (Single Flux Quantum) or HFQ (Half Flux Quan-
tum) circuits. Currently, superconducting circuits have not demonstrated the same robustness as their
semiconductor counterparts. A more in-depth study and wider experimentation on process variation, flux
trapping, magnetic field isolation, and other factors is required to achieve better performance and yield.
Thirdly, there are opportunities for cross-disciplinary research involving superconducting circuits,
such as their potential use in machine learning, quantum computing, and other advanced applications.
These areas represent promising fields where further research and development could yield significant
advancements in technology.
Bibliography
[1] W. Buckel, R. Kleiner, and R. Huebener, Superconductivity: Fundamentals and Applications (Physics
textbook). Wiley, 2004, isbn: 9783527403493. doi: 10.1002/9783527618507.
[2] H. Kamerlingh-Onnes, “Further experiments with liquid helium. c. on the change of electric
resistance of pure metals at very low temperature etc. iv. the resistance of pure mercury at helium
temperatures,” Communications of the Physical Laboratory of the University of Leiden, vol. 124C,
1911.
[3] P. Seidel, Applied Superconductivity: Handbook on Devices and Applications (Encyclopedia of
Applied Physics). Wiley, 2015, isbn: 9783527412099.
[4] F. London and H. London, “The electromagnetic equations of the superconductor,” Proceedings of
the Royal Society of London. Series A - Mathematical and Physical Sciences, vol. 149, no. 866,
pp. 71–88, Mar. 1935, issn: 1471-2946. doi: 10.1098/rspa.1935.0048.
[5] Wikipedia, Meissner effect, 2023. [Online]. Available:
https://en.wikipedia.org/wiki/Meissner_effect.
[6] T. Van Duzer and C. Turner, Principles of Superconductive Devices and Circuits. Elsevier, 1981, isbn:
9780444004116. [Online]. Available: https://books.google.com/books?id=rBpRAAAAMAAJ.
[7] W. Anacker, “Josephson computer technology: An IBM research project,” IBM Journal of Research
and Development, vol. 24, no. 2, pp. 107–112, 1980. doi: 10.1147/rd.242.0107.
[8] K. Likharev and V. Semenov, “RSFQ logic/memory family: A new Josephson-junction technology
for sub-terahertz-clock-frequency digital systems,” IEEE Transactions on Applied Superconductivity,
vol. 1, no. 1, pp. 3–28, 1991. doi: 10.1109/77.80745.
[9] O. Mukhanov, V. Semenov, and K. Likharev, “Ultimate performance of the RSFQ logic circuits,” IEEE
Transactions on Magnetics, vol. 23, no. 2, pp. 759–762, 1987. doi: 10.1109/TMAG.1987.1064951.
[10] B. S. Deaver Jr. and W. M. Fairbank, “Experimental evidence for quantized flux in
superconducting cylinders,” Phys. Rev. Lett., vol. 7, 12 1961.
[11] R. Doll and M. Näbauer, “Experimental proof of magnetic flux quantization in a superconducting
ring,” Phys. Rev. Lett., vol. 7, 12 1961.
[12] X. Nan, “An investigation of superconducting rsfq digital circuit(thesis),” Xidian University, 2004.
[13] S. S. Martinet, “An investigation of a new superconducting logic family: Design and experimental
low-speed testing of its circuits (doctoral dissertation),” University of Rochester, 1995.
[14] H. Numata and S. Tahara, “Fabrication technology for nb integrated circuits,” IEICE
Transactions on Electronics, vol. E84C, 1 2001.
[15] S. Nagasawa, T. Satoh, K. Hinode, Y. Kitagawa, M. Hidaka, H. Akaike, A. Fujimaki, K. Takagi,
N. Takagi, and N. Yoshikawa, “New nb multi-layer fabrication process for large-scale sfq circuits,”
Physica C: Superconductivity, vol. 469, 15-20 2009.
[16] S. Nagasawa, K. Hinode, T. Satoh, M. Hidaka, H. Akaike, A. Fujimaki, N. Yoshikawa,
K. Takagi, and N. Takagi, “Nb 9-layer fabrication process for superconducting large-scale sfq
circuits and its process evaluation,” IEICE Transactions on Electronics, vol. E97.C, no. 3, pp. 132–140,
2014. doi: 10.1587/transele.E97.C.132.
[17] S. K. Tolpygo, V. Bolkhovsky, T. J. Weir, A. Wynn, D. E. Oates, L. M. Johnson, and M. A. Gouker,
“Advanced fabrication processes for superconducting very large-scale integrated circuits,” IEEE
Transactions on Applied Superconductivity, vol. 26, no. 3, pp. 1–10, 2016.doi:
10.1109/TASC.2016.2519388.
[18] V. V. Ryazanov, “Josephson superconductor-ferromagnet-superconductor π-contact as an
element of a quantum bit (experiment),” Physics–Uspekhi, vol. 42, 1999. doi:
10.1070/PU1999v042n08ABEH000600.
[19] V. V. Ryazanov, V. A. Oboznov, A. Y. Rusanov, A. V. Veretennikov, A. A. Golubov, and J. Aarts,
“Coupling of two superconductors through a ferromagnet: Evidence for a π junction,” Phys. Rev.
Lett., vol. 86, 2001. doi: 10.1103/PhysRevLett.86.2427.
[20] G. Blatter, V. B. Geshkenbein, and L. B. Ioffe, “Design aspects of superconducting-phase quantum
bits,” Phys. Rev. B, vol. 63, 2001.doi: 10.1103/PhysRevB.63.174511.
[21] R. Caruso, D. Massarotti, A. Miano, V. V. Bolginov, A. B. Hamida, L. N. Karelina, G. Campagnano,
I. V. Vernik, F. Tafuri, V. V. Ryazanov, O. A. Mukhanov, and G. P. Pepe, “Properties of
ferromagnetic josephson junctions for memory applications,” IEEE Transactions on Applied
Superconductivity, vol. 28, no. 7, pp. 1–6, 2018.doi: 10.1109/TASC.2018.2836979.
[22] E. Goldobin, D. Koelle, R. Kleiner, and A. Buzdin, “Josephson junctions with second harmonic in
the current-phase relation: Properties of ϕ junctions,” Phys. Rev. B, 2007. doi:
10.1103/PhysRevB.76.224523.
[23] M. J. A. Stoutimore, A. N. Rossolenko, V. V. Bolginov, V. A. Oboznov, A. Y. Rusanov, D. S. Baranov,
N. Pugach, S. M. Frolov, V. V. Ryazanov, and D. J. Van Harlingen, “Second-harmonic current-phase
relation in josephson junctions with ferromagnetic barriers,” Phys. Rev. Lett., 2018. doi:
10.1103/PhysRevLett.121.177702.
[24] I. Salameh, E. G. Friedman, and S. Kvatinsky, “Superconductive logic using 2ϕ-josephson
junctions with half flux quantum pulses,” IEEE Transactions on Circuits and Systems II: Express
Briefs, 2022. doi: 10.1109/TCSII.2022.3162723.
[25] R. Kashima, I. Nagaoka, M. Tanaka, T. Yamashita, and A. Fujimaki, “64-ghz datapath demonstration
for bit-parallel sfq microprocessors based on a gate-level-pipeline structure,” IEEE Transactions on
Applied Superconductivity, vol. 31, no. 5, pp. 1–6, 2021.doi: 10.1109/TASC.2021.3061353.
[26] F. Ke, O. Chen, Y. Wang, and N. Yoshikawa, “Demonstration of a 47.8 ghz high-speed fft processor
using single-flux-quantum technology,” IEEE Transactions on Applied Superconductivity, vol. 31,
no. 5, pp. 1–5, 2021.doi: 10.1109/TASC.2021.3059984.
[27] A. Inamdar, J. Ravi, S. Miller, S. S. Meher, M. Eren Çelik, and D. Gupta, “Design of 64-bit arithmetic
logic unit using improved timing characterization methodology for rsfq cell library,” IEEE
Transactions on Applied Superconductivity, vol. 31, no. 5, pp. 1–7, 2021.doi:
10.1109/TASC.2021.3061639.
[28] Q. Zhou, L. Li, Z. Wang, G. Xu, L. Luo, X. Xie, K. Li, and J. Ren, “Design and test of transmission
line in sfq circuit,” in 2021 22nd International Conference on Electronic Packaging Technology
(ICEPT), 2021, pp. 1–4.doi: 10.1109/ICEPT52650.2021.9568131.
[29] C. J. Fourie and K. Jackman, “Experimental verification of moat design and flux trapping analysis,”
IEEE Transactions on Applied Superconductivity, vol. 31, no. 5, pp. 1–7, 2021.doi:
10.1109/TASC.2021.3051582.
[30] B. Zhang and M. Pedram, “Qsta: A static timing analysis tool for superconducting
single-flux-quantum circuits,” IEEE Transactions on Applied Superconductivity, vol. 30, no. 5,
pp. 1–9, 2020.doi: 10.1109/TASC.2020.2970218.
[31] N. Kito, K. Takagi, and N. Takagi, “Logic-depth-aware technology mapping method for rsfq logic
circuits with special rsfq gates,” IEEE Transactions on Applied Superconductivity, vol. 32, no. 4,
pp. 1–5, 2022.doi: 10.1109/TASC.2021.3129719.
[32] E. S. Fang and T. Van Duzer, “A josephson integrated circuit simulator (jsim) for superconductive
electronics application,” Extended Abstracts of 1989 International Superconductivity Electronics
Conference (ISEC’89), 1989.
[33] J. A. Delport, K. Jackman, P. le Roux, and C. J. Fourie, “Josim—superconductor spice simulator,”
IEEE Transactions on Applied Superconductivity, vol. 29, no. 5, pp. 1–5, 2019.doi:
10.1109/TASC.2019.2897312.
[34] Whiteley Research Incorporated, Ic design software for linux, os x and windows, 2023. [Online]. Available:
http://wrcad.com/.
[35] C. J. Fourie, A. Takahashi, and N. Yoshikawa, “Fast and accurate inductance and coupling
calculation for a multi-layer nb process,” Supercond. Sci. Technol., 2015.doi:
10.1088/0953-2048/28/3/035013.
[36] Y. Yamanashi, T. Kainuma, N. Yoshikawa, I. Kataeva, H. Akaike, A. Fujimaki, M. Tanaka, N. Takagi,
S. Nagasawa, and M. Hidaka, “100 ghz demonstrations based on the single-flux-quantum cell
library for the 10 kA/cm² nb multi-layer process,” IEICE Transactions on Electronics, vol. E93.C,
pp. 440–444, 4 2010. doi: 10.1587/transele.E93.C.440.
[37] C. J. Fourie, C. L. Ayala, L. Schindler, T. Tanaka, and N. Yoshikawa, “Design and characterization of
track routing architecture for rsfq and aqfp circuits in a multilayer process,” IEEE Transactions on
Applied Superconductivity, vol. 30, no. 6, pp. 1–9, 2020.doi: 10.1109/TASC.2020.2988876.
[38] A. Shukla, B. Chonigman, A. Sahu, D. Kirichenko, A. Inamdar, and D. Gupta, “Investigation of
passive transmission lines for the mit-ll sfq5ee process,” IEEE Transactions on Applied
Superconductivity, vol. 29, no. 5, pp. 1–7, 2019.doi: 10.1109/TASC.2019.2898685.
[39] I. I. Soloviev, V. I. Ruzhickiy, S. V. Bakurskiy, N. V. Klenov, M. Y. Kupriyanov, A. A. Golubov,
O. V. Skryabina, and V. S. Stolyarov, “Second-harmonic current-phase relation in josephson
junctions with ferromagnetic barriers,” Phys. Rev. Applied, 2021.doi:
10.1103/PhysRevApplied.16.014052.
[40] D. Hasegawa, Y. Takeshita, F. Li, K. Sano, M. Tanaka, T. Yamashita, and A. Fujimaki,
“Demonstration of interface circuits between half- and single-flux-quantum circuits,” IEEE
Transactions on Applied Superconductivity, vol. 31, no. 5, pp. 1–4, 2021. doi:
10.1109/TASC.2021.3072846.
[41] N. K. Katam and M. Pedram, “Logic optimization, complex cell design, and retiming of single flux
quantum circuits,” IEEE Transactions on Applied Superconductivity, vol. 28, no. 7, pp. 1–9, 2018.doi:
10.1109/TASC.2018.2856833.
[42] T. Filippov, M. Dorojevets, A. Sahu, A. Kirichenko, C. Ayala, and O. Mukhanov, “8-bit
asynchronous wave-pipelined rsfq arithmetic-logic unit,” IEEE Transactions on Applied
Superconductivity, vol. 21, no. 3, pp. 847–851, 2011.doi: 10.1109/TASC.2010.2103918.
[43] X. Peng, Q. Xu, T. Kato, Y. Yamanashi, N. Yoshikawa, A. Fujimaki, N. Takagi, K. Takagi, and
M. Hidaka, “High-speed demonstration of bit-serial floating-point adders and multipliers using
single-flux-quantum circuits,” IEEE Transactions on Applied Superconductivity, vol. 25, no. 3,
pp. 1–6, 2015.doi: 10.1109/TASC.2014.2382973.
[44] M. Dorojevets, A. K. Kasperek, N. Yoshikawa, and A. Fujimaki, “20-ghz 8 x 8-bit parallel carry-save
pipelined rsfq multiplier,” IEEE Transactions on Applied Superconductivity, vol. 23, no. 3,
Art. no. 1300104, 2013. doi: 10.1109/TASC.2012.2227648.
[45] I. Nagaoka, M. Tanaka, K. Inoue, and A. Fujimaki, “29.3 a 48ghz 5.6mw gate-level-pipelined
multiplier using single-flux quantum logic,” in 2019 IEEE International Solid-State Circuits
Conference (ISSCC), 2019, pp. 460–462. doi: 10.1109/ISSCC.2019.8662351.
[46] K. Takahashi, S. Nagasawa, H. Hasegawa, K. Miyahara, H. Takai, and Y. Enomoto, “Design of a
superconducting alu with a 3-input xor gate,” IEEE Transactions on Applied Superconductivity,
vol. 13, no. 2, pp. 551–554, 2003.doi: 10.1109/TASC.2003.813944.
[47] K. Gaj, E. Friedman, M. Feldman, and A. Krasniewski, “A clock distribution scheme for large rsfq
circuits,” IEEE Transactions on Applied Superconductivity, vol. 5, no. 2, pp. 3320–3324, 1995.doi:
10.1109/77.403302.
[48] C. J. Fourie, “Digital superconducting electronics design tools—status and roadmap,” IEEE
Transactions on Applied Superconductivity, vol. 28, no. 5, pp. 1–12, 2018.doi:
10.1109/TASC.2018.2797253.
[49] M. Pedram, “Superconductive single flux quantum logic devices and circuits: Status, challenges,
and opportunities,” in 2020 IEEE International Electron Devices Meeting (IEDM), 2020,
pp. 25.7.1–25.7.4.doi: 10.1109/IEDM13553.2020.9371914.
[50] G. Pasandi and M. Pedram, “A dynamic programming-based, path balancing technology mapping
algorithm targeting area minimization,” in 2019 IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), 2019, pp. 1–8.doi: 10.1109/ICCAD45719.2019.8942053.
[51] S. N. Shahsavani, T.-R. Lin, A. Shafaei, C. J. Fourie, and M. Pedram, “An integrated row-based cell
placement and interconnect synthesis tool for large sfq logic circuits,” IEEE Transactions on Applied
Superconductivity, vol. 27, no. 4, pp. 1–8, 2017.doi: 10.1109/TASC.2017.2675889.
[52] T.-R. Lin, T. Edwards, and M. Pedram, “Qgdr: A via-minimization-oriented routing tool for
large-scale superconductive single-flux-quantum circuits,” IEEE Transactions on Applied
Superconductivity, vol. 29, no. 7, pp. 1–12, 2019.doi: 10.1109/TASC.2019.2915771.
[53] B. Zhang, M. Li, and M. Pedram, “Qssta: A statistical static timing analysis tool for
superconducting single-flux-quantum circuits,” IEEE Transactions on Applied Superconductivity,
vol. 30, no. 7, pp. 1–12, 2020.doi: 10.1109/TASC.2020.3005082.
[54] H. Cong, N. K. Katam, and M. Pedram, “Design of an sfq full adder as a single-stage gate,” in 2019
IEEE International Superconductive Electronics Conference (ISEC), 2019, pp. 1–3.doi:
10.1109/ISEC46533.2019.8990964.
[55] H. Cong, M. Li, and M. Pedram, “An 8-b multiplier using single-stage full adder cell in
single-flux-quantum circuit technology,” IEEE Transactions on Applied Superconductivity, vol. 31,
no. 6, pp. 1–10, 2021.doi: 10.1109/TASC.2021.3091963.
[56] T. Kawaguchi, K. Takagi, and N. Takagi, “Rapid single-flux-quantum logic circuits using clockless
gates,” IEEE Transactions on Applied Superconductivity, vol. 31, no. 4, pp. 1–7, 2021.doi:
10.1109/TASC.2021.3068960.
[57] Z. Deng, N. Yoshikawa, S. Whiteley, and T. Van Duzer, “Data-driven self-timed rsfq digital
integrated circuit and system,” IEEE Transactions on Applied Superconductivity, vol. 7, no. 2,
pp. 3634–3637, 1997.doi: 10.1109/77.622205.
[58] G. Moon, J. Park, S. Lee, S. Lee, and J.-K. Wee, “Delay analysis of self-timing-aligned clock
synchronization technique for superconductive sfq logic circuits,” IEEE Transactions on Applied
Superconductivity, vol. 15, no. 2, pp. 288–291, 2005.doi: 10.1109/TASC.2005.849798.
[59] R. N. Tadros and P. A. Beerel, “Optimizing (hc)²lc, a robust clock distribution network for sfq
circuits,” IEEE Transactions on Applied Superconductivity, vol. 30, no. 1, pp. 1–11, 2020. doi:
10.1109/TASC.2019.2933366.
[60] G. Pasandi and M. Pedram, “An efficient pipelined architecture for superconducting single flux
quantum logic circuits utilizing dual clocks,” IEEE Transactions on Applied Superconductivity,
vol. 30, no. 2, pp. 1–12, 2020.doi: 10.1109/TASC.2019.2955095.
[61] S. Al-Araji, Z. Hussain, and M. Al-Qutayri, Digital Phase Lock Loops: Architectures and Applications.
Springer US, 2007,isbn: 9780387328645.doi: 10.1007/978-0-387-32864-5.
[62] B. Razavi, RF Microelectronics (Prentice Hall communications engineering and emerging
technologies series). Prentice Hall PTR, 1998,isbn: 9780138875718. [Online]. Available:
10.1007/978-3-319-28376-0_1.
[63] D. Brock and M. Pambianchi, “A 50 ghz monolithic rsfq digital phase locked loop,” in 2000 IEEE
MTT-S International Microwave Symposium Digest (Cat. No.00CH37017), vol. 1, 2000, pp. 353–356.
doi: 10.1109/MWSYM.2000.861014.
[64] V. Kaplunenko, “On-chip clock oscillator for high precision rsfq applications,” IEEE Transactions on
Applied Superconductivity, vol. 13, no. 2, pp. 575–578, 2003.doi: 10.1109/TASC.2003.813950.
[65] Y. Yamanashi, I. Okawa, and N. Yoshikawa, “Design approach of dynamically reconfigurable single
flux quantum logic gates,” IEEE Transactions on Applied Superconductivity, vol. 21, no. 3,
pp. 831–834, 2011.doi: 10.1109/TASC.2010.2090856.
[66] S. Nishimoto, Y. Yamanashi, and N. Yoshikawa, “Design method of single-flux-quantum logic
circuits using dynamically reconfigurable logic gates,” IEEE Transactions on Applied
Superconductivity, vol. 25, no. 3, pp. 1–5, 2015.doi: 10.1109/TASC.2014.2387251.
[67] H. Cong and M. Pedram, “All-digital phase-locked loop in single flux quantum circuit technology,”
IEEE Transactions on Applied Superconductivity, vol. 32, no. 3, pp. 1–8, 2022.doi:
10.1109/TASC.2022.3151728.
[68] Y. Hironaka, Y. Yamanashi, and N. Yoshikawa, “Demonstration of a single-flux-quantum
microprocessor operating with josephson-cmos hybrid memory,” IEEE Transactions on Applied
Superconductivity, vol. 30, no. 7, pp. 1–6, 2020.doi: 10.1109/TASC.2020.2994208.
[69] V. K. Semenov, Y. A. Polyakov, and S. K. Tolpygo, “Very large scale integration of
josephson-junction-based superconductor random access memories,” IEEE Transactions on Applied
Superconductivity, vol. 29, no. 5, pp. 1–9, 2019.doi: 10.1109/TASC.2019.2904971.
[70] Y. Takeshita, F. Li, D. Hasegawa, K. Sano, M. Tanaka, T. Yamashita, and A. Fujimaki, “High-speed
memory driven by sfq pulses based on 0-π squid,” IEEE Transactions on Applied
Superconductivity, vol. 31, no. 5, pp. 1–6, 2021. doi: 10.1109/TASC.2021.3060351.
[71] H. Zha, N. K. Katam, M. Pedram, and M. Annavaram, “Hiperrf: A dual-bit dense storage sfq
register file,” in 2022 IEEE International Symposium on High-Performance Computer Architecture
(HPCA), 2022, pp. 415–428.doi: 10.1109/HPCA53966.2022.00038.
[72] J. Lin, V. Semenov, and K. Likharev, “Design of sfq-counting analog-to-digital converter,” IEEE
Transactions on Applied Superconductivity, vol. 5, no. 2, pp. 2252–2259, 1995. doi: 10.1109/77.403034.
[73] A. Bozbey, M. A. Karamuftuoglu, S. Razmkhah, and M. Ozbayoglu, “Single flux quantum based
ultrahigh speed spiking neuromorphic processor architecture,” Jul. 2020. [Online]. Available:
arXiv:1812.10354 [cs.ET].
[74] F. Wang, B. Zhang, M. Pedram, and S. Gupta, “Static timing analysis (sta) with timing bleed:
Certifying much higher performance for rapid single flux quantum (rsfq) logic,” J. Phys.: Conf. Ser.,
2020. doi: 10.1088/1742-6596/1559/1/012003.
[75] K. Gaj, C.-H. Cheah, E. Friedman, and M. Feldman, “Functional modeling of rsfq circuits using
verilog hdl,” IEEE Transactions on Applied Superconductivity, vol. 7, no. 2, pp. 3151–3154, 1997.doi:
10.1109/77.622000.
[76] K. Gaj, Q. Herr, V. Adler, D. Brock, E. Friedman, and M. Feldman, “Toward a systematic design
methodology for large multigigahertz rapid single flux quantum circuits,” IEEE Transactions on
Applied Superconductivity, vol. 9, no. 3, pp. 4591–4606, 1999.doi: 10.1109/77.791915.
[77] C. J. Fourie, “Extraction of dc-biased sfq circuit verilog models,” IEEE Transactions on Applied
Superconductivity, vol. 28, no. 6, pp. 1–11, 2018.doi: 10.1109/TASC.2018.2829776.
[78] S. Intiso, I. Kataeva, E. Tolkacheva, H. Engseth, K. Platov, and A. Kidiyarova-Shevchenko,
“Time-delay optimization of rsfq cells,” IEEE Transactions on Applied Superconductivity, vol. 15,
no. 2, pp. 328–331, 2005.doi: 10.1109/TASC.2005.849823.
[79] X. Zhang, K. Ishida, H. Fuketa, M. Takamiya, and T. Sakurai, “On-chip measurement system for
within-die delay variation of individual standard cells in 65-nm cmos,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 20, no. 10, pp. 1876–1880, 2012.doi:
10.1109/TVLSI.2011.2162257.
[80] Y. Yanagawa, K. Hirose, H. Saito, D. Kobayashi, S. Fukuda, S. Ishii, D. Takahashi, K. Yamamoto, and
Y. Kuroda, “Direct measurement of set pulse widths in 0.2-µm soi logic cells irradiated by heavy
ions,” IEEE Transactions on Nuclear Science, vol. 53, no. 6, pp. 3575–3578, 2006. doi:
10.1109/TNS.2006.885110.
[81] P. Rosenthal, “Sub-picosecond measurement of time intervals using single flux quantum
electronics,” IEEE Transactions on Applied Superconductivity, vol. 3, no. 1, pp. 2645–2648, 1993.doi:
10.1109/77.233971.
[82] B. Prasad Das, B. Amrutur, H. Jamadagni, N. Arvind, and V. Visvanathan, “Within-die gate delay
variability measurement using re-configurable ring oscillator,” in 2008 IEEE Custom Integrated
Circuits Conference, 2008, pp. 133–136.doi: 10.1109/CICC.2008.4672039.
[83] J. Ravi, S. S. Meher, A. Sahu, B. Chonigman, and A. Inamdar, “Differential propagation delay
measurement of rsfq library cells using ring oscillators,” IEEE Transactions on Applied
Superconductivity, vol. 32, no. 4, pp. 1–7, 2022.doi: 10.1109/TASC.2022.3149233.
[84] G. Krylov and E. G. Friedman, “Design for testability of sfq circuits,” IEEE Transactions on Applied
Superconductivity, vol. 27, no. 8, pp. 1–7, 2017.doi: 10.1109/TASC.2017.2759239.
Abstract
As semiconductor technology scales down to sub-nanometer levels, maintaining Moore's law becomes increasingly challenging. Superconducting electronics (SCE) logic families that utilize Josephson junctions, such as Single-Flux-Quantum (SFQ), present a viable solution for developing faster and more energy-efficient next-generation computers. To address these challenges, the dissertation introduces a standard cell library using RSFQ technology, a standard cell library using 2ϕ-junctions, an 8-bit multiplier design in RSFQ, an all-digital phase-locked loop design in RSFQ, a random access memory using multi-fluxon destructive readout cells, and a cell characterization prototype for RSFQ cells.
Asset Metadata
Creator: Cong, Haolin (author)
Core Title: High performance and ultra energy efficient computing using superconductor electronics
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Degree Conferral Date: 2023-05
Publication Date: 04/27/2023
Defense Date: 03/23/2023
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: complex gate, OAI-PMH Harvest, RSFQ, standard cell library, superconducting circuits
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Pedram, Massoud (committee chair), Beerel, Peter (committee member), Nakano, Aiichiro (committee member)
Creator Email: conghaolin@gmail.com, haolinco@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113088951
Unique identifier: UC113088951
Identifier: etd-CongHaolin-11729.pdf (filename)
Legacy Identifier: etd-CongHaolin-11729
Document Type: Dissertation
Rights: Cong, Haolin
Internet Media Type: application/pdf
Type: texts
Source: 20230427-usctheses-batch-1032 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu