Verification and Testing of Rapid Single-Flux-Quantum (RSFQ) Circuit for
Certifying Logical Correctness and Performance
by
Fangzhou Wang
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical Engineering)
May 2020
Copyright 2020 Fangzhou Wang
Dedication
I dedicate this dissertation
To my parents, Shuang Yu and Ping Wang
as well as my parents-in-law Chunmei Li and Yudong Li
and especially to my dearest wife, Shuo Li.
Acknowledgements
My doctoral journey at the University of Southern California is one of the most memorable experiences
in my life. It would not have been possible for me to complete this journey without the support
from my advisor, mentors, colleagues, friends, and family members.
First and foremost, I would like to thank my advisor Professor Sandeep Gupta. I have learned
a whole new way of thinking and problem solving from him during the past five years. Everything
that I know about scientific research comes from him. Through his wisdom, enthusiasm, and
encouragement, I learned how to identify an impactful research direction, how to properly define
a problem, how to shape and clarify the objective, how to identify, justify, and appreciate the
novelties of the contribution, how to think about the problem in a much broader context and
always remember the big picture, how to carefully prove every detail of the claims, how to present
the research in a clear and structured way to highlight the
flow of key ideas and results, and lots
more. He supported me in every possible way, including carefully reviewing every single line of
each and every one of my drafts and attending every conference talk of mine, no matter when
and where it was. There is nothing more beneficial than those insightful research discussions with
him. If there has to be more, then it must be his intriguing life stories. Those stories helped me
become a better researcher and, more importantly, they will keep inspiring me in my future career
and life. Professor Gupta was, is, and will always be an inspiration and a role model to me. I will
keep fighting on and loving what I am doing.
Next, I would like to thank other members in my committee: Professor Peter Beerel, Professor
Massoud Pedram, and Professor Aiichiro Nakano for their insightful feedback on my work. More
specifically, I would like to thank Professor Peter Beerel for the discussions about my proofs
regarding test generation, Professor Massoud Pedram for his suggestions about timing analysis,
and Professor Aiichiro Nakano for his input from a physics perspective on my research topics.
I would also like to show gratitude to my colleagues in the ColdFlux team: Dr. Naveen K.
Katam, Dr. Ramy N. Tadros, Ting-Ru Lin, Bo Zhang, Haipeng Zha, Soheil Shahsavani, Ghasem
Pasandi, and Arash Fayyazi. Special gratitude to Naveen for introducing me to the field of
superconducting electronics and making my defense schedule possible. Very special thanks to
Ting-Ru Lin and Bo Zhang for their code and suggestions.
Sincere thanks to Diane Demetras in our ECE department; she is the mother leader who helped
us turn on the lights and wipe the
floor so that I could focus on my research. Special thanks to
Ted Low and Annie Hua for helping me work as a research assistant and to Ryan Pineda for
securing my master's degrees. Thanks also to the rest of the staff, in particular Mayumi Thrasher,
Shane Goodoff, Estela Lopez, David Ho, and Cathy Huang. Memorable thanks to Tim Boston,
may his soul rest in peace.
During my Ph.D. study at USC, I was very proud to be the teaching assistant for Professor
Gandhi Puvvada for almost all the semesters. He kept improving his courses over the past thirty
years, working day and night to develop new materials for students. He has shown me the
true meaning of dedication and striving for perfection. He is the most responsible and dedicated
instructor that I have met during my twenty years as a student. Sincere thanks
to Professor Puvvada for being a role model to me.
My sincere thanks also go out to Barry Griner. Even though I had been learning English for
a long time, when I first came to the U.S., I could barely communicate. At that time, during meetings
with my advisor, there were lots of ideas in my mind but I just could not articulate them. When
I realized how hard it would be to become successful in Ph.D. study if I could not explain my
thoughts in a clear manner, I turned to Barry for help. Besides letting me take his advanced pronunciation
course again and again, he helped me improve my oral English in every aspect. He patiently
guided me to fine-tune my pronunciation of every single vowel and consonant, practice speaking
with proper stress and intonation, and understand the culture behind the language itself. Without
his help, I would not have been able to present my research so clearly and confidently.
I would also like to thank my lab mates and friends at USC: Dr. Da Cheng, Dr. Byeongju Cha,
Dr. Jizhe Zhang, Dr. Xuan Zuo, Jianwei Zhang, Soowang Park, and many others. My special
thanks go to Jizhe Zhang, who has been the best friend of mine at USC. May our friendship be
everlasting.
Heartfelt thanks to my parents, Shuang Yu and Ping Wang. I feel so grateful and blessed to
be their son. They have offered me the strongest support during my entire life and have not let
a single second pass without loving me unconditionally.
Last but certainly not least, the best outcome from this long journey is finding my soul-mate,
Shuo Li, with whom I would like to spend my life. Our love story began on the USC campus on a
lovely morning. Ever since then, she has been a great supporter and the only person who can truly
appreciate my efforts, love, and sense of humor. My doctoral journey, especially the latter half,
has not been an easy ride. I am extremely grateful that she always stuck by my side and
supported me in every way that a person can be supported. Our commitment and determination
to each other are getting strengthened each day and we are fully prepared to live our lives to the
fullest.
The research is based upon work supported by the Office of the Director of National Intelligence
(ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the U.S. Army
Research Office grant W911NF-17-1-0120. The views and conclusions contained herein are those
of the authors and should not be interpreted as necessarily representing the official policies or
endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The
U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes
notwithstanding any copyright notation herein.
Table of Contents
Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
1.1 Superconducting electronics
1.2 Design automation for RSFQ technology
1.3 Thesis Contribution
1.4 Thesis Organization
Chapter 2: RSFQ Background
2.1 Operation of RSFQ cells
2.1.1 Quantized pulse-based operation
2.1.2 Transferring an SFQ pulse
2.1.3 Storing an SFQ pulse
2.1.4 Cell library
2.1.5 Pulse-based logic operation
2.1.5.1 DFF
2.1.5.2 INV
2.1.5.3 AND2
2.1.5.4 OR2
2.1.5.5 XOR2
2.2 Simulator and simulation environment
2.2.1 Simulator
2.2.2 Simulation environment
2.3 New properties of RSFQ relative to CMOS
2.3.1 Structure and parameter of a RSFQ cell
2.3.2 Bi-directionality
2.3.3 Single-pattern delay excitation and propagation
Chapter 3: Multi-Cell Characterization
3.1 Introduction
3.2 Single-cell characterization
3.2.1 Cell library under study
3.2.2 Single-cell characterization approaches
3.3 Proposed multi-cell characterization approach
3.3.1 Motivation
3.3.2 Key ideas and overview
3.3.3 Configuration based multi-cell characterization
3.3.4 Identifying root causes of failures
3.3.4.1 JJ is triggered at a wrong time
3.3.4.2 A wrong JJ is triggered
3.3.5 Refinement of the cell library
3.3.6 In-situ multi-cell characterization
3.4 Results
3.5 Conclusion and future work
Chapter 4: Cell characterization under non-idealities
4.1 Effects of non-idealities
4.1.1 Biasing current redistribution
4.1.2 Process variations
4.1.3 Inductive coupling
4.2 Cell characterization under process variations
4.2.1 Cell characterization under normal input conditions
4.2.2 Timing bleed: Cell characterization under additional delay at cell's input(s)
4.2.2.1 The effect of additional delay at inputs
4.2.2.2 Propagation of timing bleed
4.2.2.3 Isolation of timing bleed
4.2.3 Summary of the new phenomena
Chapter 5: Static Timing Analysis with Timing bleed
5.1 Motivation and Background
5.2 Key Ideas and Problem Statement
5.2.1 Circuit model and divide and conquer
5.2.2 Unique characteristics of RSFQ delays
5.3 Proposed approach
5.3.1 Cell timing characterization with timing bleed
5.3.2 Timing library extraction for our STA approach
5.3.3 Static timing analysis with timing bleed
5.3.3.1 Levelization
5.3.3.2 Timing analysis with a given clock period
5.3.3.3 Identifying minimum clock period
5.4 Proofs of Correctness and Optimality
5.4.1 Correctness of our approach
5.4.2 Optimality of our approach
5.5 Experimental results
5.6 Conclusion
Chapter 6: Timing Independent ATPG
6.1 Introduction
6.2 Proposed ATPG paradigm for RSFQ logic
6.2.1 Circuit model
6.2.2 Target path selection
6.2.3 Delay excitation conditions
6.2.4 Delay sensitization conditions
6.2.5 Logic error propagation
6.2.6 ATPG paradigm
6.2.6.1 Uncovered path
6.2.6.2 Untestable path
6.3 Theoretical results
6.3.1 Robustness of our test patterns
6.3.2 Proof of the robustness of our test pattern
6.4 Experimental Results
6.4.1 Coverage of our test sets
6.4.2 Effectiveness of our test sets - invoking maximum delays
6.4.3 Efficiency of our test sets - numbers of patterns
6.5 Conclusion and future work
Reference List
List Of Tables
2.1 Propagation conditions of logic-1
3.1 Multi-cell characterization result before refinement
3.2 Result of different refinements
3.3 Refinements identified by our multi-cell characterization approach
3.4 Multi-cell characterization result after refinement
3.5 Simulation results of arithmetic benchmark circuits using exhaustive set of patterns
3.6 Simulation results of arithmetic benchmark circuits using in-situ ATPG patterns
4.1 Summary of experiments on process variations and SFQ-specific non-idealities
4.2 Process variation values for MITLL 100 process
5.1 Clock period for benchmark circuits under different timing bleed conditions
6.1 Delay excitation and sensitization conditions [1]
6.2 Delay sensitization conditions and logic error propagation conditions
6.3 Number of test patterns and its coverage for benchmark circuits
6.4 Patterns generated for a Full-Adder
6.5 Minimum clock period for each input pattern for one particular Monte Carlo instance of a Full-Adder
6.6 Number of patterns for benchmark circuits
List Of Figures
2.1 (a) The schematic of a JTL in RSFQ [2]. (b) Simulation results of a JTL in RSFQ.
2.2 (a) Schematic of a DFF in RSFQ [2]. (b) Simulation result of a DFF in RSFQ. (c) State diagram of a DFF in RSFQ [3].
2.3 (a) Schematic of an AND2 in RSFQ [2]. (b) State diagram of an AND2 in RSFQ [3].
2.4 (a) Schematic of OR2 cell [2]. (b) State diagram of OR2 cell [3].
2.5 Simulation environment used in our research [3].
2.6 (a) The schematic of a JTL in RSFQ [2]. (b) The schematic of an AND2 in RSFQ [2]. (c) The schematic of a uni-directional buffer in RSFQ [2].
2.7 (a) Symbol of an OR2 cell. (b) Simulation results for input pattern (01). (c) Simulation results for input pattern (00).
3.1 Simulation environment of the cell characterization process [3].
3.2 Mealy state diagram of two RSFQ AND cell designs. (The FSM diagrams are taken from [4]).
3.3 Margins for each parameter of AND2. (The figure is taken from reference [5]).
3.4 The schematic of a 3-input OR circuit.
3.5 Configuration of circuit under test. (a) Without interconnect. (b) With interconnect (splitter).
3.6 An overview of our automated verification process.
3.7 For AND2 cell: (a) single-cell characterization setup. (b) multi-cell characterization setup.
3.8 Simulation results of AND2 cell for single-cell characterization case.
3.9 Simulation results of AND2 cell for multi-cell characterization case.
3.10 Simulation results of AND2 cell for (a) Single-cell characterization case. (b) Multi-cell characterization case.
3.11 Schematic of AND2 cell.
3.12 A multi-cell characterization setup for OR2 cell.
3.13 Simulation waveform of OR2 cell which fails in multi-cell characterization.
3.14 Simulation waveform of OR2 cell which passes in multi-cell characterization.
3.15 In-situ multi-cell characterization of a 1-bit full adder.
4.1 The biasing condition of a JTL which is connected to a DFF. (a) The internal state of the DFF is logic-0. (b) The internal state of the DFF is logic-1.
4.2 (a) Simulation result of a normal JTL. (b) Simulation result of a JTL with a large input inductance due to process variation.
4.3 Timing diagram of a DFF.
4.4 Distributions of clock-to-Q delay values for DFF cell under process variations on the values of inductors and junction areas. Input pattern 1.
4.5 Distributions of clock-to-Q delay values for AND2 cell under process variations on the values of inductors and junction areas. Input pattern 11.
4.6 Distributions of clock-to-Q delay values for OR2 cell under process variations on the values of inductors and junction areas. (a) Input pattern 01. (b) Input pattern 10. (c) Input pattern 11.
4.7 Distributions of clock-to-Q delay values for XOR2 cell under process variations on the values of inductors and junction areas. (a) Input pattern 01. (b) Input pattern 10.
4.8 Distributions of clock-to-Q delay values for INV cell under process variations on the values of inductors and junction areas. Input pattern 0.
4.9 An experimental configuration used to study the effect of additional delay caused by process variations.
4.10 The probability of timing failure increases as DC decreases.
4.11 The probability of timing failure for pipelines with different numbers of stages.
4.12 Amount of clock-to-Q delay under different input delay conditions for different cells.
4.13 Amount of clock-to-Q delay under different input delay conditions for DFF cell.
4.14 An experimental configuration used to measure the performance overhead of conventional setup time constraint.
4.15 The performance overhead of conventional setup time constraint for pipelines with different numbers of levels of combinational logic cells between consecutive flip-flops.
4.16 (a) Timing bleed via non-inverting cells. (b) Timing bleed via non-inverting cells is masked by off-path.
4.17 (a) Timing bleed blocked at inverting cells. (b) Logic error at inverting cells.
5.1 The schematic of a RSFQ Full-Adder.
5.2 Amount of clock-to-Q delay of DFF under different input delay conditions.
5.3 Cell timing characterization results for an instance of OR2 cell.
5.4 Timing characteristics of an instance of DFF cell.
5.5 Timing diagram of two consecutive pipeline stages.
5.6 Distribution of minimum clock periods for different benchmark circuits.
5.7 Distribution of minimum clock periods for different benchmark circuits.
5.8 Distribution of minimum clock periods for different benchmark circuits.
5.9 Distribution of minimum clock periods for different benchmark circuits.
6.1 (a) Timing bleed via non-inverting cells. (b) Timing bleed via non-inverting cells is masked by off-path.
6.2 Full-Adder as a running example [1].
6.3 Timing diagram of a two-level pipeline stage.
6.4 Timing diagram of a pipeline stage.
6.5 Minimum clock period for each input pattern for a Full-Adder with 100% coverage and 0% untestable ratio.
6.6 Minimum clock period for each input pattern for a 4-bit KSA with 100% coverage and 0% untestable ratio.
6.7 Minimum clock period for each input pattern for a 4-bit multiplier with high untestable ratio.
6.8 Minimum clock period for each input pattern for a 4-bit integer divider with high untestable ratio.
6.9 Minimum clock period for each input pattern for a Full-Adder with 100% coverage and 0% untestable ratio using compressed patterns.
6.10 Minimum clock period for each input pattern for a 4-bit KSA with 100% coverage and 0% untestable ratio using compressed patterns.
6.11 Minimum clock period for each input pattern for a 4-bit multiplier with high untestable ratio using compressed patterns.
6.12 Minimum clock period for each input pattern for a 4-bit integer divider with high untestable ratio using compressed patterns.
Abstract
Rapid Single Flux Quantum (RSFQ) logic, based on Josephson Junctions (JJs), is seeing a resurgence
as a way to provide high performance in the era beyond the end of physical scaling of
CMOS.
However, new characteristics of RSFQ technology necessitate development of new paradigms,
models, and methods for characterization, verification, and testing essential for harnessing the
benefits of RSFQ.
In the first part of our research, we present a new method for characterization of RSFQ cells to
expose a much larger set of vulnerabilities, a systematic approach for identifying the root causes
of these vulnerabilities to guide the refinement of designs of cells, and a new way to extend test
generation approaches to perform design verification at the circuit level. We demonstrate that
our new methods and tools expose a large number of vulnerabilities and help identify root causes
leading to refined cell designs which completely eliminate these vulnerabilities. Finally, we verify
that our refined cells can indeed be composed to create error-free circuits.
In the second part of our research, we systematically study the effect of process variations
and other RSFQ-specific non-idealities. We show that because of the nature of its quantized
pulse-based operation, even highly-distorted pulses are interpreted logically correctly by cells, but
the timing is affected. Therefore, timing verification and delay testing increase in importance in
RSFQ. We also show that due to the gate-level pipelined nature of RSFQ circuits, imposing a
guard-band to resolve the setup time issue will dramatically reduce performance, which is
a key benefit of RSFQ. More importantly, inserting scan logic at every pipelined
gate in RSFQ will cause astronomical area overheads. Therefore, increased clock-to-Q delay (i.e.,
timing bleed) must be allowed through multi-cycle paths. We develop a new static timing analysis
method that allows larger increases in clock-to-Q delay, i.e., timing bleed, whenever the data input
arrives late. We present results of simulations for benchmark circuits with process variations to
demonstrate that our new method certifies much higher speeds for RSFQ logic.
Furthermore, we develop a completely new paradigm for automatic test pattern generation
(ATPG) to address these new phenomena in RSFQ technology to ensure that designs and fabricated
chips provide desired performance. We identify delay excitation conditions, sensitization
conditions, and conditions for propagation of the logic errors caused by timing violations due to
process variations. Our ATPG utilizes these new phenomena to select multi-cycle paths as targets
and to generate test patterns that are guaranteed to excite the worst-case delay along each target
multi-cycle path. Finally, we present theoretical proofs and Monte Carlo simulation results for
benchmark circuits under process variations to demonstrate that the patterns generated by our
new ATPG are effective (invoke maximum delays of target multi-cycle paths) and efficient (require
small numbers of patterns).
Chapter 1
Introduction
As the improvements due to CMOS technology scaling are slowing down, technologies such as
superconducting electronics (SCE) based on Josephson junctions (JJs) offer a promising alternative
to satisfy the demand for low-power and high-performance computing for many applications.
1.1 Superconducting electronics
The principles of superconductive devices such as Josephson junctions (JJs) were discovered and
modeled decades ago [6]. Several logic families, including RSFQ, have been developed based
on Josephson junctions to harness the performance and power benets of this technology [2].
For example, RSFQ circuits have been demonstrated to operate in excess of 100 GHz with energy
consumption of 1 aJ per gate [7]. A simple RSFQ logic test circuit manufactured with a feature
size of 1.5 µm runs at clock speeds up to 370 GHz [8]. Even though cryogenic operation of RSFQ
circuits incurs cooling overhead, high energy efficiency (in terms of performance per unit power)
can still be achieved via large-scale integration and dense packaging [9]. Hence, this technology
offers a promising alternative to satisfy the demand for low-power and high-performance computing,
especially for data centers. As the improvements due to CMOS technology scaling are slowing
down, the interest in JJs and RSFQ has grown in communities that need higher performance
and higher energy efficiency. Two leading fabs, namely Hypres [10] and MIT Lincoln Laboratory
Supercomputing Center [11], are fabricating such chips. There is also an active community that is
not only developing technologies and architectures but also implementing computer-aided design
(CAD) tools to make this technology viable and go beyond the scope of small applications in which
it has been shown to be very successful [12] [13] [14]. Several programs such as Cryogenic Computing
Complexity (C3) [15], Electronic Design Automation tools for Superconducting Electronics (EDA
for SCE) [16], and SuperTools [17] are funded by the Intelligence Advanced Research Projects
Activity (IARPA). Hence, beyond Moore's Law, SCE, especially RSFQ logic, serves as a
high-performance and highly energy-efficient alternative to conventional CMOS technology, especially
for large data centers.
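As a rough sanity check on the figures quoted above, one can estimate the dynamic power of a single gate. The calculation assumes, purely for illustration and not as a claim from the cited work, that the 1 aJ figure is dissipated once per gate per clock cycle at a 100 GHz clock:

```latex
P_{\text{gate}} = E_{\text{gate}} \cdot f
                = 10^{-18}\,\mathrm{J} \times 10^{11}\,\mathrm{Hz}
                = 10^{-7}\,\mathrm{W}
                = 0.1\,\mu\mathrm{W}
```

This estimate covers only the switching energy of the gate itself and excludes the cooling overhead mentioned above.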
1.2 Design automation for RSFQ technology
With the recognition that design automation methods and tools are essential for RSFQ logic to
become a realistic option for realizing large-scale, high-performance, and energy-efficient computing
systems of the future [18], electronic design automation (EDA) tools have been the subject of
some research [19] [20]. However, compared to the semiconductor industry where sophisticated
design tools are developed to enable the design of very-large-scale integrated circuits, the state of
RSFQ design tools lags far behind.
Since so many advanced tools already exist for CMOS, the initial instinct is to make minimal
modifications to these tools so that they can be applied to RSFQ designs. However, due to the
fundamental differences between CMOS and RSFQ, several unique challenges of RSFQ require
major changes to the existing tools for CMOS or even completely new methods. Even for those
tools for CMOS which can be applied to RSFQ with small modifications, the early designs for the
new RSFQ technology are highly likely to be heavily
influenced by MOS technology, which may
not provide optimal designs.
Thus, it becomes imperative to develop new methods and approaches for tackling the challenges
of RSFQ and harnessing the full potential of RSFQ that can go beyond the barrier placed by the
existing CMOS approaches.
1.3 Thesis Contribution
In this thesis, we systematically study the unique features of RSFQ technology, identify the
associated new phenomena, and develop methods and tools for characterization, verification, and
testing to fully harness the benets of RSFQ.
First, we present a new method for characterization of RSFQ cells to expose a much larger set
of vulnerabilities, a systematic approach for identifying the root causes of these vulnerabilities to
guide the refinement of cell designs, and a new method to extend test generation approaches to
perform design validation at the circuit level. We demonstrate that our new methods and tools
expose a large number of vulnerabilities in a cell library and help identify root causes leading
to refined cell designs which almost completely eliminate these vulnerabilities. We then describe
our extensions of ATPG for circuit level verification and use it to verify that our refined cells can
indeed be composed to create error-free circuits.
Second, we use extensive simulations for a range of types, instances, and severities of variations
and non-idealities (e.g., process variations, coupling via mutual inductance, etc.) to systematically
analyze the behavior of RSFQ circuits under high levels of variations and non-idealities.
Analysis of simulation results shows that, in RSFQ, timing verification and delay testing increase
in importance. Hence, with a specific focus on timing verification, we systematically study the
behavior of each logic cell under different input timing conditions as well as the effect of delayed
pulses at cell inputs in logic circuits, where cells interact with each other. We show that the
additional delay caused by a non-ideality (e.g., process variation) at a single cell is typically not
large enough to cause an immediate logic error at a particular cell. However, the accumulated
effect of non-idealities in cells is not small enough to be effectively guard-banded.
Third, we propose a new static timing analysis (STA) approach which relaxes the conventional
setup time constraint and performs timing analysis along multi-cycle paths to efficiently tackle
the timing challenges imposed by process variations. The simulation results for an extensive set of
benchmark circuits show that our proposed approach greatly improves the performance of circuits
while guaranteeing correctness.
Fourth, we propose new methods and tools for timing verification and delay testing of fully
path-balanced RSFQ logic circuits that use a concurrent-flow clocking scheme. The radically new
phenomena in RSFQ technology, especially the existence of single-pattern delay tests and the need
to propagate delayed values via multiple pipeline stages are addressed. We identify delay excitation
conditions, sensitization conditions, and conditions for propagation of the logic errors caused by
timing violations due to process variations. We then propose a completely new paradigm for
automatic test pattern generation (ATPG) which utilizes the new phenomena of RSFQ to select
multi-cycle paths as targets and to generate test patterns that are guaranteed to excite the worst-
case delay along each target multi-cycle path. Monte Carlo simulation results for benchmark
circuits under process variations demonstrate that the patterns generated by our new ATPG are
effective (invoke maximum delays of target multi-cycle paths) and efficient (require small numbers
of patterns).
1.4 Thesis Organization
In Chapter 2, we describe the basic simulation setup and the operation of RSFQ cells at a high level to
present some of the key characteristics of this technology and its differences from CMOS. Then
the proposed multi-cell characterization approach is described in Chapter 3. In Chapter 4, we
characterize RSFQ cells in the presence of variations and identify radically new phenomena in
RSFQ technology. Our proposed new STA approach is presented in Chapter 5 and the proposed
ATPG paradigm is presented in Chapter 6.
Chapter 2
RSFQ Background
In this chapter, we provide a short summary of [2] to introduce the operation of RSFQ cells at
a high level to present some of the key characteristics of this technology. Schematics of cells
used in this section are redrawn based on [2] and state diagrams of cells are generated by a cell
characterization tool named TimEx [3].
2.1 Operation of RSFQ cells
RSFQ circuits are designed to operate at cryogenic temperatures (4K) with extremely high clock
frequency and very low voltage. In this section, the operation of RSFQ cells is described using cell
schematics, simulation results, and a state diagram (where necessary). Based on the cell schematic,
we describe the scenario qualitatively (i.e., the initial state of the cell, the input pattern applied),
and explain the operation (i.e., sequence of events that occur) as well as simulation results, and
finally capture the operation in a state transition diagram. A complete description of the operation
of the cell can be found in [2]; here we only describe the basic operation of some RSFQ cells with
a specific focus on particular problems that may affect the operation of RSFQ cells.
2.1.1 Quantized pulse-based operation
Unlike in CMOS, the logic representation in RSFQ is not level based. Here logic-1 is represented
by the presence of a single-flux-quantum (SFQ) pulse, which is a voltage pulse with a quantized
area $\Phi_0 = h/2e = 2.07$ mV·ps [2]. Logic-0 is represented by the absence of an SFQ pulse. Application
of an SFQ pulse to a properly biased Josephson Junction (JJ) induces a quantized phase leap
of $2\pi$. Then, according to the phase-to-voltage relation, this $2\pi$-leap triggers another SFQ pulse.
Logic operations are then realized by transferring, storing, and reproducing SFQ pulses.
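As a quick numerical check of the quantized pulse area, the following sketch evaluates h/2e from the exact SI values of the Planck constant and the elementary charge and converts the result to mV·ps. The function names are illustrative only.

```python
# Exact SI defining constants (2019 redefinition).
PLANCK_H = 6.62607015e-34            # J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def flux_quantum_weber() -> float:
    """Magnetic flux quantum h/(2e) in Wb, i.e., in V*s."""
    return PLANCK_H / (2.0 * ELEMENTARY_CHARGE)


def flux_quantum_mv_ps() -> float:
    """Same quantity expressed as a pulse area in mV*ps (1 V*s = 1e15 mV*ps)."""
    return flux_quantum_weber() * 1e15


if __name__ == "__main__":
    print(f"Phi_0 = {flux_quantum_mv_ps():.2f} mV*ps")
```

The value rounds to 2.07 mV·ps, so a pulse that is about 1 ps wide has an amplitude of roughly 2 mV.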
2.1.2 Transferring an SFQ pulse
A Josephson Transmission Line (JTL) is a fundamental element in RSFQ logic. As shown in
Figure 2.1(a), junction $J_1$, whose critical current is $I_{c1}$, is biased using a biasing current $I_{b1}$, where
$I_{b1}$ is slightly smaller than $I_{c1}$. The value of the input inductance $L_1$ is carefully chosen so that
it is neither so small that an input SFQ spreads across multiple junctions nor so large that the
input flux is trapped in the inductance loop [2]. The other junction, namely $J_2$, is set in a similar
fashion. As shown in Figure 2.1(b), when an input pulse arrives at point A, it induces a $2\pi$-leap
for $J_1$, which is biased close to its critical current. As a result of the phase leap, another pulse is
generated across $J_1$. The newly generated pulse then serves as the input pulse to the second
stage and triggers $J_2$ to leap. Therefore, the input pulse at point A can be transferred via the
JTL comprised of $J_1$ and $J_2$ to point B.
The operation of the JTL reflects an important consideration while designing RSFQ cells. The
function of an RSFQ cell not only depends on the structure of the cell but also depends
on its parameter values. For instance, in the JTL, if the critical current of $J_1$ is too large
while the biasing current of $J_1$ is too low, then the input pulse at point A would not be able to
trigger $J_1$; as a result, the JTL would fail to transfer the pulse from input to output.
Figure 2.1: (a) The schematic of a JTL in RSFQ [2]. (b) Simulation results of a JTL in RSFQ.
2.1.3 Storing an SFQ pulse
Figure 2.2(a) shows the schematic of a DFF design in RSFQ. This DFF is similar to the JTL
except that (1) the value of $L_2$ is large enough so that it can hold a flux within the $J_1$-$L_2$-$J_2$ loop; (2)
$J_0$ is added to the input to prevent additional pulses from entering the loop while it holds flux;
(3) $I_{b2}$, the biasing current for $J_2$, is replaced with an additional input, namely the clock input.
Initially, logic-0 is stored in the inductance loop, which is indicated by the counter-clockwise
current flowing through $L_2$. This represents that the DFF is in its state-0. When the first input
pulse arrives at point D, it triggers $J_1$ and subsequently causes the current in the inductance loop
to flip its direction to clockwise, which represents that a logic-1 is stored in the loop and the
DFF is in its state-1. The clockwise current increases the biasing current of $J_2$ and decreases the
biasing current of $J_1$. Due to the changes in the biasing condition, any subsequent inputs from
point D will cause $J_0$ to flip and the additional input pulses will be dissipated. When the clock
pulse arrives, if the DFF is in its state-1, $J_2$ will be triggered because it is biased closer to its
critical current. An output pulse will be generated at point Q as shown in Figure 2.2(b). On the
other hand, if the DFF is in its state-0, $J_2$ will not be triggered due to a low biasing current and
the clock pulse will be dissipated by $J_3$; as a result, no output pulse is generated. In both cases
above, the internal state of the DFF is completely reset after the above operation is completed.
Figure 2.2(c) shows the state diagram associated with the DFF. This state diagram is generated
by TimEx [3].
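The two-state behavior described above can be sketched as a small behavioral model. This is only a logic-level illustration written for this section, not the TimEx output and not a circuit-level model; the class and method names are assumptions.

```python
class DFF:
    """Logic-level sketch of the RSFQ DFF: one stored flux, released by the clock."""

    def __init__(self) -> None:
        self.state = 0  # state-0: no flux stored in the loop

    def data_pulse(self) -> None:
        # The first pulse at D stores a flux (state-1). Later pulses leave the
        # state unchanged, mirroring the dissipation of extra pulses by J0.
        self.state = 1

    def clock_pulse(self) -> int:
        # A pulse appears at Q only if a flux was stored; the state is then reset.
        out = self.state
        self.state = 0
        return out


if __name__ == "__main__":
    dff = DFF()
    dff.data_pulse()
    dff.data_pulse()          # second pulse is absorbed
    print(dff.clock_pulse())  # stored flux is released
    print(dff.clock_pulse())  # nothing stored any more
```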
Figure 2.2: (a) Schematic of a DFF in RSFQ [2]. (b) Simulation result of a DFF in RSFQ. (c)
State diagram of a DFF in RSFQ [3].
2.1.4 Cell library
The cell library used in our research is a parameterized cell library for logic synthesis using RSFQ
logic [4]. The latest version of this cell library includes basic RSFQ logic cells such as Josephson
Transmission Line (JTL), DFF, AND2, OR2, INV (inverter), XOR, and nondestructive readout
(NDRO). In addition to the basic logic cells, two special cells, namely the DC-to-SFQ converter and
the SFQ-to-DC converter, are used to convert a DC signal to SFQ and an SFQ signal back to DC.
In this library, a passive transmission line (PTL) is used in every cell-to-cell interconnect.
Hence, a PTL driver (PTLTX) and a PTL receiver (PTLRX) are embedded into each cell in
the cell library. Compared to standalone PTL drivers and receivers, embedding the driver and
receiver into each cell reduces the cost and provides a more compact layout. This design method
isolates cells from each other using PTLs, so that non-ideal interactions between cells, such as
biasing current steering and reflections, are minimized [21]. This approach has been adopted to
enable design of large chips and to develop tools to automate design of large RSFQ chips.
Hence, at electrical level, every cell is designed to be isolated from others.
2.1.5 Pulse-based logic operation
As mentioned in Section 2.1.1, in RSFQ technology, a quantized voltage pulse, namely an SFQ
pulse, is used to represent logic-1. On the other hand, logic-0 is represented by the absence of
an SFQ pulse. The operation of an RSFQ cell relies on storing, reproducing, transferring, and
processing of SFQ pulses.
2.1.5.1 DFF
The detailed operation of a DFF is discussed in Section 2.1.3. As a high-level summary, there is
a loop internal to the DFF cell. First, the input pulse is stored in this loop; subsequently, when
the clock pulse arrives, the stored pulse is released to generate an output pulse. In the absence
of an input pulse, no pulse is stored, hence no output pulse is generated upon arrival of the clock
pulse.
2.1.5.2 INV
The INV cell is similar to the DFF. First, the input pulse is stored in the internal loop of the cell;
subsequently this stored pulse is suppressed by the clock pulse and no output pulse is generated.
In absence of the input pulse, the clock pulse is released as the output pulse.
2.1.5.3 AND2
So far we have shown that pulses in RSFQ circuits are synchronized by clock, and logic cells in
RSFQ have internal states defined by the inductance loops within the cell. However, we have not
described the detailed logic operation of the RSFQ circuits. Therefore, we use the AND2 cell as
an example to show the details of logic operations in this technology. Figure 2.3(a) shows the
schematic of an AND2 cell. Since RSFQ is a pulse-based logic, all the inputs of the AND2 cell need
to be synchronized by clock to implement the AND function. The AND2 cell is a combination of
two DFFs and an output junction (i.e., $J_5$) with a higher critical current. If flux arrives at point
A, it will be stored in the corresponding DFF and the state of the cell becomes state-1. Then
if another flux arrives at point B, it will be stored in the other DFF and the state changes to
state-3. In this case, when the clock flux arrives, the stored fluxes will be released simultaneously
and merged, and the induced current in the output junction is large enough to generate an output
flux. If at most one of the input DFFs has captured flux before the clock flux arrives, then the
induced current will not be sufficient to switch the output junction. In both cases above, at the
end of each clock cycle, the internal state of the cell is completely reset. Figure 2.3(b) shows the
state diagram associated with AND2.
Figure 2.3: (a) Schematic of an AND2 in RSFQ [2]. (b) State diagram of an AND2 in RSFQ [3].
2.1.5.4 OR2
The schematic of an OR2 cell is shown in Figure 2.4(a). The OR2 gate is a combination of two
JTLs with their outputs connected to the input of a DFF. The first flux that arrives at any of the
two inputs will be stored in the DFF. Any subsequent fluxes will be blocked by either $J_1$ or $J_2$.
When the clock flux arrives, the stored flux will be released. If the internal DFF has captured
flux before the clock flux arrives, then the induced current will switch the output junction to
generate an output pulse. If the input pattern at each of the two inputs is 0, which means no flux
is captured in the OR2 cell prior to the arrival of the clock pulse, then no pulse will be triggered
at the output. In both cases above, at the end of each clock cycle, the internal state of the cell is
completely reset to its unique initial state, state-0, as shown in Figure 2.4(b).
Figure 2.4: (a) Schematic of OR2 cell [2]. (b) State diagram of OR2 cell [3].
2.1.5.5 XOR2
The XOR2 cell is a combination of a DFF and an INV. Pulses that arrive at the inputs are stored;
if a pulse arrives at only one of the inputs before the clock pulse arrives, then the stored pulse is
released by the clock pulse. If a pulse arrives at each of the two inputs, then these two pulses
suppress each other and no output pulse is generated.
In all above cases, at the end of each clock cycle, the internal state of the cell is completely
reset.
As every logic cell includes a DFF, at the microarchitectural level, every cell is a pipeline stage.
Therefore, RSFQ circuits are highly pipelined, and each logic cell is an individual pipeline stage.
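To make the cell-level pipelining concrete, the sketch below models each cell at the logic level as "capture pulses during the cycle, release on the clock" and runs a chain of single-input cells cycle by cycle. It is a simplified illustration under the assumption of ideal timing; the names `clocked_output` and `run_chain` are hypothetical and not part of any tool mentioned here.

```python
def clocked_output(cell: str, captured: tuple) -> int:
    """Pulse (1) or no pulse (0) released on the clock, given pulses captured this cycle."""
    a = captured[0]
    b = captured[1] if len(captured) > 1 else 0
    if cell == "DFF":
        return a
    if cell == "INV":
        return 1 - a      # the clock pulse passes only when no input pulse was stored
    if cell == "AND2":
        return a & b
    if cell == "OR2":
        return a | b
    if cell == "XOR2":
        return a ^ b      # two stored pulses suppress each other
    raise ValueError(f"unknown cell {cell!r}")


def run_chain(cells: list, stream: list) -> list:
    """Feed one input bit per clock cycle into a chain of single-input cells."""
    captured = [0] * len(cells)
    outputs = []
    for bit in stream:
        captured[0] = bit
        released = [clocked_output(c, (x,)) for c, x in zip(cells, captured)]
        outputs.append(released[-1])
        # Each released pulse is captured by the next stage before the next clock.
        captured = [0] + released[:-1]
    return outputs


if __name__ == "__main__":
    # A pulse entering two cascaded DFFs appears at the output one cycle later.
    print(run_chain(["DFF", "DFF"], [1, 0, 0]))
```

Because every cell is a stage, a chain of n cells adds n - 1 cycles of latency after the cycle in which the input is captured.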
2.2 Simulator and simulation environment
2.2.1 Simulator
The above cell library is developed using WRspice [22], which is a standard simulator that is
widely used in the community. A cell characterization tool named TimEx [3] has been developed
on top of the above basic simulator to perform detailed timing analysis for a given RSFQ cell. In
our research, we use these two simulators to simulate and evaluate our circuits under test.
2.2.2 Simulation environment
Figure 2.5 shows the simulation environment used in our research. DC-SFQ converters are added
at the primary inputs of the circuit to convert the artificial DC inputs we provide to realistic SFQ
pulses. PTLs are inserted between logic cells. PTLs and PTLRXs are added at the primary
outputs of the circuit and resistors are used as load sinks at the outputs of the PTLRXs.
Figure 2.5: Simulation environment used in our research [3].
2.3 New properties of RSFQ relative to CMOS
2.3.1 Structure and parameter of a RSFQ cell
Reproduced in Figure 2.6(a) is a JTL in RSFQ logic. As mentioned in Section 2.1, the function
of an RSFQ cell not only depends on the structure of the cell but also depends on its
parameter values. For example, if the input inductance $L_1$ is too small, then an input
SFQ would spread across multiple junctions; on the other hand, if $L_1$ is too large, then it would
trap the input flux in the inductance loop. The area of the junction is also an important design
parameter which affects the critical current of a junction. For instance, in the JTL, if the critical
current of $J_1$ is too large while the biasing current of $J_1$ is too low, then the input pulse at point
A would not be able to trigger $J_1$; as a result, the JTL would fail to transfer the pulse from input
to output.
Figure 2.6(b) shows the schematic of an AND2 cell. The logic operation of the AND2 cell relies
on the output junction which has a higher critical current. The output JJ will only be triggered
if the fluxes stored in the two DFFs are released simultaneously by the clock signal and merged
to generate an induced current in the output junction that is large enough to generate an output
flux. If the critical current of $J_5$ is too large, it would not be triggered by two combined pulses.
Similarly, if the critical current is too low, it would leap with only one pulse. In both cases, the
AND2 cell would fail. The above examples show that the structure and the parameter values
together determine the function of an RSFQ cell.
2.3.2 Bi-directionality
Figure 2.6: (a) The schematic of a JTL in RSFQ [2]. (b) The schematic of an AND2 in RSFQ [2].
(c) The schematic of a uni-directional buffer in RSFQ [2].
As shown in Figure 2.6(a), the input and output of the JTL are reciprocal. Therefore, the transfer
of an SFQ pulse within this RSFQ cell is not unidirectional. (This is also the case for some
other cells.) A pulse can propagate forward from input to output; it may also propagate backward
from output to input. Unlike in CMOS, where the gate of the transistor isolates the output from
the input, in RSFQ the cell itself may not provide isolation between the input and output.
Figure 2.6(c) shows a buffer design which is widely used in RSFQ cells [2]. This buffer isolates
the output (i.e., point B) from the input (i.e., point A) using $J_2$. The critical current of $J_2$ is
smaller than that of $J_1$. When an SFQ pulse arrives at point A, it only affects $J_1$ and the pulse
gets propagated to point B. On the other hand, when an SFQ pulse arrives at point B, it affects
both $J_1$ and $J_2$. However, since $J_2$ has a lower critical current compared to $J_1$, it gets triggered
first and therefore prevents $J_1$ from leaping. As a result, the pulse cannot propagate from point
B back to point A.
In an RSFQ cell, a pulse not only travels forward from input to output, it may
also travel backward. Additional JJs with appropriate parameter values may be used in a cell
design to prevent signals from propagating in undesired directions.
2.3.3 Single-pattern delay excitation and propagation
In CMOS circuits, delay is caused by transitions, 0-to-1 as well as 1-to-0. Even at a line where
steady-state values for two consecutive vectors are identical, glitches between the two steady state
values can cause delay. In contrast, in RSFQ circuits, when there is an SFQ pulse at a gate's
input(s), it potentially triggers a pulse at its output with some delay after the clock pulse arrives.
On the other hand, in the absence of an SFQ pulse at any input of a gate, no delay is invoked upon
the arrival of the clock pulse. Hence, in RSFQ circuits, delay is excited by logic-value and not by
transition. In other words, in RSFQ logic cells, delay is only associated with the propagation or
generation of logic-1.
Figure 2.7 shows the simulation results for the OR2 cell for different input patterns. As shown
in Figure 2.7(b), input pattern (01) causes the cell to generate an output pulse with a delay of
6.5 ps. On the other hand, no pulse will be triggered at the output for input pattern (00), hence
there is no delay associated with this input pattern as shown in Figure 2.7(c).
Figure 2.7: (a) Symbol of an OR2 cell. (b) Simulation results for input pattern (01). (c) Simulation
results for input pattern (00).
Table 2.1: Propagation conditions of logic-1
Cell name    Propagation of logic-1
DFF          Always propagates logic-1
INV          Always blocks the propagation of logic-1 at input and starts a new propagation path for logic-0 at input
AND2         Only propagates logic-1 if both inputs are logic-1
OR2          Propagates logic-1 if at least one input is logic-1
XOR2         Propagates logic-1 at one input if and only if the other input is logic-0
Based on the behavior of the cells, we summarize the propagation conditions of logic-1 in
Table 2.1.
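The statement that delay is excited by logic value rather than by transition can be illustrated with a short sketch that lists, for each cell, the single input patterns that produce an output pulse and therefore invoke a delay. The logic functions follow the cell descriptions in Section 2.1.5; the dictionary layout and function names are assumptions made for this illustration.

```python
from itertools import product

# Logic function and number of data inputs of each clocked cell.
CELLS = {
    "DFF": (1, lambda a: a),
    "INV": (1, lambda a: 1 - a),
    "AND2": (2, lambda a, b: a & b),
    "OR2": (2, lambda a, b: a | b),
    "XOR2": (2, lambda a, b: a ^ b),
}


def delay_is_excited(cell: str, pattern: tuple) -> bool:
    """A delay is invoked only if the clock releases an output pulse (logic-1)."""
    _, function = CELLS[cell]
    return function(*pattern) == 1


def exciting_patterns(cell: str) -> list:
    """All single input patterns that excite a delay at the given cell."""
    n_inputs, _ = CELLS[cell]
    return [p for p in product((0, 1), repeat=n_inputs) if delay_is_excited(cell, p)]


if __name__ == "__main__":
    for name in CELLS:
        print(name, exciting_patterns(name))
```

For OR2, pattern (0, 1) is in the list while (0, 0) is not, which matches the two simulations in Figure 2.7. A single pattern is enough because no preceding vector is needed to create a transition.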
Chapter 3
Multi-Cell Characterization
Researchers have designed RSFQ logic cells and built logic circuits via composition of cells. Some
designs have been fabricated and their functionalities and performance verified. This hierarchical
approach relies on an abstraction for characterization of cells and composition of cells to design
circuits.
Researchers have developed such abstractions and methods for characterization of RSFQ cells.
However, some instances of cells that are certied as being robust during cell characterization fail
when incorporated into circuits. This motivated this research, which we view as the first step
in the development of a systematic methodology for verification of cells and circuits in emerging
technologies (such as RSFQ) leading to development of more robust abstractions and methods.
In this chapter, we present a new method for characterization of RSFQ cells to expose a
much larger set of vulnerabilities, a systematic approach for identifying the root causes of these
vulnerabilities to guide the refinement of cell designs, and a new way to extend test generation
approaches to perform design validation at the circuit level. We demonstrate that our new methods
and tools expose a large number of vulnerabilities and help identify root causes and enable the
design of refined versions of cells which almost completely eliminate these vulnerabilities. Finally,
we describe our extensions of in-situ circuit level characterization and use it to verify that our
refined cells can indeed be composed to create error-free circuits.
3.1 Introduction
RSFQ chip design teams have often used the following approach: design cells (logic and interconnects),
build circuits by composing the cells together, and adjust individual cells manually as
necessary to get the circuit to work [23]. With the recognition that a hierarchical design flow
with clear levels of abstraction is vital for any technology moving toward very large scale integration
and design automation, developing a robust cell library has become a growing subject of
research [4] [24]. Researchers have developed abstractions and methodologies for characterizing
cells in RSFQ libraries [3]. Researchers have also developed methods to characterize each individual
cell and make it robust to fabrication variations by fine-tuning cell parameters to achieve
maximum margin to variations [24]. These efforts assume that these cells can be composed together
to build larger circuits. However, even when cells are designed for maximum robustness
individually, this does not guarantee that when the cells are used to compose a circuit, the logic
function of the circuit will be the composition of the logic functions of the individual cells.
In fact, as we will show ahead, logic errors do occur in circuits composed using cells that have
passed such rigorous single-cell verification. One key reason for this is that the abstractions and
methods for characterization of cells are adopted from CMOS. In CMOS, gate capacitances of
transistors isolate the internal nodes of one cell from those of the next. In contrast, in RSFQ
technology internal nodes of one cell are not isolated from those of the next. Further, the interconnections
are implemented using transmission lines, hence interactions between cells can be
forward as well as backward.
In this chapter, we propose multi-cell characterization approaches which perform locally exhaustive
simulations and analysis to identify weaknesses in cell designs and abstractions. We also
present our approach to modify the cells and show that our refined cells can indeed be used to
design large circuits as composition of cells. In the next section, we start with a recap of
the existing single-cell characterization approaches.
In Section 3.3, we describe the proposed multi-cell characterization process and the weaknesses
of the cell abstraction revealed by our multi-cell characterization which are not revealed by single-
cell characterization. We then describe how we use the characterization results to identify root
causes of failures and to guide refinements of the cell designs. Experimental results are shown in
Section 3.4 and demonstrate the effectiveness of our new methods for characterizing the RSFQ
cells and improving the robustness of the cells. Finally, the chapter ends with a summary of
ongoing research.
3.2 Single-cell characterization
3.2.1 Cell library under study
As mentioned in Chapter 2, the cell library used in this research is a generic RSFQ cell library [4]
which is designed to be used to construct larger circuits by directly composing cells together. Note
that due to the characteristics of interconnects in RSFQ, the cell library contains not only logic
cells but also interconnect cells, namely the splitter (one-input, two-output) and JTL. Further,
to maximize compatibility, the cells are designed with integrated drivers and receivers for Passive
Transmission Lines (PTLs). The parameters of each cell are optimized to achieve maximum
operating margin under process variations.
Because of the limited number of fanins and fanouts, the number of cells in RSFQ technology is
small compared to its CMOS counterpart. The maximum number of combinational logic functions
that a two-input cell can have is 16. After eliminating trivial functions (i.e., constant logic-0 and
constant logic-1) as well as the isomorphic ones (e.g., $A\bar{B}$ and $\bar{A}B$), the number of Boolean
functions reduces to 8. The cell library (version 1.1.13) used in this research contains 5 principal
logic cells and 2 interconnect cells. The number of cells in other libraries is also relatively small.
For example, there are 9 principal cells in the CONNECT cell library [25], 5 principal logic cells
and 15 interconnect cells in the AIST standard library [26], and 17 cells in the SUNY RSFQ cell
library [27].
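The counting argument can be checked by brute force. The sketch below enumerates all 16 truth tables of two inputs and groups tables that differ only by swapping the inputs. Note one assumption: removing only the two constants and merging input-swapped pairs leaves 10 classes; the count of 8 quoted above is reproduced if functions that depend on only one input (A, B, and their complements) are also treated as trivial. That reading is an interpretation, not something stated in the text.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (A, B) in the order 00, 01, 10, 11


def swap_inputs(table: tuple) -> tuple:
    """Truth table of f(B, A) given the truth table of f(A, B)."""
    return tuple(table[INPUTS.index((b, a))] for a, b in INPUTS)


def depends_on_both(table: tuple) -> bool:
    """True if the output changes with A for some B and with B for some A."""
    f = dict(zip(INPUTS, table))
    on_a = any(f[(0, b)] != f[(1, b)] for b in (0, 1))
    on_b = any(f[(a, 0)] != f[(a, 1)] for a in (0, 1))
    return on_a and on_b


def count_classes(tables: list) -> int:
    """Number of classes when input-swapped truth tables are considered identical."""
    return len({min(t, swap_inputs(t)) for t in tables})


all_tables = list(product((0, 1), repeat=4))
non_constant = [t for t in all_tables if len(set(t)) > 1]
two_input = [t for t in non_constant if depends_on_both(t)]

if __name__ == "__main__":
    print(len(all_tables), count_classes(non_constant), count_classes(two_input))
```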
3.2.2 Single-cell characterization approaches
Reproduced in Figure 3.1 is the simulation environment used in the single-cell characterization
process [3]. The Circuit Under Test (CUT) is the cell under characterization, the DCSFQ cells at the
inputs are used to convert artificial current stimuli to realistic SFQ pulses, and the sink resistor is
used to eventually consume the SFQ pulses at the output.
Figure 3.1: Simulation environment of the cell characterization process [3].
For each cell, TimEx [3] is used to extract the cell's logic behavior and timing characteristics.
TimEx starts by identifying all inductance loops in the given circuit which can be used to define
the states of the cell. It then systematically simulates the cell in the above environment using
all possible input combinations and sequences to extract the logic behavior of the cell. A finite
state machine (FSM) diagram is generated to show the state space of the cell and all possible
transitions between those states. Any erroneous state or invalid transition can be identified in
the FSM diagram. Figure 3.2(a) shows the FSM diagram of a logically correct AND2 cell design.
Figure 3.2(b) shows an AND2 cell design with an erroneous state.
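The idea of exploring all input sequences to recover a state diagram can be sketched as a breadth-first search over a cell model. In the sketch below, a hand-written behavioral AND2 stands in for the circuit simulation that TimEx performs; the bitmask state encoding (state-1 after a pulse at A, state-3 after pulses at both inputs) is consistent with the description in Chapter 2 but is otherwise an assumption, as are all function names.

```python
from collections import deque

EVENTS = ("A", "B", "CLK")


def and2_step(state: int, event: str) -> tuple:
    """Hypothetical behavioral AND2: bit 0 = flux stored from A, bit 1 = flux from B.

    Returns (next_state, output_pulse)."""
    if event == "A":
        return state | 1, 0
    if event == "B":
        return state | 2, 0
    # Clock: output a pulse only if both fluxes were stored, then reset.
    return 0, int(state == 3)


def extract_fsm(step, initial: int = 0, events: tuple = EVENTS) -> tuple:
    """Enumerate reachable states and Mealy transitions by breadth-first search."""
    seen = {initial}
    transitions = {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for event in events:
            nxt, out = step(state, event)
            transitions[(state, event)] = (nxt, out)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, transitions


if __name__ == "__main__":
    states, transitions = extract_fsm(and2_step)
    for (state, event), (nxt, out) in sorted(transitions.items()):
        print(f"state-{state} --{event}/{out}--> state-{nxt}")
```

In a real flow, an erroneous state would show up as a reachable state outside the expected set, or as a transition whose output disagrees with the intended logic function.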
Figure 3.2: Mealy state diagram of two RSFQ AND cell designs. (The FSM diagrams are taken
from [4]).
After characterizing the behavior of the cell, margin analysis is performed for the cell. Because
the function of the cell not only depends on the structure of the cell but also relies on the parameter
values of the cell, margin analysis is an important process to ensure that the cell would work
properly under process variations. As described in [5], each parameter of the cell is analyzed to
determine the operating margin in the presence of process variation. An example result of margin
analysis is shown in Figure 3.3. The values of the parameters of the cell are fine-tuned to achieve
maximum margin.
At the end of the single-cell characterization process, all the cells in the cell library are designed
and fine-tuned for robustness to process variations. All of these cells have not only passed the
single-cell characterization, they have also been shown to have sufficient margin to process variations.
However, when these cells are composed to build large circuits, which are expected to be completely
free of logic errors, some failures occur.
Figure 3.3: Margins for each parameter of AND2. (The figure is taken from reference [5]).
3.3 Proposed multi-cell characterization approach
3.3.1 Motivation
Single-cell characterization produces cells that are maximally robust to process variations. However,
when cells are composed together to build circuits, some circuits may fail. For example,
Figure 3.4 shows a 3-input OR circuit composed using two 2-input OR cells, OR2. (Since every
cell is clocked, a DFF cell, $g_3$, is needed to balance the sequential depths of all paths via the
circuit.) Individually, each of the OR2 cells works correctly; however, when composed together as
shown, the 3-input OR circuit fails. Specifically, the input pattern 011 produces logic-0 instead
of logic-1 at the circuit output (details ahead).
This example motivated our research by showing that the CMOS-inspired abstraction of cells
is not accurate and the existing cell characterization setup is not general enough to expose all
critical failures of RSFQ cells.
Figure 3.4: The schematic of a 3-input OR circuit.
3.3.2 Key ideas and overview
Our objective is to develop a method to establish that the cell designs are consistent with the
abstraction which allows logic circuits to be composed using cells.
To achieve this, we propose a new method to characterize and verify the cells for composability.
If any cell under study fails verication, we identify the corresponding failure cases, followed by
root-cause analysis. We then identify candidate changes to cell designs (changes to parameter
values and/or topology) to eliminate failures and select a set of changes and refine cell designs.
We then verify the refined cell designs, and repeat all subsequent steps until we obtain cells that
satisfy our requirements. Next, we briefly outline the three key steps of our process.
Multi-cell characterization: To ensure that the cells are robust enough to be composed
in an arbitrary manner, we not only need to characterize individual cells, but also need to verify
that our logic abstractions are still true when multiple cells are composed together. Therefore, we
developed an approach to study each cell in our library (cell under study, i.e., CUS) in the larger
context in which the CUS will be used in a circuit.
In this simulation environment, we do not just characterize every possible single-cell CUS,
but compose multiple cells at a time (three cells, in our approach to date) and apply locally
exhaustive patterns. We use this setup to perform extensive simulations to characterize the
behavior of basic RSFQ logic cells and check for logic composability in the following manner. We
simulate the cells at SPICE-level (i.e., at the level of JJs, inductances, and specific SFQ pulses)
and convert simulation results to logic values. We compare the logic-value results for each CUS for
our multi-cell characterization with the results of single-cell characterization. As we show ahead,
our method identifies many failures.
Identify root-causes of failures: Any deviation from the single-cell characterization indicates
a failure and a violation of our desired abstraction. We analyze each such deviation to
identify its root cause. As we show ahead, our method identifies new types of failures.
Refinement of cell library: Once we identify the root cause of each failure, we identify
candidate changes to the cell's designs we can make to eliminate the failure. We then identify a
set of refinements that eliminate all (most) failures observed for all cells in our library.
We then repeat all the above steps, until we obtain a refined cell library that we consider
satisfactory. As we show ahead, our method helps eliminate all of the failures.
Finally, we evaluate our refined cell library by using it to design large benchmark circuits and
simulating the benchmark circuits to show the effectiveness of our new approach.
Our multi-cell characterization approach not only verifies the logical correctness of the cell,
but also captures the delay values associated with each multi-cell configuration. Once all the logic
failures are eliminated, our multi-cell characterization approach would capture the global worst
case delay values across all configurations so that the timing analyzer can use these delay values
as a robust delay model to perform timing analysis.
3.3.3 Configuration-based multi-cell characterization
The single-cell characterization implicitly assumes an abstraction and isolation of the cells
similar to those available in CMOS. However, the fact that these cells cannot be directly composed
to design large circuits indicates that they have weaknesses which are not revealed by this
characterization process. Therefore, to study the behavior of the cell under study (CUS) in the
actual context of larger logic circuits, we propose a new approach that carries out multi-cell
characterization.
Our approach uses the same simulation environment as the one shown in Figure 3.1, except that
the circuit under test (CUT) consists of multiple cells instead of a single cell.
Since the drive cell and the load cell for a particular CUS may affect the operating conditions
of the CUS, we propose to characterize each cell in all possible k-level circuits for the given cell
library. This combinatorial approach is feasible because the number of cells in a cell library is
relatively small, and the values of cell parameters are fixed. Specifically, in our implementation,
we use k=3 for two main reasons. First, the effects of cell-to-cell interactions diminish as the
number of logic levels separating the cells increases. Second, since RSFQ circuits are pipelined at
the cell level, the timing relations between the SFQ pulses at the inputs of the CUS are relatively
independent of k.
We select each cell in the cell library as the CUS and simulate it in combination with every
possible cell at its input(s) and every possible cell at its output. Because in RSFQ technology
the cell library contains, in addition to the five logic cells, a cell for interconnects, namely the
splitter, we study the cases where an input of the CUS is driven with and without a splitter.
As shown in Figure 3.5(a), the circuit under test consists of three cells, namely the driver, the
CUS, and the load. We also perform simulations for the configurations shown in Figure 3.5(b) in
which a splitter is inserted between the driver and the CUS. For each cell as CUS, we generated 50
different configurations, namely 5 different driver cells, 5 different load cells, and whether or not
an additional splitter cell is inserted between the driver and the CUS. Collectively, these
configurations capture the more realistic environments in which the CUS operates in a circuit. A
much richer set of configurations may be generated by inserting splitters at different locations,
enumerating different load cells for the splitters, etc., but our current implementation is adequate
as shown by the results on large benchmark circuits.
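The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in this work: the function name `enumerate_configurations` and the dictionary layout are hypothetical, and the five logic cell names are taken from Table 3.1.

```python
from itertools import product

# The five logic cells of the library, as named in Table 3.1.
LOGIC_CELLS = ["AND2", "OR2", "XOR2", "INV", "DFF"]


def enumerate_configurations(cus):
    """List every driver/load/splitter combination for one cell under study.

    Hypothetical helper: each configuration is a plain dict describing the
    three-cell circuit under test (driver -> [splitter] -> CUS -> load).
    """
    configs = []
    for driver, load, with_splitter in product(LOGIC_CELLS, LOGIC_CELLS, (False, True)):
        configs.append(
            {"driver": driver, "cus": cus, "load": load, "splitter": with_splitter}
        )
    return configs


if __name__ == "__main__":
    configs = enumerate_configurations("AND2")
    print(len(configs))  # 5 drivers x 5 loads x 2 splitter options
```

Repeating this for every cell in the library gives a total number of configurations that grows with the cube of the library size, which is why the approach is practical only for small libraries.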
These multi-cell characterization configurations are only needed at the initial design phase of
the cells to expose vulnerabilities of the cells. The cells are then refined to achieve the level of
isolation that is required to compose large circuits. Therefore, for state-of-the-art cell libraries,
because of the small number of cells ($n$), the complexity of our method, $O(n^3)$, is acceptable.
As the understanding of RSFQ technology develops, the abstraction of the operating environment of
the cells will mature, and we will refine our method to prune the number of configurations
to make it suitable for cell libraries that have larger numbers of cells.
Figure 3.5: Configuration of circuit under test. (a) Without interconnect. (b) With interconnect
(splitter).
During our multi-cell characterization, we apply exhaustive patterns to verify whether the
logic of the circuit under test is the composition of the logic of individual cells.
For any input pattern, if there is any discrepancy between the above-mentioned logic simulation
and the WRspice simulation, then that pattern is marked as a failure case.
Figure 3.6: An overview of our automated verification process.
We developed an approach which automatically verifies the results of a multi-cell configuration
using WRspice simulations and generates a summarized report for designers to verify the
correctness of the configuration. Our approach dramatically reduces the amount of human effort
required to perform verification in WRspice. It also reduces the amount of information that must
be processed by human experts.
Figure 3.6 shows a summary of our automated verification approach. The inputs are: (1) a
file which describes the structure of the CUT, and (2) a standard cell library which contains the
WRspice sub-circuit netlists of all the cells under study.
We then combine these inputs and generate an equivalent WRspice circuit netlist for the given
CUT. During this process, we also add all the necessary auxiliary circuits required to create a
realistic WRspice simulation environment (e.g., a DCSFQ cell is added to each primary input to
convert an artificial input signal (typically, a logic value) to a realistic SFQ signal).
Based on the number of primary inputs in the CUT, we generate an exhaustive set of input
patterns for simulating the WRspice netlist. The set of input patterns is then converted to a
current vs. time representation for WRspice simulation.
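As a rough illustration of this step, the sketch below generates the exhaustive pattern set and turns the bit stream seen by one primary input into piecewise-linear (time, current) breakpoints. Everything specific here is an assumption made for the example: the function names, the one-pattern-per-clock-period schedule, and the `period`, `offset`, `width`, and `amplitude` values (in arbitrary time and current units) are hypothetical and are not taken from the actual tool.

```python
from itertools import product


def exhaustive_patterns(num_inputs):
    """All 2**num_inputs input patterns, as tuples of 0/1."""
    return list(product((0, 1), repeat=num_inputs))


def pattern_to_pwl(patterns, input_index, period=100.0, offset=20.0,
                   width=10.0, amplitude=1.0):
    """Piecewise-linear (time, current) breakpoints for one primary input.

    One pattern is applied per clock period; a logic-1 becomes a triangular
    current pulse and a logic-0 leaves the source at zero.
    """
    points = [(0.0, 0.0)]
    for cycle, pattern in enumerate(patterns):
        if pattern[input_index]:
            start = cycle * period + offset
            points.append((start, 0.0))
            points.append((start + width / 2, amplitude))
            points.append((start + width, 0.0))
    return points


if __name__ == "__main__":
    patterns = exhaustive_patterns(2)
    for t, i in pattern_to_pwl(patterns, input_index=0):
        print(t, i)
```

The resulting breakpoint list is the kind of data a current source driving a DCSFQ cell would need; how it is written into the netlist depends on the simulator and is not shown.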
Along with the WRspice netlist, we also generate the corresponding logic-level netlist based
on the CUT. The above-mentioned set of input patterns is also applied to this logic-level netlist
and a built-in logic simulator is used to produce the expected logical output responses for each
given input pattern. The logical behavior of a cell is extracted from the single-cell characterization
which is used as a representation of the designer's intent.
After performing WRspice simulation, we analyze the voltage vs. time and/or the phase vs.
time waveforms generated by WRspice and convert the output responses into logic values based
on desired criteria. The logic values computed based on WRspice simulation are then compared
with the logic values generated by our logic simulation. A report summarizing the matches and
errors is generated for the human expert to verify the correctness of the circuit.
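The comparison and reporting step can be illustrated with a minimal sketch. It assumes the two simulations have already been reduced to one logic value per input pattern; the function names and the report layout are hypothetical and only mirror the "failed/total (percentage)" form used in the tables of this chapter.

```python
def compare_responses(patterns, expected, observed):
    """Return (pattern, expected, observed) for every mismatching pattern.

    expected: logic values from logic simulation (the designer's intent).
    observed: logic values extracted from the SPICE-level waveforms.
    """
    if not (len(patterns) == len(expected) == len(observed)):
        raise ValueError("need one expected and one observed value per pattern")
    return [(p, e, o) for p, e, o in zip(patterns, expected, observed) if e != o]


def format_report(cut_name, patterns, failures):
    """Summarize the failures so that only failing patterns need manual review."""
    total = len(patterns)
    rate = 100.0 * len(failures) / total if total else 0.0
    lines = [f"{cut_name}: {len(failures)}/{total} ({rate:.2f}%) patterns failed"]
    for pattern, exp, obs in failures:
        lines.append(f"  pattern {pattern}: expected {exp}, observed {obs}")
    return "\n".join(lines)


if __name__ == "__main__":
    patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
    expected = [0, 0, 0, 1]   # made-up values for illustration
    observed = [0, 0, 1, 1]   # made-up values for illustration
    print(format_report("demo", patterns, compare_responses(patterns, expected, observed)))
```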
In the best case, the report would simply confirm that for the given CUT the WRspice
simulation results match logic simulation results for all patterns and the human expert can avoid
reviewing extensive voltage/phase vs. time waveforms produced by WRspice. In other cases,
the report will show a few failing patterns and the human expert may need to examine in detail
the voltage/phase vs. time waveforms only for these patterns to identify the root cause and
develop a strategy for refinement.
Table 3.1 shows the failure rate of each cell in our initial library in terms of the number of input
patterns.
Table 3.1: Multi-cell characterization result before refinement
Cell name | Failure rate based on the number of input patterns
AND2 | 158/512 (30.86%)
OR2 | 144/512 (28.12%)
DFF | 89/256 (34.77%)
INV | 105/256 (41.02%)
XOR2 | 185/512 (36.13%)
As shown in Table 3.1, even though every cell in our original cell library works individually,
a cell may fail when combined with other cells in a large circuit. This clearly shows that the
abstraction we have borrowed from CMOS is not accurate for this technology.
Specifically, in CMOS we have achieved a level of understanding and isolation so that we
can characterize a cell by itself by abstracting out the environment. For most typical CMOS
cells (complementary gates), the gate capacitor isolates one cell's internal nodes from the internal
nodes of the next cell. Hence, the environment on the cell output can be abstracted as a load
capacitance, and the input side environment can be abstracted using arrival and rise/fall times of
input waveforms. Recall that, even for CMOS, this abstraction only works for some types of cells
(e.g., complementary gates) and may not work for other logic families (e.g., pass transistor logic
cells without INVs at inputs and outputs). For such cells we need to apply additional constraints
during circuit design. Therefore, the abstraction of the cell may not be clean and perfect, but we
need to identify its weaknesses and understand its limitations through the multi-cell
characterization process. Our methodology provides a systematic way of identifying problems in the
abstraction for new technologies.
3.3.4 Identifying root causes of failures
We have shown above that our multi-cell characterization exposes many failures that are not seen
during single-cell characterization. This means that the single-cell characterization setup is not
as universal as it should have been. By comparing the differences between the simulation results
provided by our multi-cell characterization and single-cell characterization, we are able to identify
the root causes of these differences, which provide constructive guidance for refinement of these
cells. Even though some cells fail during many instances of multi-cell characterization, when we
go through the details of the simulation results, we find that the failure modes are similar across
the instances.
We start by categorizing the observed logic failures into two categories, namely logic-1 is
produced when logic-0 is expected, and logic-0 is observed instead of logic-1. Because RSFQ
is a pulse-based technology, logic-1 is represented by the existence of a pulse while the absence
of a pulse represents logic-0. Hence, the functionality of an RSFQ cell relies on triggering
designated JJs due to specific controlling events (e.g., a clock) and/or triggering JJs in a desired
order. Therefore, the above two behaviors can be categorized in an alternative manner:
a JJ gets triggered in an unexpected order, and a JJ is not triggered upon the application of a
specific controlling event. Further, since an RSFQ circuit has quantized operation, an SFQ pulse
(i.e., logic-1) cannot disappear without causing some other effect on the circuit. If a JJ (say,
$J_i$) is not triggered by a specific controlling event, then either it was triggered before the
designated controlling event, or it never gets triggered because another JJ (say, $J_k$) consumed
the pulse before $J_i$ could consume it.
Following the above reasoning, we categorize the root causes of logic errors into two new
categories: (1) a JJ is triggered at a wrong time, relative to a designated controlling event (e.g.,
a clock), and (2) a wrong JJ gets triggered. Having arrived at this new categorization, we identify
the root cause of any particular logic failure by identifying the differences in JJ triggering times
or in the IDs of the triggered JJs between the simulation results produced by single-cell
characterization and multi-cell characterization.
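One way to picture this comparison is the sketch below. It is an assumption-laden illustration rather than the procedure used in this work: it supposes that the switching times of each JJ have already been extracted from the phase waveforms, it places each switching event in a clock cycle, and it then labels each deviating JJ with one of the two categories. All names and the data layout are hypothetical.

```python
from bisect import bisect_right


def cycle_index(t, clock_times):
    """Number of clock pulses that arrived at or before time t."""
    return bisect_right(clock_times, t)


def classify_deviations(reference, observed, clock_times):
    """Compare JJ switching events of single-cell (reference) and multi-cell runs.

    reference / observed: dict mapping a JJ name to its sorted switching times.
    clock_times: sorted arrival times of the clock pulses.
    Returns a list of (jj_name, category) for every JJ that deviates.
    """
    findings = []
    for jj in sorted(set(reference) | set(observed)):
        ref_cycles = [cycle_index(t, clock_times) for t in reference.get(jj, [])]
        obs_cycles = [cycle_index(t, clock_times) for t in observed.get(jj, [])]
        if ref_cycles == obs_cycles:
            continue
        if len(ref_cycles) == len(obs_cycles):
            # Same number of switching events, but shifted relative to the clock.
            findings.append((jj, "wrong_time"))
        else:
            # A switching event is extra or missing: a different JJ than
            # intended switched (or consumed the pulse).
            findings.append((jj, "wrong_jj"))
    return findings


if __name__ == "__main__":
    clock = [100.0]
    single = {"J9": [105.0], "J10": [110.0]}   # made-up times
    multi = {"J9": [60.0], "J10": [65.0]}      # both fire before the clock
    print(classify_deviations(single, multi, clock))
```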
3.3.4.1 JJ is triggered at a wrong time
We started by comparing the single-cell characterization of the AND2 cell with the most similar
multi-cell setup, namely the setup shown in Figure 3.7.
Recall that even though in CMOS logic values are represented in terms of voltages, in RSFQ
the phase is the fundamental quantity. The logic-0 and logic-1 in RSFQ are not determined by
voltage, but are determined by phase. Therefore, our analysis of the waveforms is based on phase
vs. time. (The voltage values are shown only to help visualize the sequence of events.)
Figure 3.7: For AND2 cell: (a) single-cell characterization setup. (b) multi-cell characterization
setup.
For the AND2 cell, to produce a logic-1 at the output, each input of the cell needs to capture
a flux before the clock pulse. Subsequently, when the clock pulse arrives, both fluxes are released
to trigger the output JJ to generate a logic-1 at the output of the cell as shown in Figure 3.8.
The interpreted logic value at each node is shown in Figure 3.10.
However, in the multi-cell characterization case, the output JJ is triggered right after the input
pulses are captured. It seems that the multi-cell environment, especially the load cell, changes
the biasing condition of the output JJ. In turn, this makes that JJ more sensitive to changes in
current. Our first attempt at fixing the cell was to increase the critical current of the output JJ
of AND2 (see Figure 3.11) to ensure sufficient margin of the biasing current. While this did fix
the issue, to confirm the root cause, we needed to compare the changes in the biasing current and
verify the sequence of events in terms of phase. Because phase is the fundamental quantity in
RSFQ technology, the root cause of the failure needs to be confirmed based on the analysis of the
changes in phase of all the JJs in the cell.
However, further in-depth analysis showed that even though the load is different, the biasing
condition does not change much due to the use of PTLTX and PTLRX. We then looked at the
surrounding circuit elements of the output JJ, namely the nearby JJs, the nearby inductors, and
loops formed by JJs and inductors.
Figure 3.8: Simulation results of AND2 cell for single-cell characterization case.
Our further analysis of the simulation results shown in Figure 3.9 revealed that the phase of
the clock JJ has a leap right after the input JJs get triggered (i.e., before the arrival time of the
clock signal), and subsequently causes the phase of the output JJ to leap. The schematic of the
AND2 cell is shown in Figure 3.11 with the clock JJ ($J_9$) and the output JJ ($J_{10}$) marked in
red circles.
Having identified the actual root cause, we moved to identifying candidate changes to the cell
design that will eliminate such errors. To prevent the root cause of our errors, i.e., to prevent the
clock JJ from firing without the clock, a more appropriate refinement would be to increase the
critical current of the clock JJ instead of the output JJ.
Figure 3.9: Simulation results of AND2 cell for multi-cell characterization case.
Table 3.2 shows the values of critical and peak biasing currents for the original design, as well
as for the two proposed refinements, namely increasing the critical current of the output JJ and
increasing the critical current of the clock JJ. The results show that by identifying the root cause
of the failure, we are able to identify a better refinement to improve the robustness of the cell.
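The comparison in Table 3.2 amounts to computing, for each design, the difference between the critical current and the peak biasing current of each JJ and then looking at the smallest difference. The sketch below reproduces that arithmetic with the values from Table 3.2; the dictionary keys and function names are hypothetical, and ranking designs by their worst-case margin is an assumption of this illustration rather than a stated selection rule.

```python
# (critical current, peak biasing current) in mA, copied from Table 3.2.
TABLE_3_2 = {
    "original design": {"output JJ": (1.25, 1.04), "clock JJ": (1.22, 1.24)},
    "output JJ refinement": {"output JJ": (1.8, 1.27), "clock JJ": (1.22, 1.18)},
    "clock JJ refinement": {"output JJ": (1.25, 0.96), "clock JJ": (1.8, 1.5)},
}


def margins(design):
    """Critical minus peak biasing current for each JJ, rounded to 0.01 mA."""
    return {jj: round(critical - biasing, 2) for jj, (critical, biasing) in design.items()}


def worst_margin(design):
    """Smallest margin over the JJs of a design; a negative value means a JJ
    is biased above its critical current."""
    return min(margins(design).values())


if __name__ == "__main__":
    for name, design in TABLE_3_2.items():
        print(name, margins(design), worst_margin(design))
```

With these numbers the original design has a negative margin on the clock JJ, increasing the critical current of the output JJ leaves the clock JJ with only 0.04 mA, and increasing the critical current of the clock JJ gives both JJs a margin of about 0.3 mA.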
Figure 3.10: Simulation results of AND2 cell for (a) Single-cell characterization case. (b) Multi-cell
characterization case.
Figure 3.11: Schematic of AND2 cell.
3.3.4.2 A wrong JJ is triggered
Among all its multi-cell characterization configurations, the OR2 cell only fails in some cases.
Figure 3.12 shows a configuration in which the OR2 cell fails. In this case, the drivers are DFF
cells, the cell under characterization is the OR2 cell, and the load is also an OR2 cell. As described
in Chapter 2, the input pulse of the OR2 cell changes the biasing condition of the output JJ by
changing the direction of the current within the inductance loop of the OR2 cell. In the ideal
case, once the input pulse sets the direction of the internal inductance loop, the direction remains
unchanged until it is reset by the clock pulse. However, in the failure case, the phase of a JJ in g1
is influenced unexpectedly under certain input patterns, which subsequently changes the current
in the inductance loop of g1. When the clock pulse arrives, the output JJ of the cell under study
(i.e., g1) is not biased as it should be; therefore, no output pulse is generated. Figure 3.13 shows
the detailed simulation results for this failure.
Table 3.2: Result of different refinements
JJ | Current (mA) | Original design | Refined design by increasing critical current of output JJ | Refined design by increasing critical current of clock JJ
Output JJ | Critical | 1.25 | 1.8 | 1.25
Output JJ | Biasing | 1.04 | 1.27 | 0.96
Output JJ | Difference | 0.21 | 0.53 | 0.29
Clock JJ | Critical | 1.22 | 1.22 | 1.8
Clock JJ | Biasing | 1.24 | 1.18 | 1.5
Clock JJ | Difference | -0.02 | 0.04 | 0.3
Figure 3.12: A multi-cell characterization setup for OR2 cell.
After identifying the root cause, we refined the OR2 cell by increasing the critical current of
the JJ which is triggered unexpectedly. The simulation results in Figure 3.14 show that the
cell works as expected after the refinement.
Figure 3.13: Simulation waveform of OR2 cell which fails in multi-cell characterization.
3.3.5 Refinement of the cell library
As the cells are simulated in more realistic environments during our multi-cell characterization,
a large number of instances of failures are revealed as shown in Table 3.1. Our above method
identifies the root causes of these failures by comparing the results of multi-cell characterization
and single-cell characterization. Once the differences are found, we create a list of candidate
cell redesigns for eliminating the failure. While in the above we selected candidate redesigns to
eliminate specific failures in individual cells, our actual goal is to eliminate all possible errors in
all the cells in the library. Hence, we carefully select a set of cell refinements that minimizes the
total number of failures across the entire cell library.
Figure 3.14: Simulation waveform of OR2 cell which passes in multi-cell characterization.
Table 3.3 shows all the refinements that we apply to cells in the current cell library.
The effects and the quality of our refinements are then verified by performing the multi-cell
characterization again.
Table 3.3: Refinements identified by our multi-cell characterization approach
Cell name | Refinement
AND2 | Increased the critical current of clock JJ
OR2 | Increased the critical current for input JJ
SPLITTER | Reduced the inductance value of the input inductor
INV | Increased the biasing current for input JJ
3.3.6 In-situ multi-cell characterization
The configuration-based multi-cell characterization process achieved the goal of characterizing
cells under a wider range of operating environments which a cell will encounter when used in
larger circuits. Our approach exposes many failures that are not seen during single-cell
characterization and identifies several refinements which completely eliminate the weaknesses
found by the approach.
However, there are still limitations of the proposed configuration-based approach. First, since
it is a combinatorial approach, the total number of configurations is limited in the actual
implementation. For example, the level of enumeration can be expanded from k=3 to k=5. Even with
k=3, the number of configurations can still be expanded by enumerating a richer set of
connections. For example, in our current implementation, if the load cell of the CUS has two inputs,
then the other input of the load cell is assumed to be driven by a DFF, while other cells in the
cell library could also be used to drive that load cell of the CUS. Second, the configuration-based
approach cannot enumerate continuous parameters (e.g., the length of the PTL interconnections).
Even though we could enumerate a set of discrete values of PTL length, the complexity of this
approach can increase dramatically.
In addition, due to the lack of actual circuit design information, some of the non-idealities
cannot be captured in the configuration-based approach. For example, the changes in the biasing
current distribution network or the coupling with cells other than the immediate fan-in and fan-out
of the CUS can only be captured in a realistic circuit.
Therefore, in addition to the configuration-based multi-cell characterization, we also propose
the in-situ locally exhaustive approach.
Figure 3.15: In-situ multi-cell characterization of a 1-bit full adder.
Our in-situ approach starts by identifying each cell as a CUS in a given logic circuit. For each
CUS, we identify a local region (i.e., a window) around the CUS which includes the immediate
driver cell and the immediate load cell of the CUS. We then generate exhaustive local patterns
for the local region. Then an ATPG is used to identify the corresponding global patterns (i.e.,
patterns at the primary inputs) of the local patterns. Finally, the entire circuit is used as the CUT
for our automated verification process using WRspice. Any deviations from the logic simulation
would be flagged and the same approach for identifying root causes of failures is used to refine
the CUS.
Figure 3.15 shows an example of the in-situ multi-cell characterization approach of an OR2
cell in a 1-bit full adder.
For a circuit with $n$ cells, the complexity of our in-situ characterization approach is
$O(n \cdot 2^m)$, where $m$ is the number of inputs of the local region. Since in the current cell
library the number of inputs of a cell is limited to 2, $m$ is bounded by 4 and the complexity of
our approach is $O(n \cdot 2^4)$, which is linear with respect to the size of the circuit. Because
the actual circuit is used as the CUT in our in-situ approach, additional non-idealities such as
changes in the biasing current distribution network and inductive coupling with neighboring cells
can also be characterized.
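The window construction and the locally exhaustive patterns can be sketched as follows. This is a simplified illustration under several assumptions: the netlist is a plain dict from a cell name to the names of its fan-ins, the window inputs are taken to be the fan-ins of the immediate drivers of the CUS, and the small netlist in the usage example is made up (it is not the circuit of Figure 3.15). The ATPG step that justifies each local pattern from the primary inputs is not shown; some local patterns may not be reachable at all.

```python
from itertools import product


def window_inputs(netlist, cus):
    """Inputs of the local region around the CUS.

    netlist maps each cell to the list of signals feeding it; primary inputs
    have no entry. A CUS input driven directly by a primary input is itself
    a window input.
    """
    inputs = []
    for driver in netlist[cus]:
        fanins = netlist.get(driver, [])
        for signal in (fanins if fanins else [driver]):
            if signal not in inputs:
                inputs.append(signal)
    return inputs


def local_patterns(netlist, cus):
    """Names of the window inputs and all 2**m assignments to them."""
    names = window_inputs(netlist, cus)
    return names, list(product((0, 1), repeat=len(names)))


if __name__ == "__main__":
    # Hypothetical five-cell netlist, loosely shaped like a 1-bit full adder.
    netlist = {
        "g1": ["a", "b"],
        "g2": ["a", "b"],
        "g3": ["g1", "cin"],
        "g4": ["g1", "cin"],
        "g5": ["g2", "g4"],
    }
    names, patterns = local_patterns(netlist, "g5")
    print(names, len(patterns))
```

With at most two inputs per cell, no window built this way has more than four inputs, so each CUS contributes at most 16 local patterns.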
3.4 Results
To show the effectiveness of our new method, we start by presenting the results of multi-cell
characterization after our above-mentioned refinement of our cell library.
Table 3.4 shows that by applying only four refinements, we eliminated all the weaknesses in
this cell library compared to the results for our initial cell designs (see Table 3.1).
Table 3.4: Multi-cell characterization result after refinement
Cell name | Failure rate based on the number of input patterns
AND2 | 0/512 (0%)
OR2 | 0/512 (0%)
DFF | 0/256 (0%)
INV | 0/256 (0%)
XOR2 | 0/512 (0%)
To further show the effectiveness of our methods, we use our cells to design logic circuits,
such as arithmetic circuits and ISCAS85 benchmark circuits. For each logic circuit, we create one
version using our original cells and a second version using our refined cells.
We then carry out extensive WRspice simulations for both versions of each logic circuit. The
number of primary inputs for the benchmark circuits shown in Table 3.5 is small, hence we use
exhaustive sets of patterns to verify the correctness of these circuits. However, the benchmark
circuits shown in Table 3.6 have large numbers of primary inputs. Therefore, to verify these
circuits we use a set of test patterns generated by a modified version of an ATPG [1] to carry out
in-situ characterization of each instance of each cell and verify its correctness.
The results of these simulations are shown in Table 3.5 and Table 3.6. These results clearly
show that our method has helped create a refined cell library which can be used to design error-free
circuits via composition. These results also show that our multi-cell characterization approach as
well as our systematic approach for identifying the root causes of failures are indeed effective.
Table 3.5: Simulation results of arithmetic benchmark circuits using exhaustive set of patterns
Circuit under test | Number of failed patterns before refinement | Number of failed patterns after refinement
FA | 7 / 8 (87.50%) | 0 / 8 (0.00%)
KSA4 | 511 / 512 (99.80%) | 0 / 512 (0.00%)
MULT4 | 225 / 256 (87.89%) | 0 / 256 (0.00%)
DIV4 | 241 / 256 (94.14%) | 0 / 256 (0.00%)
Table 3.6: Simulation results of arithmetic benchmark circuits using in-situ ATPG patterns
Circuit under test | Number of failed patterns before refinement | Number of failed patterns after refinement
KSA8 | 592 / 722 (81.99%) | 0 / 722 (0%)
KSA16 | 1656 / 1800 (92.00%) | 0 / 1800 (0%)
KSA32 | 4188 / 4553 (91.98%) | 0 / 4553 (0%)
MULT8 | 3173 / 3526 (89.99%) | 0 / 3526 (0%)
c432 | 1335 / 1649 (80.96%) | 0 / 1649 (0%)
c499 | 2293 / 2763 (82.99%) | 0 / 2763 (0%)
c880 | 2232 / 2255 (98.98%) | 0 / 2255 (0%)
3.5 Conclusion and future work
In this chapter, we propose a new multi-cell method for characterization of RSFQ cells and show
that this exposes a much larger set of vulnerabilities of cells when they are used to compose
larger logic circuits. We also present a systematic approach for identifying the root causes of
these vulnerabilities to guide the refinement of cell designs. We demonstrate that our new
refined cell designs completely eliminate these vulnerabilities. Finally, we develop in-situ
multi-cell characterization for circuit-level verification and use it to verify that our refined cells
can indeed be used to create completely failure-free benchmark circuits.
We view these methods for failure identification and the ATPG-driven approach for circuit-level
verification as the first step in the development of a systematic methodology for verification of
cells and circuits in RSFQ technologies, leading to the development of more robust abstractions and
methods.
As mentioned in Section 2.1.5, even though the cells studied in our research are all sequential
cells, at the end of each clock cycle, the internal states of the cells are completely reset to their
unique initial states. There are other logic cells (e.g., NDRO) in the RSFQ cell library whose
internal states are not reset at the end of each clock cycle. In our future research, we will develop
a systematic approach to characterize such logic cells.
We will also develop an approach to automate the process of identifying the root cause of the
differences between multi-cell characterization and the designer's intention.
This requires the development of methods to capture the designer's intention and methods to
extract the sequence of events from simulation results and compare it with the designer's
intention. We will also develop a comprehensive abstraction for the operating environment by
developing a notion of margin and identifying necessary and sufficient constraints for correctness.
Then we will develop a systematic method for cell refinement. Performing in-situ characterization
in the presence of biasing current redistribution and inductive coupling with neighboring cells is
also a subject of future research.
Chapter 4
Cell characterization under non-idealities
As described in Section 2.1, fluxes can be stored in internal inductance loops within the RSFQ logic
cells and define the state of an RSFQ circuit. The flux-induced current within those inductance
loops changes the biasing condition of JJs in that circuit. Logic operations in RSFQ circuits are
realized by transferring and storing the fluxes in the circuit.
Compared to CMOS, the power and performance benefits of RSFQ come from the use of
Josephson junctions instead of small feature sizes, hence defects are not a major cause of chip
failures. However, process variations and RSFQ-specific non-idealities (e.g., coupling via mutual
inductance, biasing current redistribution) can greatly influence the conditions for pulse
transferring and storing, which eventually affect the operation of an RSFQ circuit and lead to circuit
failures in the form of erroneous logic values. Therefore, process variations and other non-idealities
become the leading causes of chip failure.
In this chapter, we use extensive simulations for a range of types, instances, and severities of
variations and non-idealities (e.g., process variations, coupling via mutual inductance, etc.) to
systematically analyze the behavior of RSFQ circuits under high levels of variations and
non-idealities.
4.1 Effects of non-idealities
4.1.1 Biasing current redistribution
The stored SFQ pulse in a certain RSFQ cell affects the biasing condition of that particular cell.
For example, when an input pulse is stored in a DFF, it causes the current in the inductance loop
to flip its direction from counter-clockwise to clockwise. This not only changes the state of the
cell from logic-0 to logic-1 but also affects the biasing conditions of the JJs within the cell.
Unlike the conventional CMOS technology where cells are isolated by the gate capacitances,
in RSFQ technology, one cell can affect another cell through the redistribution of biasing current.
Hence, the existence of the stored flux not only affects the biasing condition of that DFF, it also
affects other cells (e.g., JTLs) which are connected to the DFF as shown in Figure 4.1. This is
known as biasing current steering (i.e., biasing current redistribution). The amount of sharing
depends on the state of the cell as well as on the combination of cells. Because the logic operation
of these cells relies on biasing the JJs properly, such steering affects the biasing condition and may
eventually lead to failure in terms of erroneous logic values. In order to avoid steering, JTLs or
PTLs are inserted between cells. Since in our current cell library PTLs are used in all cell-to-cell
interconnections, the issue of biasing current steering is avoided.
Figure 4.1: The biasing condition of a JTL which is connected to a DFF. (a) The internal state
of the DFF is logic-0. (b) The internal state of the DFF is logic-1.
4.1.2 Process variations
We then designed an extensive set of simulations for a wide range of types, instances, and
severities of variations to study the effects of process variations in RSFQ logic. For instance, as
shown in Figure 4.2, variations in the input inductor of a JTL can highly distort the output flux
(i.e., make the pulse half as tall at its peak and two times wider). Because of SFQ's quantized
operation, the area under the curve remains unchanged.
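The relation between pulse height and width follows from flux quantization. The short derivation below is a sketch that assumes exactly one flux quantum passes and that the distorted pulse keeps the same normalized shape $f$; the symbols $V_p$ (peak voltage), $w$ (width scale), and $f$ are introduced here for illustration only.

```latex
\int V(t)\,dt \;=\; \Phi_0 \;=\; \frac{h}{2e} \;\approx\; 2.07\ \mathrm{mV\cdot ps}

V(t) = V_p\, f\!\left(\frac{t}{w}\right)
\;\Longrightarrow\;
V_p\, w \int f(x)\,dx = \Phi_0

V_p' = \tfrac{1}{2} V_p
\;\Longrightarrow\;
w' = 2w
```

Under these assumptions, halving the peak value forces the width to double, which is consistent with the distortion described for Figure 4.2.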
Figure 4.2: (a) Simulation result of a normal JTL. (b) Simulation result of a JTL with a large
input inductance due to process variation.
4.1.3 Inductive coupling
We also designed a test circuit to study the interactions between a cell and its neighboring cells
in the layout due to inductive coupling (mutual inductance). The simulation results show that
inductive coupling indeed causes energy transfer between cells. Specifically, the peak value and
the width of the flux in one cell are affected by the flux in the other cell. Even though the logic
value is not flipped, the timing of the pulse is affected.
In all the above cases, the effect of variation/non-ideality on timing propagates and may eventually
cause logic errors, especially at high clock rates.
Table 4.1: Summary of experiments on process variations and SFQ-specific non-idealities
Categories | Descriptions | Number of cases | Number of logic errors | Number of timing issues
Process variations | Variations in value of inductors associated with junction | 5 | 0 | 1
Process variations | Variations in value of capacitors associated with junction | 2 | 0 | 0
Process variations | Variations in value of load resistor | 5 | 0 | 1
Process variations | Variations in value of inductors in the circuit such as input inductor | 10 | 0 | 8
Process variations | Variations in value of biasing current | 5 | 1 | 0
Variations in the shape of the pulse | Narrow and tall pulse | 5 | 0 | 5
Variations in the shape of the pulse | Wide and short pulse | 5 | 0 | 5
Coupling of inductors | Inductor coupled with other inductor within the same cell | 5 | 0 | 2
Coupling of inductors | Inductor coupled with other inductor in a different cell | 5 | 0 | 2
Table 4.1 shows a summary of our experiments on process variations and RSFQ-specific
non-idealities. Most of the simulation instances do not create logic errors, but most eventually cause
timing failures.
Due to quantized operation, even highly distorted pulses are still interpreted as logic-1 and the
shapes of pulses are often restored in the subsequent stages, but the timing is affected. Among all
the non-idealities, process variation has the most significant impact on timing; therefore, we focus
our study on process variations.
4.2 Cell characterization under process variations
We developed a simulation environment to study the effect of key process variations on key
circuit parameters, namely the areas of Josephson junctions and inductance values. We use this
to perform extensive simulations to characterize the behavior of basic RSFQ logic cells under
process variations. This is the first step in our development of a model of propagation of SFQ
pulses via an RSFQ circuit with delay deviations caused by process variations.
Dening delay: Even though the shape of the signal can vary greatly (i.e., the width of the
pulse, the number of peaks, the maximum height of the pulse, the slope of the rising edge and the
falling edge, etc.), a fundamental assumption is used to determine the value of the signal. If the
area under the voltage v.s. time curve exceeds a certain threshold for a given time period, then
the signal is interpreted as a logic-1, otherwise, the signal is determined as a logic-0 [3].
Since in RSFQ delay is only associated with the existence of a pulse (i.e., logic-1), the delay
of the signal is computed with respect to the arrival time of the corresponding pulse [1]. Two
methods can be used to determine the arrival time of a pulse during the cell characterization
process. One is to identify the peak of the pulse and dene the arrival time of the pulse as the
time when the pulse reaches its peak value [28]. The other method is to compute the area of the
pulse and use the time when it reaches 50% as the arrival time of the pulse [3].
In our approach, we define the delay as the time interval from when the input pulse reaches
50% to when the output pulse reaches 50%.
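The 50%-area definition of arrival time can be illustrated with a minimal sketch. This is not the characterization code used in this work; the function names, the sampled-waveform inputs, and the use of trapezoidal integration with linear interpolation inside a sample interval are all assumptions made for illustration.

```python
from typing import Sequence


def arrival_time_half_area(times: Sequence[float], volts: Sequence[float]) -> float:
    """Time at which the cumulative area under v(t) reaches 50% of the total area."""
    if len(times) != len(volts) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    # Trapezoidal area of each interval between consecutive samples.
    seg = [0.5 * (volts[i] + volts[i + 1]) * (times[i + 1] - times[i])
           for i in range(len(times) - 1)]
    total = sum(seg)
    if total <= 0:
        raise ValueError("no pulse found (non-positive area)")
    half, acc = 0.5 * total, 0.0
    for i, area in enumerate(seg):
        if area > 0 and acc + area >= half:
            # Approximation: assume area grows linearly inside this interval.
            frac = (half - acc) / area
            return times[i] + frac * (times[i + 1] - times[i])
        acc += area
    return times[-1]


def pulse_delay(t_in, v_in, t_out, v_out) -> float:
    """Delay = 50%-area arrival of the output pulse minus that of the input pulse."""
    return arrival_time_half_area(t_out, v_out) - arrival_time_half_area(t_in, v_in)
```

For a symmetric pulse the 50%-area time coincides with the peak time, so the two arrival-time definitions discussed above agree; they differ only for distorted (asymmetric or multi-peak) pulses.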
In this thesis, we use T to denote moments in time (e.g., the arrival time of a signal), while
$\tau$ is used to denote the time interval between two signals (e.g., the time interval between
data and clock). Figure 4.3 shows a timing diagram of a DFF. The input data signal arrives before
the clock signal; when the clock signal arrives, the output signal is generated. The timing
interval between data and clock, $\tau_{DC}$, is defined as the time difference between the arrival
of the data signal and the clock signal. The clock-to-Q delay, $\tau_{CQ}$, is defined as the time
difference between the clock signal and the output signal.
4.2.1 Cell characterization under normal input conditions
Monte Carlo simulations are performed to characterize the effects of process variations. A
Gaussian distribution is used to model the effects of process variations on each circuit parameter
value, with the mean equal to the nominal value of the parameter and the standard deviation
($\sigma$) value extracted from the given process. Process variation values we used for simulating
variations of cells designed under MIT LL 100m process are shown in Table 4.2 [5]. For each type of
circuit element, such as the inductors, the correlations between variations are assumed to be 1 for
all elements of that type within a cell, while the correlations across different cells are assumed
to be 0 [5]. The correlations between different types of circuit elements are assumed to be 0.
Figure 4.3: Timing diagram of a DFF.
Table 4.2: Process variation values for MITLL 100 process
Parameter        Process variation ($\sigma$)
Junction Area    3%
Inductance       5%
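The variation model above (Gaussian, correlation 1 within a cell for each element type, correlation 0 across cells and across element types) can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the element names used in any call are hypothetical, and only the two $\sigma$ values come from Table 4.2.

```python
import random

# Relative standard deviations from Table 4.2.
SIGMA = {"junction_area": 0.03, "inductance": 0.05}


def sample_cell(nominal: dict, rng: random.Random) -> dict:
    """Draw one Monte Carlo instance of a cell.

    `nominal` maps an element name to (element type, nominal value).
    One Gaussian scale factor is drawn per element type, so all elements of the
    same type inside the cell move together (correlation 1), while different
    types are independent (correlation 0).
    """
    scale = {kind: rng.gauss(1.0, sigma) for kind, sigma in SIGMA.items()}
    return {name: value * scale[kind] for name, (kind, value) in nominal.items()}
```

Calling `sample_cell` once per cell instance yields independent draws for different cells, which corresponds to the assumed correlation of 0 across cells.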
Since delay in RSFQ is only associated with logic-1, we studied all input vectors that cause
each cell to generate a logic-1 at its output. Figures 4.4, 4.5, 4.6, 4.7, and 4.8 show the effects
of process variations on clock-to-Q delay values for different input vectors for cells in our
library. For each cell, we show the distribution for every input vector that produces a logic-1 at
the output. The red bar marks the bin into which the cell's nominal delay value falls. The
error-free ratio is the percentage of Monte Carlo simulation instances that produce correct logic
outputs for all possible input vectors.
The Monte Carlo simulations show that even under process variations, most of the cells still
produce correct logic outputs. However, the delay of some cells can increase by more than 25%.
The additional delay introduced by process variations causes the pulse to arrive late at the input
of subsequent cells.
Figure 4.4: Distributions of clock-to-Q delay values for DFF cell under process variations on the
values of inductors and junction areas. Input pattern 1.
Figure 4.5: Distributions of clock-to-Q delay values for AND2 cell under process variations on the
values of inductors and junction areas. Input pattern 11.
We use the test circuitry shown in Figure 4.9 to illustrate the effect of the additional delay
caused by process variation. We first create 720,720 Monte Carlo instances of the DFF cell under
process variations and then use these DFFs to construct 360,360 two-stage pipeline circuits. After
that we choose a clock period to operate these circuits so that all of these circuits produce
logically correct results.
Then we use the same set of DFFs to build circuits with different numbers of pipeline stages
(i.e., DFFs) and compute the probability of failure (i.e., having an erroneous logic value) for the
same clock period. In each circuit, we used the same number of DFFs to ensure that the differences
in failure probability were not due to different circuit complexities. For example, we built 240,240
three-stage pipeline circuits and identified that 408 (0.17%) circuit instances produce an erroneous
logic value when clocked at this period.
Figure 4.6: Distributions of clock-to-Q delay values for OR2 cell under process variations on the
values of inductors and junction areas. (a) Input pattern 01. (b) Input pattern 10. (c) Input
pattern 11.
Figure 4.7: Distributions of clock-to-Q delay values for XOR2 cell under process variations on the
values of inductors and junction areas. (a) Input pattern 01. (b) Input pattern 10.
Figure 4.8: Distributions of clock-to-Q delay values for INV cell under process variations on the
values of inductors and junction areas. Input pattern 0.
Figure 4.9: An experimental configuration used to study the effect of additional delay caused by
process variations.
Figure 4.10: The probability of timing failure increases as $\tau_{DC}$ decreases.
As shown in Figure 4.4, even when the process variation is not sufficient to cause an immediate
logic error at a single cell, it may cause an increase in the clock-to-Q delay (i.e., $\tau_{CQ}$),
which reduces the $\tau_{DC}$ of the next cell along the path. The reduced $\tau_{DC}$ would cause
the $\tau_{CQ}$ of the next cell to further increase, as shown in Figure 4.10. The additional delays
at multiple cells can accumulate along a path and the failure probability may compound along the
path. Figure 4.11 shows that the probability of timing failure increases as the number of stages
increases.
However, due to the different timing characteristics between RSFQ cells, if cells with large
delays are followed by cells with small delays, then the failure probability may not keep increasing
with path length. In addition, as we show ahead, cells with specific timing characteristics (e.g.,
inverting cells) would completely block the compounding of the failure probability.
Figure 4.11: The probability of timing failure for pipelines with different numbers of stages.
4.2.2 Timing bleed: Cell characterization under additional delay at cell's input(s)
Process variations in the cells in the transitive fanin (i.e., fanin, or fanin of fanin, and so on)
of a cell $C_i$ can cause additional delay and hence cause the arrival time of the pulse(s) at the
input(s) of $C_i$ to move closer to the arrival time of the clock pulse for $C_i$. We then studied
the accumulation of additional delays along a sequence of pipeline stages (i.e., multi-cycle path).
While we can change the values of circuit parameters for cells under simulation to create additional
delay, in order to model a cell under all possible delay values, we started by indirectly simulating
increases in circuit delays by adjusting the relative timing of the clock pulse and the data pulse.
We also indirectly simulated increases in circuit delays by inserting JTLs into the circuit; the
results are the same as those obtained by adjusting the relative timing of the clock pulse and the
data pulse.
Figure 4.12: Amount of clock-to-Q delay under different input delay conditions for different cells.
4.2.2.1 The effect of additional delay at inputs
Our first simulation results confirmed that in RSFQ cells, as the arrival time of the input data
pulse becomes closer to the arrival time of the clock pulse (i.e., $\tau_{DC}$ becomes smaller), at
first the delay of the output data pulse, namely the clock-to-Q delay (i.e., $\tau_{CQ}$), stays the
same and then starts to increase as shown in Figure 4.12.
Figure 4.13 shows the clock-to-Q delay of a DFF under different input delay conditions, i.e.,
different values of $\tau_{DC}$. On the left side is when the data input arrives much earlier than
the clock input (i.e., at least 7.4 ps before the clock), while on the right side is when the data
input arrives after the clock. As shown in Figure 4.13, the data input can arrive much earlier than
the clock, close to the clock, or even a little after the clock, and the DFF works correctly in
terms of logic value. However, as the data input arrives closer to the clock, the output pulse
appears later at the output of the cell; namely, the clock-to-Q delay, $\tau_{CQ}$, increases as
shown in Figure 4.13. Essentially, this shows that the additional delay in the current pipeline
stage, which does not cause a logic error, can bleed into the next stage. Also, such extra delay may
further increase the delay of the data pulse at the cell in the next stage, and so on. As the number
of stages increases, this accumulated delay effect can eventually cause an erroneous logic value.
Due to differences between the internal structures of the cells, Figure 4.12 also shows significant
cell-to-cell differences in the nominal delay values as well as delay distributions due to
variations.
Figure 4.13: Amount of clock-to-Q delay under different input delay conditions for DFF cell.
A similar behavior is also observed in CMOS technology. In CMOS, when the data arrives
sufficiently before the clock, the corresponding clock-to-Q delay is defined as the nominal delay.
As $\tau_{DC}$ reduces, the clock-to-Q delay starts to increase. When $\tau_{DC}$ reduces to a level
where the clock-to-Q delay increases by a predetermined amount (e.g., 10%) compared to the nominal
delay, the corresponding $\tau_{DC}$ (i.e., the timing interval between the arrival time of the data
signal and the clock) is defined as the setup time. In conventional timing analysis approaches, the
setup time defines the latest time when the data input is required to arrive. During the timing
analysis, for a given clock period, if the data input is found to arrive after the setup time of
even one cell in the entire circuit, then that clock period is determined as insufficient. In this
case, the clock period needs to be increased and the whole circuit is required to run at a lower
clock frequency.
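As an illustration of the conventional definition, the following minimal sketch extracts the setup time from a characterized curve of clock-to-Q delay versus data-to-clock interval. The function name, the list-of-pairs input format, and the linear interpolation between characterization points are assumptions made for this sketch; any numbers passed to it would be hypothetical, not measured data.

```python
def conventional_setup_time(curve, threshold=0.10):
    """Return the data-to-clock interval at which clock-to-Q exceeds nominal by `threshold`.

    `curve` is a list of (tau_dc, tau_cq) pairs. The nominal clock-to-Q delay is
    taken from the point with the largest tau_dc (data arriving earliest).
    Returns None if the curve never reaches the threshold.
    """
    pts = sorted(curve, key=lambda p: p[0], reverse=True)  # earliest data first
    nominal = pts[0][1]
    limit = (1.0 + threshold) * nominal
    for (dc_hi, cq_hi), (dc_lo, cq_lo) in zip(pts, pts[1:]):
        if cq_lo >= limit > cq_hi:
            # Linear interpolation between the two bracketing points.
            frac = (limit - cq_hi) / (cq_lo - cq_hi)
            return dc_hi + frac * (dc_lo - dc_hi)
    return None
```

A conventional timing check would then reject any clock period for which some data input arrives later than this interval before the clock, even if the cell would still produce the correct logic value.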
However, as shown in Figure 4.13, even if the data input arrives late enough to cause the
clock-to-Q delay to increase by 30%, the cell can still produce the correct output logic value. In
this case, the output logic value is not affected, but the additional delay at the input of the cell
bleeds through the cell (i.e., pipeline stage) to its output. If the next cell can capture the pulse
correctly and produce the right value at its output, then the bleeding can continue along the path,
and we call this phenomenon timing bleed. If no cell in the circuit creates a logic error based on
the timing analysis, then the circuit can operate at the given clock period. The fact that the
conventional setup time of some cells is violated does not necessarily lead to the failure of the
circuit. Therefore, the conventional setup time constraint can be relaxed to improve
the performance of the circuit.
One of the major advantages of the conventional setup time approach is that it allows the
timing analysis method to partition the entire circuit into individual pipeline stages, and timing
analysis can be performed on those pipeline stages independently. By preventing the additional
delay from accumulating along a path, the complexity of the timing analysis is reduced; however,
this imposes a significant performance overhead.
Figure 4.14: An experimental configuration used to measure the performance overhead of
conventional setup time constraint.
We then systematically studied the performance overhead of the conventional CMOS approach
using the circuit configuration shown in Figure 4.14. We changed the number of levels of
combinational logic cells (namely splitters, since the splitter is the only combinational cell in
the cell library we use, i.e., the only cell which does not include a pipeline latch) between
consecutive flip-flops (i.e., DFFs) and computed the performance overhead for each configuration.
As shown in Figure 4.15, as the number of combinational logic cells in a pipeline stage increases,
the overhead of imposing the conventional setup time constraint decreases. In conventional CMOS,
there are typically more than ten levels of combinational logic in a pipeline stage [29]. The setup
time and the clock-to-Q delay of the DFF only constitute a small portion of the overall clock period
in conventional CMOS circuits. Therefore, imposing this guard-band (i.e., setup time constraint)
does not compromise the performance of CMOS circuits by a large amount.
Figure 4.15: The performance overhead of conventional setup time constraint for pipelines with
different numbers of levels of combinational logic cells between consecutive flip-flops.
However, in RSFQ technology, because every logic cell has an internal pipeline register, the setup
time of each cell becomes a major part of the clock period. Therefore, imposing the conventional
setup time constraint will dramatically impact the performance of the circuit as shown in
Figure 4.15.
Further, if we want to avoid timing bleed entirely, then to test that this is indeed the case in
a fabricated chip, we will need to scan the pipeline flip-flop within every single RSFQ cell to
observe the existence of timing bleed. The overhead of such full-scan will be astronomical. Hence,
avoidance of such astronomical area overhead is another compelling reason to allow timing bleed
to propagate in RSFQ circuits, as long as it does not cause any logic errors.
Hence, to avoid extreme area and performance overheads, we are forced to allow timing bleed
through multi-cycle paths.
Finally, our approach of allowing timing bleed also has the benefit of increasing at-speed yield.
This is due to the following observation: Just because one cell in a fabricated chip has a timing
bleed does not mean that the chip would fail, because the next cell may be a cell with low delay.
Hence, every chip with a timing bleed that is compensated by a low delay in the next cell will be
saved by our approach, which increases at-speed yield.
4.2.2.2 Propagation of timing bleed
Based on delay propagation characteristics, we divide basic RSFQ logic cells into two categories,
namely non-inverting and inverting cells.
For non-inverting cells, as shown in Section 2.3.3, delay propagation occurs when an input
pattern causes a pulse at the cell output. For example, an AND gate produces a pulse at its
output when each of its two inputs has an SFQ pulse, each representing logic-1. As shown in
Figure 4.16(a), if the process variation causes the input $X_1$ of the AND cell ($g_i$) to be
delayed by a small amount, then there would be no logic error, but the output would be delayed.
However, this additional delay in the AND cell ($g_i$) propagates (bleeds) to the following stage,
which is the OR cell ($g_j$), and eventually causes a logic error at the output of the OR cell: even
though the delayed pulse manages to propagate via the current cell logically correctly but with a
higher clock-to-Q delay, it may not be able to meet the timing of the next stage cell due to the
cascaded delays internal to the next cell. Alternatively, if the next cell has a low delay value,
then it may eliminate the additional delay.
4.2.2.3 Isolation of timing bleed
For inverting cells, such as an inverter, a pulse at the input does not propagate to the output,
hence the propagation path is always terminated. As shown in Figure 4.17(a), if the input pulse is
delayed by a small amount, then no error would be generated, provided the input pulse can be
captured by the current clock and terminates the propagation of the pulse to the output. As shown in
Figure 4.17(b), if the input pulse is delayed by a large amount, then a logic error would occur.
In the second scenario, the absence of a timely pulse at the input creates a pulse at the output.
This creates a starting point of a new delay propagation path.
Figure 4.16: (a) Timing bleed via non-inverting cells. (b) Timing bleed via non-inverting cells is
masked by off-path.
Figure 4.17: (a) Timing bleed blocked at inverting cells. (b) Logic error at inverting cells.
In summary, some multi-cycle delay paths in an RSFQ circuit are continuous from inputs to
outputs, some paths are broken, and some paths have starting points within the circuit.
Specifically, timing bleeds propagate via non-inverting cells along a path but are blocked at
inverting cells. (An XOR cell with an SFQ pulse at one input will continue to propagate timing bleed
via its output, provided its other input has no SFQ pulse during that clock; otherwise, the timing
bleed will be blocked at its output.) Logic behaviors of cells and their nominal delay values
are important determinants of sites of likely delay failures. Specifically, long paths of high-delay
non-inverting cells are more susceptible to cascading delay failures.
4.2.3 Summary of the new phenomena
(1) Due to the large feature sizes, defects are less likely in RSFQ technology. However, process
variations and other non-idealities become the major causes of chip failures. (2) Because of the
quantized operation, most of the failures are timing failures. (3) Even though RSFQ circuits
are highly pipelined, in order to avoid huge performance and area overhead, we cannot directly
use conventional CMOS timing analysis or delay testing approaches. Timing bleed must be
allowed to propagate in RSFQ circuits. (4) However, timing bleed does not always propagate
from primary input to primary output, since the delay propagation paths with timing bleed pass
via non-inverting cells but terminate-and-restart at inverting cells.
Chapter 5
Static Timing Analysis with Timing bleed
One of the key advantages of RSFQ logic is that it provides a combination of high performance
and low power. It is hence crucial to develop methods and tools for design, verification, and
characterization that maximize performance.
In this chapter, we present a new method and a new tool for timing analysis that certify
maximum performance for RSFQ circuit modules (defined ahead). RSFQ circuit modules
perform the bulk of the computation, as these span all functional units (such as arithmetic logic
units), decoders and encoders for microprocessors, the core logic associated with controllers, and
so on.
5.1 Motivation and Background
In CMOS (as well as in RSFQ), as the time interval between the data input and the clock
becomes smaller, the clock-to-Q delay of a flip-flop increases. In conventional timing analysis for
CMOS, setup time is defined as the interval where the corresponding clock-to-Q delay increases
by 10%. Further, in conventional static timing analysis (STA) for CMOS, the maximum delay
of every combinational logic block is required to satisfy this setup time requirement. If not, the
clock period is increased. While this approach sacrifices some performance, it also dramatically
reduces the complexity of STA by partitioning large CMOS circuits into a number of much smaller
combinational logic blocks and performing STA independently for each combinational logic block.
Motivation: As RSFQ is pipelined at gate-level, setup time is a significant portion of the clock
period, which makes the guard-band associated with the conventional timing constraint extremely
conservative. Figure 4.15 clearly illustrates that imposing such a guard-band significantly reduces
the performance of RSFQ logic circuits.
Since one of the main reasons for adopting RSFQ is its combination of high performance
and low power, the above observation provided us with a very strong motivation to develop new
methods for timing analysis that avoid conventional setup time constraints.
Background (prior research): We started with a review of the timing analysis methods
developed in the past to relax the conventional setup time constraints to improve the performance
of CMOS circuits [30] [31] [32]. These approaches relax the setup time constraints by building
models to capture the relationship among setup time, hold time, and clock-to-Q delay. Researchers
use an iterative approach to determine the minimal clock period which satisfies all setup and hold
constraints for a given circuit considered as a fully interconnected and interdependent system [30].
Researchers also proposed to use multiple timing models which capture the tradeoff between
setup time, hold time, and clock-to-Q delay to provide more flexibility in order to recover lost
performance by considering different corners as well as different operating modes [31].
Effectiveness: In CMOS circuits, these approaches reduced the clock period by 3-4% [31]. This
is consistent with Figure 4.15, which illustrates that the effectiveness is relatively small in CMOS
since, due to power constraints, large CMOS circuits are comprised of registers (sets of flip-flops)
and combinational logic blocks (CLBs) with ten or more levels of combinational gates [29].
Complexity: The above methods have high run-time complexity for CMOS. First, these can
provide lower clock periods only when applied to circuits that span registers and CLBs. Hence,
in contrast to the conventional timing analysis approaches, these methods cannot be applied
independently to each individual CLB and hence cannot reduce complexity via such
divide-and-conquer. Second, this causes the circuits to which these methods are applied to include
feedback loops. In turn, this requires these methods to use multiple iterations and hence increases
complexity. Third, in CMOS timing analysis, delays are functions of two timing parameters of
each signal transition (between one pattern and the next), namely its arrival time (typically, the
time when the voltage reaches halfway to the final value) and transition time (i.e., the rise or
fall time of a transition, a measure of the slope of the voltage-vs-time transition). In CMOS, the
nature of this dependence between the delay and the arrival and transition times is such that the
level of relaxation of setup time at a gate that minimizes the clock period (for the entire circuit)
can only be determined via search.
Due to this combination of small reduction in clock period and high run-time complexity, for
CMOS these methods are not widely used. However, as we describe next, we extend and modify
these methods due to unique characteristics of RSFQ logic and enable much greater reductions
in clock period at much lower run-time complexities.
5.2 Key Ideas and Problem Statement
As mentioned earlier, RSFQ logic is pipelined at gate-level and hence new methods for timing
analysis that generalize beyond the classical setup time constraints will significantly improve
clock period. Also, new methods must be developed to tackle many other characteristics of RSFQ,
especially single pattern delay excitation and propagation, blockage of timing bleed at inverting
cells, the fact that delays in RSFQ are functions of only the arrival time [3] [28] [33], and that
the definition of hold time in RSFQ is still under debate [34] [28] [35] and not typically addressed
by timing analysis methods.
Next, we present our key ideas that address both the lessons of the above methods for CMOS
and the unique characteristics of RSFQ to develop a highly effective and yet low-complexity timing
analysis method for RSFQ.
5.2.1 Circuit model and divide and conquer
As shown in Figure 5.1, RSFQ circuit modules (e.g., a full-adder) have architectural registers
at their primary inputs and outputs. Due to the gate-level pipelined nature of RSFQ technology,
the logic cells within the circuit module also have internal registers and the functional unit
within the module is fully path-balanced [36]. Further, all feedback loops pass via architectural
registers; therefore, an RSFQ circuit module (such as an adder or a multiplier) is an acyclic logic
block that is pipelined at the gate level. These characteristics of RSFQ logic provide us the first
key component of our new problem statement.
Figure 5.1: The schematic of an RSFQ Full-Adder.
Problem statement (part-1): Our new timing analysis method targets individual RSFQ
circuit modules, where RSFQ circuit modules are obtained by partitioning a large RSFQ logic
circuit at architectural registers (see Figure 5.1). We achieve this by using the conventional setup
time approach at architectural registers, while we allow timing bleed at internal registers within
each RSFQ circuit module (i.e., the pipelined registers within logic gates).
As each RSFQ circuit module is a fine-grained pipeline that is also acyclic and path-balanced,
allowing timing bleed within the circuit modules provides a significant decrease in clock period. At
the same time, this dramatically reduces run-time complexity as it allows us to divide a large
RSFQ logic circuit into smaller RSFQ circuit modules (which correspond to functional units, such
as arithmetic logic units) and architectural registers and perform timing analysis on each RSFQ
circuit module independently. Further, since each RSFQ circuit module is free of feedback, no
iterations are required.
This also avoids the astronomical area overhead for design-for-testability (DFT) as described
in Section 4.2.2.1 by allowing us to scan architectural registers but not the internal registers.
5.2.2 Unique characteristics of RSFQ delays
Within an individual RSFQ circuit module, the paths for timing bleed propagate via non-inverting
cells but are blocked at inverting cells. This is in contrast to a CMOS logic block where all delay
paths go from register to register.
Problem statement (part-2): Our timing analysis develops new methods to consider timing
bleed propagation via non-inverting cells but blockage and restart at inverting cells.
Finally, the remaining characteristics of RSFQ enable us to streamline the methods developed
for CMOS.
Problem statement (part-3): Our timing analysis method streamlines the methods for
computation of delays to take advantage of three key characteristics of RSFQ logic, namely single
pattern delay excitation and propagation, the fact that delays in RSFQ are functions of only the
arrival time [3] [28] [33], and that the definition of hold time in RSFQ is still under debate and
hence not typically addressed by timing analysis methods (more ahead).
The definition of hold time in RSFQ technology is still under debate. Since RSFQ is a pulse
based logic (i.e., return to zero logic), hold time can be defined as the minimum time that must be
preserved to allow sufficient relaxation of the RSFQ cell before processing the next pulse [34] [35].
The hold time can also be defined as the timing constraint for the data pulse not to race through
the current clock cycle [28]. Therefore, none of the previous CMOS approaches can be directly
applied to RSFQ technology.
In this chapter, we define setup time based on the probability of causing a logic error and
develop a new STA method for individual RSFQ circuit modules (analogous to CLBs in CMOS
circuits) that properly handles the timing bleed along a multi-cycle path of cells with different
timing characteristics. With our extensions and simplifications, we develop a solution which is
provably optimal with polynomial time complexity. In the next section, we describe our proposed
new STA approach and present the unique properties of RSFQ cells identified and used by our
STA. Theoretical results are presented in Section 5.4. In Section 5.5, we present results of
simulations for benchmark circuits with process variations to demonstrate that our new method
certifies much higher speeds for RSFQ logic.
5.3 Proposed approach
In order to develop an STA approach which allows timing bleed, a timing library which captures
the timing characteristics of the cells under additional input delays is required. In other words,
we need to characterize each cell in a given cell library with the knowledge of timing bleed and
then develop a timing library based on the cell characterization results. Once a timing library is
developed, it can then be used as an input to our proposed STA approach.
5.3.1 Cell timing characterization with timing bleed
The cell library used in this chapter is the same generic RSFQ cell library [4] used in Chapter 3.
All the cells in this library have built-in Passive Transmission Line (PTL) drivers and receivers to
achieve maximum isolation between cells when connected together to build logic circuits. We also
use the same simulation environment of the cell characterization process as described in Chapter 3.
Recall that, even though the shape of the signal can vary greatly, in our approach, the delay is
defined as the time interval from when the input pulse reaches 50% to when the output pulse
reaches 50% [3].
Defining soft and hard setup times: As described in Section 4.2.2.1, when the data input
of a cell is delayed, the cell may create a logic error or have a larger than normal clock-to-Q delay
(i.e., timing bleed). However, imposing the conventional setup time to guard-band the increased
clock-to-Q delay will dramatically decrease the performance of RSFQ circuits. Therefore, we
propose to relax the conventional setup time constraint by defining a soft setup time and a
hard setup time.
Figure 5.2: Amount of clock-to-Q delay of DFF under different input delay conditions.
As shown in Figure 5.2, we define the soft setup time as the largest $\tau_{DC}$ (note that the
largest $\tau_{DC}$ occurs for the earliest arrival of data) where the cell starts to experience a
larger than normal clock-to-Q delay. We also define the hard setup time as the largest $\tau_{DC}$
where the cell generates a logic error and the cell would fail. Between the soft setup time and the
hard setup time, the cell produces a logically correct response but the timing is affected and
timing bleed occurs. Therefore, as $\tau_{DC}$ changes, the corresponding $\tau_{CQ}$ also changes
and causes timing bleed. In order to capture the newly defined setup times as well as the timing
behavior of the cell under nominal input conditions and delayed input conditions, we need to
characterize the cells in a given cell library under various input conditions.
In summary, for a particular cell, as shown in Figure 5.2, a conventional clock-to-Q delay (i.e.,
$\tau_{CQ}^{conv}$) is defined at exactly the 10% mark. We then define a soft setup time, before
which the clock-to-Q delay is the normal value (i.e., $\tau_{CQ}^{norm}$). A hard setup time is also
defined, after which the cell would have a logic error. For the entire region between the soft setup
time and the hard setup time, $\tau_{CQ}$ is adjusted based on $\tau_{DC}$ and denoted by
$\widetilde{\tau}_{CQ}$. $\tau_{CQ}^{conv}$ and $\tau_{CQ}^{norm}$ are specific values for each cell,
while $\widetilde{\tau}_{CQ}$ is a lookup table based on the curve shown in Figure 5.2. We use
$\tau_{CQ}$ to denote the clock-to-Q delay of a cell, which could be any of the three values (i.e.,
$\tau_{CQ}^{norm}$, $\tau_{CQ}^{conv}$, and $\widetilde{\tau}_{CQ}$).
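The three regions defined above (normal delay, timing bleed, and logic error) can be sketched as a piecewise lookup. This is a minimal illustration and not the timing library format used in this work: the class name, field names, the choice to return `None` for a logic error, and linear interpolation between table entries are all assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CellTiming:
    soft_setup: float   # largest data-to-clock interval at which bleed starts
    hard_setup: float   # largest data-to-clock interval at which a logic error occurs
    cq_norm: float      # normal clock-to-Q delay
    bleed_table: List[Tuple[float, float]]  # (tau_dc, tau_cq) samples, ascending tau_dc

    def clock_to_q(self, tau_dc: float) -> Optional[float]:
        """Clock-to-Q delay for a given data-to-clock interval, or None on logic error."""
        if tau_dc >= self.soft_setup:
            return self.cq_norm          # no timing bleed
        if tau_dc <= self.hard_setup:
            return None                  # logic error
        xs = [p[0] for p in self.bleed_table]
        i = bisect_left(xs, tau_dc)
        if i == 0:
            return self.bleed_table[0][1]
        if i == len(xs):
            return self.bleed_table[-1][1]
        (x0, y0), (x1, y1) = self.bleed_table[i - 1], self.bleed_table[i]
        return y0 + (y1 - y0) * (tau_dc - x0) / (x1 - x0)
```

Returning `None` rather than a delay value keeps the logic-error region distinct from the timing-bleed region, so a caller cannot accidentally treat a failed capture as a very late pulse.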
Cell characterization: Since, in RSFQ, delay is only associated with logic-1, to capture the
clock-to-Q delay of a given cell, we simulated the cell using all possible input patterns which cause
a logic-1 at the output of the cell. For example, for the OR2 cell, we simulate input patterns (11),
(01), and (10), while for the AND2 cell, we only simulate the (11) input pattern.
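The set of input patterns to simulate for each cell follows directly from its truth table. The short sketch below enumerates them; the dictionary of cell functions and the function name are illustrative assumptions, with the cell names taken from the figures in Chapter 4.

```python
from itertools import product

# (number of data inputs, Boolean function) for each cell type.
CELL_FUNCTIONS = {
    "DFF": (1, lambda a: a),
    "INV": (1, lambda a: 1 - a),
    "AND2": (2, lambda a, b: a & b),
    "OR2": (2, lambda a, b: a | b),
    "XOR2": (2, lambda a, b: a ^ b),
}


def logic1_patterns(cell: str) -> list:
    """Input patterns (as bit strings) for which the cell outputs logic-1."""
    n_inputs, func = CELL_FUNCTIONS[cell]
    return ["".join(map(str, bits))
            for bits in product((0, 1), repeat=n_inputs)
            if func(*bits) == 1]
```

For OR2 this yields the three patterns 01, 10, and 11, and for AND2 only 11, matching the patterns listed above.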
Because process variation has a first-order impact on timing of the cell [37], we not only
characterize the timing behavior of the cells under nominal conditions but also perform the
characterization under process variations. Characterizations of other non-idealities such as
biasing current redistribution are subject to future research.
For a given type of cell, we create a comprehensive set of cell instances based on different
amounts of variation. (We use the model of variations used in [5] and hence use $\sigma$ values of
3% and 5% for JJ area and inductance values, respectively.) Then for each cell instance
and each specific input pattern, we perform a comprehensive set of simulations to capture the
timing characteristic of the cell such as the one shown in Figure 5.2. We adjust the timing interval
between the data input and the clock input to identify the soft setup time and hard setup time.
In addition to the normal clock-to-Q delay value (i.e., $\tau_{CQ}^{norm}$), we also capture all the
other clock-to-Q delay values when the data input arrives between the soft setup time and hard setup
time. In other words, a curve similar to Figure 5.2 is generated for each variation instance of each
type of cell.
Figure 5.3: Cell timing characterization results for an instance of OR2 cell.
For cells with two inputs such as an OR2 cell, we not only characterize signal-to-clock timing
behavior, but also capture signal-to-signal timing behavior such as the one shown in Figure 5.3.
Our cell characterization results show that the property of timing bleed depends on the polarity
of the cell.
For non-inverting cells: If the time interval between the arrival time of data and clock is
larger than the soft setup time, then the cell produces correct output logic value with a normal
clock-to-Q delay (i.e., there is no timing bleed at the output of the cell). If this timing interval
is smaller than the hard setup time, then the cell produces a logic error. If the timing interval is
smaller than the soft setup time but larger than the hard setup time, then timing bleed occurs
at the cell. In summary, depending on the timing interval between data and clock, the cell (1)
may produce the correct logic value without having an increased clock-to-Q delay; (2) may have
an increased clock-to-Q delay; or (3) may have a logic error (see Figure 5.2).
For inverting cells: If the logic-1 at the input is captured correctly by the clock (i.e., arrives
before the hard setup time), then a logic-0 is generated at the output of the cell. Since there is
no delay associated with logic-0, the signal is always on time (i.e., no timing bleed at the output
of the cell). If the logic-1 misses the clock (i.e., arrives after the hard setup time), then the cell
produces a logic error. If the input is a logic-0 then the signal is always on time, hence the
output clock-to-Q delay does not need to be adjusted. In summary, due to the unique behavior
of inverting cells, the cell (1) may produce the correct logic value without having an increased
clock-to-Q delay; or (2) may have a logic error.
Therefore, timing bleed can propagate via non-inverting cells, but the propagation of timing
bleed is blocked at inverting cells. Detailed description of this phenomenon can be found in [37].
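The three regions for non-inverting cells and the two regions for inverting cells can be summarized in a small sketch. The function names are hypothetical, and treating a DC exactly equal to a setup time as meeting that constraint is an assumption made here, since the text only speaks of "larger" and "smaller".

```python
def classify_non_inverting(dc, soft_setup, hard_setup):
    """Classify the timing behavior of a non-inverting cell.

    dc is the timing interval between data arrival and clock arrival
    (larger means the data arrives earlier). Assumes soft_setup > hard_setup.
    """
    if dc < hard_setup:
        return "logic_error"      # data misses the clock
    if dc < soft_setup:
        return "timing_bleed"     # correct value, increased clock-to-Q delay
    return "normal"               # correct value, normal clock-to-Q delay


def classify_inverting(dc, hard_setup):
    """Inverting cells block timing bleed: only two outcomes are possible."""
    return "logic_error" if dc < hard_setup else "normal"


if __name__ == "__main__":
    # Hypothetical setup times in ps, chosen only for illustration.
    for dc in (1.0, 3.0, 5.0):
        print(dc, classify_non_inverting(dc, soft_setup=4.0, hard_setup=2.0),
              classify_inverting(dc, hard_setup=2.0))
```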
5.3.2 Timing library extraction for our STA approach
Timing library is an essential input for our STA approach. Based on our timing characterization
results, we create a timing library to capture the timing characteristics of each cell in the current
cell library.
For a given type of cell, there are many different instances due to process variations. We have
two different approaches for handling process variations. In the first approach, for each individual
instance, we take the corresponding process variation and use the cell characterization result
for that specific instance. In this approach, the timing analysis results are valid for an
instance-by-instance analysis. In the second approach, we focus on the worst-case process variation
over the population of all the instances and characterize the delay model of every cell for its
worst case. The values captured in the delay model used in the timing analysis are then valid for
the entire population of instances due to those process variations.
Therefore, for the first type of analysis, we extract a timing model for each variation instance
of the cell. For the second type of analysis, we combine all the variation instances to come
up with a timing model that captures the worst-case timing characteristic for the entire population
of cell instances in the presence of process variations. In this timing model, for a given type of
cell, we use the largest soft setup time among all its variation instances as the soft setup time for
that cell. Similarly, the maximum adjusted clock-to-Q delay among all instances is used for each
DC. The hard setup time is also computed by taking the maximum value of the hard setup
time of all variation instances.
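As a minimal sketch of the second (worst-case) approach, the per-instance models can be combined as follows. The dictionary layout (`soft_setup`, `hard_setup`, `cq_table`) and the assumption that all instances are sampled on a common grid of DC values are hypothetical and chosen only for illustration.

```python
def worst_case_model(instances):
    """Combine per-instance timing models into one worst-case model.

    Each instance is a dict with:
      soft_setup, hard_setup : setup times
      cq_table               : {DC value: adjusted clock-to-Q delay}
    """
    soft = max(inst["soft_setup"] for inst in instances)
    hard = max(inst["hard_setup"] for inst in instances)
    # Union of all sampled DC points (iterating a dict yields its keys).
    dc_points = sorted(set().union(*(inst["cq_table"] for inst in instances)))
    table = {
        dc: max(inst["cq_table"][dc] for inst in instances if dc in inst["cq_table"])
        for dc in dc_points
    }
    return {"soft_setup": soft, "hard_setup": hard, "cq_table": table}


if __name__ == "__main__":
    # Two hypothetical variation instances of the same cell (values in ps).
    inst_a = {"soft_setup": 4.0, "hard_setup": 2.0, "cq_table": {2.0: 9.0, 3.0: 7.0}}
    inst_b = {"soft_setup": 4.2, "hard_setup": 1.9, "cq_table": {2.0: 8.5, 3.0: 7.4}}
    print(worst_case_model([inst_a, inst_b]))
```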
Inverting cell: For cells such as an INV, because the propagation of timing bleed is blocked,
we only need to capture its $CQ^{norm}$ and hard setup time. The conventional setup time is also
not applicable to an inverting cell.
Non-inverting cell: For cells such as a DFF, besides its $CQ^{norm}$ and hard setup time, we
also compute a soft setup time and a lookup table to capture its timing characteristic when DC
is between soft setup time and hard setup time. Our timing model assumes that $\widetilde{CQ}$ (i.e.,
the adjusted clock-to-Q) increases monotonically as DC decreases from soft setup time to hard
setup time. For each DC between the soft setup time and the hard setup time, we compute
a corresponding $\widetilde{CQ}$. In Figure 5.4, the actual clock-to-Q delay is shown as orange
dots. There is a very small number of instances with very slight non-monotonicity (e.g.,
when DC is around 3.9ps). In those cases, we over-approximate the delay to smooth out the
non-monotonicity with the monotonic curve shown as the black solid line.
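One simple way to obtain such a monotonic over-approximation is a running maximum taken from the largest DC towards the smallest. This is only a sketch of the idea; the smoothing procedure actually used to produce the black curve in Figure 5.4 is not specified here, and the sample points below are hypothetical.

```python
def monotone_envelope(points):
    """Over-approximate measured (DC, CQ) points by a monotone curve.

    The returned curve never falls below a measured value, and its CQ
    never decreases as DC decreases.
    """
    ordered = sorted(points, key=lambda p: p[0], reverse=True)  # largest DC first
    envelope = []
    running_max = float("-inf")
    for dc, cq in ordered:
        running_max = max(running_max, cq)
        envelope.append((dc, running_max))
    envelope.reverse()  # return in ascending DC order
    return envelope


if __name__ == "__main__":
    # Hypothetical measurements (ps) with a slight dip at DC = 3.9.
    measured = [(3.5, 9.0), (3.9, 7.9), (4.0, 8.0), (4.5, 7.0)]
    print(monotone_envelope(measured))
```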
Because no pattern information is available in static timing analysis, for cells
with more than one input we capture the maximum adjusted clock-to-Q delay among all patterns
for each DC in order to create a conservative static timing model.
Figure 5.4: Timing characteristics of an instance of DFF cell.
5.3.3 Static timing analysis with timing bleed
As described above, our STA approach targets each RSFQ circuit module independently. The
inputs of an RSFQ circuit module are driven by, and its outputs drive, architectural registers.
Conventional setup time is used at architectural registers. Each RSFQ circuit module is an
acyclic, path-balanced, fine-grained pipeline. Within the circuit module, our STA method allows
timing bleed by relaxing the conventional setup time requirements.
5.3.3.1 Levelization
For a certain cell, if the soft setup time is violated, then the clock-to-Q delay of that cell will
increase. Due to the potential increase of clock-to-Q delay, we cannot perform the timing analysis
for the second stage in an RSFQ circuit module unless we complete the analysis for the first stage.
Therefore, we need to analyze multi-cycle paths. Since in our circuit model we assume that the
conventional setup time constraint is used at architectural registers, typically primary inputs and
primary outputs of functional modules, a multi-cycle path starts at a primary input of an RSFQ
circuit module and ends at one of its primary outputs. Therefore, the initial step of our timing
analysis is to levelize a given circuit using the algorithm shown in Algorithm 1 [38].
Algorithm 1: An outline of the levelization algorithm [38] for our RSFQ circuit module
Phase - Initialization
    Initialize level=unknown for every cell in the given circuit
Pass - 0
    Assign level=0 to every cell that contains a primary input
Pass - i
    repeat
        foreach cell c that is assigned a new level j in the previous pass do
            foreach fanout f of cell c do
                if f is a combinational logic then
                    Assign level j to f
                end
                else if f is a sequential logic then
                    Assign level j+1 to f
                end
            end
        end
    until no change is made during the iteration or all the cells that contain a primary
          output have been levelized;
Phase - Check
    If all cells in the circuit have been assigned a level and the maximum level does not
    exceed the total number of cells in the circuit, then the levelization is completed
    successfully; otherwise, the levelization fails.
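The outline in Algorithm 1 can be sketched in Python as follows. The netlist representation (a fanout dictionary and a per-cell `is_sequential` flag) and the function name `levelize` are assumptions made for illustration. Keeping the larger level when a cell is reached along several paths is also an assumption; in a fully path-balanced module all paths give the same level.

```python
def levelize(fanouts, is_sequential, primary_input_cells):
    """Assign a pipeline level to every cell, following the passes of Algorithm 1.

    fanouts            : {cell: [fanout cells]}
    is_sequential      : {cell: True for clocked cells, False for combinational}
    primary_input_cells: cells that contain a primary input (level 0)
    """
    level = {cell: None for cell in fanouts}      # None plays the role of "unknown"
    num_cells = len(fanouts)
    frontier = list(primary_input_cells)
    for cell in frontier:
        level[cell] = 0                           # Pass 0
    while frontier:                               # Pass i
        next_frontier = []
        for c in frontier:
            for f in fanouts[c]:
                new_level = level[c] + 1 if is_sequential[f] else level[c]
                if new_level > num_cells:         # check phase: level bound exceeded
                    raise ValueError("levelization failed: circuit is not acyclic")
                if level[f] is None or new_level > level[f]:
                    level[f] = new_level
                    next_frontier.append(f)
        frontier = next_frontier
    if any(v is None for v in level.values()):    # check phase: unreached cells
        raise ValueError("levelization failed: some cells were never reached")
    return level


if __name__ == "__main__":
    # Hypothetical four-cell module: a feeds b and d, which both feed c.
    fanouts = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "c": []}
    is_sequential = {name: True for name in fanouts}
    print(levelize(fanouts, is_sequential, ["a"]))
```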
5.3.3.2 Timing analysis with a given clock period
The STA algorithm shown in Algorithm 2 uses all three setup times defined in Section 5.3.1. The
goal of the algorithm is to identify whether or not a given clock period, $\tau$, will cause timing
failures for any cell in the circuit. There are two types of timing failures. One type of timing
failure occurs when the arrival time of a data input violates the hard setup time of a cell. In this
case, a logic error would be generated at the output of that cell. Another type of timing failure
is specifically related to the cells that belong to the last pipeline stage (i.e., the primary output
cells). Because the primary output cells are architectural registers, the conventional setup time
constraint is used at those cells. Therefore, if the arrival time of a data input violates the
conventional setup time of that cell, it would cause a timing failure. This only applies to
non-inverting cells, because inverting cells do not have a conventional setup time.
Algorithm 2: An outline of our STA algorithm (for RSFQ circuit module)
Phase - Initialization
    Levelize the given RSFQ circuit module and save the maximum level as $L_{max}$
    Initialize current level L to 0 (i.e., primary input level)
    Initialize the arrival time of all primary output signals to None
    Initialize all primary input signals to arrive at conventional setup time
Phase - Timing analysis
    foreach level L from 0 to $L_{max}$ do
        foreach cell c at level L do
            foreach fanin $x_i$ of cell c do
                if arrival time of the signal at $x_i$ violates the hard setup time of c then
                    The arrival time of the output signal of c cannot be computed
                    Terminate the analysis and report that the given $\tau$ is insufficient
                    for the given circuit
                    No arrival time of any output signal is reported
                end
            end
            if c is a non-inverting cell then
                if arrival time of the signal at any fanin $x_i$ violates the soft setup time of c then
                    Compute the adjusted CQ (i.e., $\widetilde{CQ}$) of c based on the arrival
                    time of the input signal(s) (i.e., DC).
                end
            end
            Compute the arrival time of the output signal(s) of c based on its CQ
        end
    end
Phase - Check boundary condition
    foreach cell c that belongs to primary output level do
        if arrival time of input signal of c violates the conventional setup time of c then
            Terminate the analysis and report that the given $\tau$ is insufficient for the
            given circuit
            No arrival time of any output signal is reported
        end
    end
    Terminate the analysis and report that the given $\tau$ is sufficient for the given circuit
In our approach, since the conventional setup time constraint is used at architectural registers,
in order to analyze the worst case, all the primary input signals of our RSFQ circuit module are
assumed to arrive at the conventional setup time of the cells. Then, based on the given clock
period, for each cell, we compute the arrival time of the clock signal and calculate the timing
interval between the data input and the clock input.
As described in Section 5.3.1, the cells are divided into two categories based on their timing
characteristics, namely inverting and non-inverting.
For inverting cells: As shown in Algorithm 2, the analysis is only terminated when the hard
setup time is violated; otherwise $CQ^{norm}$ is used to continue our analysis.
For non-inverting cells: As with inverting cells, if the hard setup time is violated, then a
timing failure is identified at the cell and the timing analysis terminates for the given clock period.
However, as shown in Algorithm 2, there are additional concerns for non-inverting cells. If the data
input only violates the soft setup time, then we adjust the clock-to-Q delay of the cell based on its
lookup table and use the adjusted clock-to-Q delay to continue our timing analysis. If the signal
arrives before the soft setup time, then $CQ^{norm}$ is used to continue the analysis.
Our timing analysis proceeds level by level. If a hard setup time violation (i.e., logic error) is
found at any cell, then the given clock period is not large enough to operate the circuit. If our
timing analysis finishes without identifying any hard setup time violation at any cell, then the
circuit is guaranteed to provide correct logic response when operated at the given clock period.
Our approach then checks the boundary conditions for the primary output cells to ensure that
no conventional setup time violation occurs, since the conventional setup time constraint is used at
those cells. The input signals of those cells are deemed to have no timing failure if and only if the
signals arrive before the conventional setup time. If no timing failure is detected at any cell, then
our approach reports that the given clock period is suitable for the particular circuit.
In summary, our STA approach will either identify a timing failure or deem that the given
clock period is sufficient for the circuit. In the latter case, it will also report the arrival time of
the output signal for each cell in the circuit.
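The flow described in this section can be sketched for the simplest case of a linear pipeline with one cell per stage. Everything below is a hypothetical illustration rather than the implementation used in this work: the cell parameters, the linear stand-in for the lookup table, the names `make_cell` and `run_sta`, and the use of the relation DC = skew + clock period - CQ of the previous cell - interconnect delay are assumptions made for the example.

```python
def make_cell(inverting, hard_setup, soft_setup, conv_setup, cq_norm, cq_at_hard):
    """Build a hypothetical cell timing model (all values in ps)."""
    def adjusted_cq(dc):
        # Linear stand-in for the lookup table: cq_norm at the soft setup
        # time, rising to cq_at_hard at the hard setup time.
        frac = (soft_setup - dc) / (soft_setup - hard_setup)
        return cq_norm + frac * (cq_at_hard - cq_norm)
    return {
        "inverting": inverting, "hard_setup": hard_setup, "soft_setup": soft_setup,
        "conv_setup": conv_setup, "cq_norm": cq_norm,
        "cq_conv": adjusted_cq(conv_setup), "adjusted_cq": adjusted_cq,
    }


def run_sta(cells, links, tau):
    """Return the clock-to-Q delay of every cell, or None if tau is insufficient.

    cells : list of cell models, one per pipeline stage
    links : links[k] = (skew, comb) between cells k and k+1
    """
    first = cells[0]
    # Primary inputs are assumed to arrive at the conventional setup time.
    cq = [first["cq_norm"] if first["inverting"] else first["cq_conv"]]
    dc = None
    for k in range(1, len(cells)):
        skew, comb = links[k - 1]
        dc = skew + tau - cq[k - 1] - comb
        cell = cells[k]
        if dc < cell["hard_setup"]:
            return None                           # logic error: terminate
        if not cell["inverting"] and dc < cell["soft_setup"]:
            cq.append(cell["adjusted_cq"](dc))    # timing bleed
        else:
            cq.append(cell["cq_norm"])
    last = cells[-1]
    # Boundary check at the primary output cell (non-inverting cells only).
    if dc is not None and not last["inverting"] and dc < last["conv_setup"]:
        return None
    return cq


if __name__ == "__main__":
    dff = make_cell(False, hard_setup=2.0, soft_setup=4.0, conv_setup=3.0,
                    cq_norm=5.0, cq_at_hard=9.0)
    pipeline, links = [dff, dff, dff], [(0.0, 1.0), (0.0, 1.0)]
    for tau in (12.0, 11.0, 10.5):
        print(tau, run_sta(pipeline, links, tau))
```

In this toy pipeline, a clock period of 11 ps passes only because the soft setup time is allowed to be violated at the internal cell; the increased clock-to-Q delay is then carried into the next stage.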
5.3.3.3 Identifying minimum clock period
For a particular circuit and a given clock period, our STA approach is able to identify if the clock
period can cause any timing failures in the circuit. Therefore, by conducting a search over a range
of clock periods, our approach can identify the minimum clock period for that particular circuit.
If the current clock period causes a timing failure, then we increase the clock period and perform
the timing analysis again. If the current clock period does not lead to any timing failures, then
we decrease the clock period to find a smaller one for the given circuit.
According to Theorem 3 and Theorem 4 shown in Section 5.4, once a certain clock period $\tau$
is identified as insufficient for the given circuit, then all the clock periods that are smaller than
$\tau$ can also be shown to be insufficient for that circuit. On the other hand, if a given clock
period $\tau$ is identified as sufficient for the given circuit, then all the clock periods that are
larger than $\tau$ can also be shown to be sufficient for that circuit. Therefore, faster search
methods such as binary search can be used to reduce the time complexity of this version of our
STA approach.
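Because sufficiency is monotone in the clock period, the search can be sketched as a standard bisection. The predicate `passes` stands for one run of the timing analysis at a given clock period; the function name, the search bounds, and the tolerance are hypothetical.

```python
def min_clock_period(passes, lo, hi, tol=0.01):
    """Bisection search for the smallest clock period accepted by `passes`.

    passes(tau) -> bool must be monotone: once a period passes, every
    larger period passes as well. `hi` must be a passing period.
    """
    if not passes(hi):
        raise ValueError("upper bound does not pass the timing analysis")
    if passes(lo):
        return lo
    # Invariant: lo fails and hi passes.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if passes(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Stand-in predicate with a hypothetical threshold of 10.75 ps.
    print(min_clock_period(lambda tau: tau >= 10.75, lo=5.0, hi=20.0))
```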
5.4 Proofs of Correctness and Optimality
In this section, we derive formal proofs for the claim that, for a given RSFQ circuit module, the
minimum clock period identified by our STA approach is guaranteed to produce correct logic
responses for all possible input patterns without any timing failures.
We start with the conditions that the given RSFQ circuit module and the cells under study
satisfy. The timing model provided by the cell characterization approach described in Section 5.3.1
guarantees the following three properties: (1) If a certain timing condition can cause logic failure,
then the cell characterization approach used would identify it. (2) The cell characterization
approach captures the worst-case delay values for the conditions. (3) As DC decreases, the
clock-to-Q delay of the cell increases monotonically, namely as we decrease $t_{interval}$ the
clock-to-Q delay of that cell increases monotonically (see Figure 5.4). Our circuit model also
guarantees that the target RSFQ circuit module is fully path-balanced and acyclic, and hence can
be levelized by our levelization algorithm.
We also assume that the clock skew is arbitrary, but fixed and independent of the data signals
applied to any cell in the circuit.
In addition to the above assumptions, the following assumptions, also used in all other STA
approaches for RSFQ, are adopted by our approach, namely the abstractions used to capture
delay values and the additive property of the delay values. As described in Section 5.3.1, only the
arrival time of the pulse is used to represent the delay property of the signal; all other properties
of the waveform are not used in the cell characterization process and therefore are not carried into
the timing analysis domain. The other assumption is the additive property of the delay values,
namely the delay of each component along a path can be added to determine the propagation
delay of a pulse along the path [39] [40].
5.4.1 Correctness of our approach
In this section, we focus on proving the robustness of our approach, namely if our approach
identifies a certain clock period $\tau$ to be suitable for a given RSFQ circuit module, then the
circuit would not have any logic error or timing failure when operating at $\tau$. Recall that our
method either reports failure (i.e., reports that the clock period is insufficient) or computes the
arrival time. In Lemma 1 and Lemma 2, we focus on proving the properties of the identified arrival
time of the output signals. We then use these properties of arrival time to show the property of the
clock period in Theorem 3.
Lemma 1 If our approach can identify an arrival time for a particular signal, then for all possible
patterns, there is no logic error for that signal and the identified arrival time is guaranteed to be
the latest.
Figure 5.5: Timing diagram of two consecutive pipeline stages.
Proof: Because our circuit under study is fully path-balanced and levelized, we prove this
Lemma via induction.
Base step: When the circuit only has two pipeline stages as shown in Figure 5.5(a), given that the
conventional setup time is satisfied at the architectural register, the signal at the primary input
of the RSFQ logic block (with arrival time $T^{data}_1$) is always on time. Therefore, $cell_1$ is
guaranteed to produce a logically correct output response and the arrival time of its output signal,
$T^{output}_1$, is defined by Eq. 5.1, where $CQ^{conv}_1$ is the corresponding conventional
clock-to-Q delay of $cell_1$, and $T^{clk}_1$ is the arrival time of the clock signal at $cell_1$.
\[
T^{output}_1 = T^{clk}_1 + CQ^{conv}_1 \tag{5.1}
\]
The arrival time of the input signal of $cell_2$ (i.e., $T^{data}_2$) and the timing interval between
the data input and the clock input of $cell_2$ (i.e., $DC_2$) are given by Eq. 5.2, where
$\delta^{skew}_{1,2}$ is the arbitrary but fixed clock skew (i.e., independent of data value) between
$cell_1$ and $cell_2$, and $\tau$ is the clock period.
\[
\begin{aligned}
T^{data}_2 &= T^{clk}_1 + CQ^{conv}_1 + \delta^{comb}_{1,2} \\
DC_2 &= T^{clk}_2 - T^{data}_2 \\
     &= T^{clk}_2 - T^{clk}_1 - CQ^{conv}_1 - \delta^{comb}_{1,2} \\
     &= \delta^{skew}_{1,2} + \tau - CQ^{conv}_1 - \delta^{comb}_{1,2}
\end{aligned}
\tag{5.2}
\]
Because the timing library we used in our approach guarantees that the clock-to-Q delay (i.e.,
$CQ_1$) is the worst-case delay (i.e., the largest) across all possible patterns, the $DC_2$ computed
in our approach is the minimum DC for $cell_2$.
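The two forms of the timing interval in Eq. 5.2 can be checked numerically with a short sketch. The function names and all numeric values are hypothetical, and the relation between the two clock arrival times (one clock period plus the skew) is the assumption that links the two forms.

```python
def dc_from_arrival_times(t_clk_1, t_clk_2, cq_1, comb_12):
    """Timing interval at cell 2 from absolute arrival times."""
    t_data_2 = t_clk_1 + cq_1 + comb_12       # data arrival at cell 2
    return t_clk_2 - t_data_2


def dc_from_skew(skew_12, tau, cq_1, comb_12):
    """Same interval expressed with the clock skew and the clock period."""
    return skew_12 + tau - cq_1 - comb_12


if __name__ == "__main__":
    t_clk_1, tau, skew_12 = 0.0, 12.0, 0.5    # hypothetical values in ps
    t_clk_2 = t_clk_1 + tau + skew_12         # next clock edge seen by cell 2
    print(dc_from_arrival_times(t_clk_1, t_clk_2, cq_1=7.0, comb_12=1.0))
    print(dc_from_skew(skew_12, tau, cq_1=7.0, comb_12=1.0))
    # A larger (worst-case) clock-to-Q delay of cell 1 gives a smaller interval.
    print(dc_from_skew(skew_12, tau, cq_1=7.5, comb_12=1.0))
```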
For both inverting cells and non-inverting cells, $DC_2$ must meet the hard setup time
constraint of $cell_2$; otherwise a logic error would occur at the output of $cell_2$. If $DC_2$ is
smaller than the hard setup time of $cell_2$, then a logic error is identified by our approach, the
clock-to-Q delay of $cell_2$ (i.e., $CQ_2$) is not computed, and our timing analysis algorithm
reports that the clock period is insufficient (as described above).
As shown in Algorithm 2, if the hard setup time is not violated, then for inverting cells, the
timing bleed propagation is blocked. A new delay propagation path is started at the output of
$cell_2$, and $CQ_2$ is the normal clock-to-Q delay of $cell_2$. However, for non-inverting cells,
even if the hard setup time is met, there are additional considerations. If $DC_2$ is larger than the
hard setup time of $cell_2$ but smaller than the soft setup time of $cell_2$, then there would be no
logic error, but timing bleed would occur. In this case, $CQ_2$ is adjusted according to the lookup
table in the timing library (which captures the curve of the delay shown in Figure 5.4). Since
$\widetilde{CQ}_2$, the adjusted $CQ_2$, increases monotonically as $DC_2$ decreases, by using the
minimum $DC_2$ identified by our approach, we identify the maximum $\widetilde{CQ}_2$. On the
other hand, if $DC_2$ is larger than the soft setup time of $cell_2$, then there would be no logic
error, and $CQ_2$ is the normal clock-to-Q delay of $cell_2$ (i.e., $CQ^{norm}_2$).
In summary, for both non-inverting cells and inverting cells, if our approach can identify a
clock-to-Q delay for $cell_2$, then for all possible patterns, there is no logic error associated with
$cell_2$ and the identified clock-to-Q delay (i.e., $CQ_2$) is guaranteed to be the largest.
The arrival time of the output signal of $cell_2$ (i.e., $T^{output}_2$) is determined by $CQ_2$ and
the arrival time of its clock $T^{clk}_2$. Since $T^{clk}_2$ in the presence of skew is assumed to be
arbitrary, but independent of the data value, the maximum $CQ_2$ is equivalent to the maximum
$T^{output}_2$. Hence, if our approach can compute a $T^{output}_2$, then for all possible patterns,
there is no logic error for the output signal of $cell_2$ and the computed $T^{output}_2$ is
guaranteed to be the latest.
Induction step: The proof is essentially the same as the base step, except that in the base
case the clock-to-Q delay of $cell_1$ is always the conventional delay value (i.e., $CQ^{conv}_1$),
whereas in the induction step the clock-to-Q delay of $cell_k$ may not be the conventional value;
it may be adjusted due to timing bleed occurring in previous stages. Therefore, the equations are
similar to the base step; the only difference is that $CQ_k$ may change.
Assume the property of our approach holds for a pipeline with $k$ stages, namely if our approach
identifies an arrival time of the output signal of $cell_k$, then for all possible input patterns, there
would be no logic error at the output of $cell_k$ and the computed $CQ_k$ as well as
$T^{output}_k$ are the largest. Then for a pipeline with $k+1$ stages, the last two stages are shown
in Figure 5.5(b). The arrival time of the input signal of $cell_{k+1}$ (i.e., $T^{data}_{k+1}$) and the
timing interval between the data input and the clock input of $cell_{k+1}$ (i.e., $DC_{k+1}$) can then
be derived by Eq. 5.3, where $\delta^{skew}_{k,k+1}$ is the arbitrary but fixed clock skew between
$cell_k$ and $cell_{k+1}$.
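\[
\begin{aligned}
T^{data}_{k+1} &= T^{clk}_k + CQ_k + \delta^{comb}_{k,k+1} \\
DC_{k+1} &= T^{clk}_{k+1} - T^{data}_{k+1} \\
         &= T^{clk}_{k+1} - T^{clk}_k - CQ_k - \delta^{comb}_{k,k+1} \\
         &= \delta^{skew}_{k,k+1} + \tau - CQ_k - \delta^{comb}_{k,k+1}
\end{aligned}
\tag{5.3}
\]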
Since $CQ_k$ is assumed to be the maximum over all possible input patterns, $DC_{k+1}$ is
minimized. According to a similar analysis described in the base step, if a $T^{output}_{k+1}$ is
identified by our approach, then for all possible patterns, no logic error would occur and the
computed $T^{output}_{k+1}$ is guaranteed to be the largest.
Lemma 2 In our timing analysis, for a given RSFQ circuit module, if clock period $\tau_2$ is greater
than clock period $\tau_1$, then for each cell, $cell_i$, in the given circuit, if a clock-to-Q delay
$CQ_{i,\tau_1}$ can be identified for $cell_i$ using $\tau_1$ without having a logic error at $cell_i$,
then a clock-to-Q delay $CQ_{i,\tau_2}$ can also be identified for $cell_i$ using $\tau_2$ without
having a logic error at $cell_i$. In addition, $CQ_{i,\tau_2}$ is smaller than or equal to
$CQ_{i,\tau_1}$.
Proof: Since the circuit is levelized and fully path-balanced, we prove this Lemma by induction.
First pipeline stage: Since we assume that the primary input signal always arrives at the
conventional setup time of the corresponding cell (i.e., $cell_1$), the clock-to-Q delay of $cell_1$
is $CQ^{conv}_1$. Therefore, for the given circuit, when clocked at $\tau_1$, the clock-to-Q delay
(i.e., $CQ^{conv}_{1,\tau_1}$) is the same as the one (i.e., $CQ^{conv}_{1,\tau_2}$) when the circuit
is clocked at $\tau_2$.
Second pipeline stage: The timing interval between the input signal of $cell_2$ and the clock
signal of $cell_2$ can be derived by Eq. 5.4 for the two clock periods $\tau_1$ and $\tau_2$,
respectively.
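\[
\begin{aligned}
DC_{2,\tau_1} &= T^{clk}_{2,\tau_1} - T^{data}_{2,\tau_1} \\
              &= T^{clk}_{2,\tau_1} - T^{clk}_{1,\tau_1} - CQ^{conv}_{1,\tau_1} - \delta^{comb}_{1,2} \\
              &= \delta^{skew}_{1,2} + \tau_1 - CQ^{conv}_{1,\tau_1} - \delta^{comb}_{1,2} \\
DC_{2,\tau_2} &= T^{clk}_{2,\tau_2} - T^{data}_{2,\tau_2} \\
              &= T^{clk}_{2,\tau_2} - T^{clk}_{1,\tau_2} - CQ^{conv}_{1,\tau_2} - \delta^{comb}_{1,2} \\
              &= \delta^{skew}_{1,2} + \tau_2 - CQ^{conv}_{1,\tau_2} - \delta^{comb}_{1,2}
\end{aligned}
\tag{5.4}
\]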
According to the previous step, $CQ^{conv}_{1,\tau_2}$ is the same as $CQ^{conv}_{1,\tau_1}$. Since
$\tau_1$ is smaller than $\tau_2$, $DC_{2,\tau_1}$ is smaller than $DC_{2,\tau_2}$. Because our
approach determined that there is no logic error at the output of $cell_2$ when clocked at $\tau_1$,
$DC_{2,\tau_1}$ must be larger than the hard setup time of $cell_2$. Therefore, $DC_{2,\tau_2}$ is
also larger than the hard setup time of $cell_2$. As a result, there is no logic error at the output
of $cell_2$ when clocked at $\tau_2$. In addition, since the clock-to-Q delay increases monotonically
as the timing interval decreases, $CQ_{2,\tau_2}$ is smaller than or equal to $CQ_{2,\tau_1}$.
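The comparison in this step can be written out explicitly. As a sketch, writing $\tau_1 < \tau_2$ for the two clock periods and assuming that the skew and interconnect delay terms between $cell_1$ and $cell_2$ are the same for both periods (the symbols used for these terms are notational assumptions), subtracting the two expressions of Eq. 5.4 gives:

```latex
\begin{aligned}
DC_{2,\tau_2} - DC_{2,\tau_1}
  &= (\tau_2 - \tau_1) - \left( CQ^{conv}_{1,\tau_2} - CQ^{conv}_{1,\tau_1} \right) \\
  &= \tau_2 - \tau_1 \;>\; 0
\end{aligned}
```

The second line uses the result of the first pipeline stage, namely that the two conventional clock-to-Q delays of $cell_1$ are equal.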
Induction step: Assume that this holds for a pipeline with $k$ stages. Then, for a pipeline with
$k+1$ stages, if $CQ_{k,\tau_1}$ can be identified without logic error, then $CQ_{k,\tau_2}$ can also
be computed by our approach and $CQ_{k,\tau_2}$ is smaller than or equal to $CQ_{k,\tau_1}$. The
timing interval for $cell_{k+1}$ can then be derived by Eq. 5.5 for clock periods $\tau_1$ and
$\tau_2$. The proof for the induction step is similar to the base step, except that in the base step
$CQ^{conv}_{1,\tau_1}$ is equal to $CQ^{conv}_{1,\tau_2}$, whereas in the induction step
$CQ_{k,\tau_2}$ may be smaller than or equal to $CQ_{k,\tau_1}$.
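\[
\begin{aligned}
DC_{k+1,\tau_1} &= T^{clk}_{k+1,\tau_1} - T^{data}_{k+1,\tau_1} \\
                &= T^{clk}_{k+1,\tau_1} - T^{clk}_{k,\tau_1} - CQ_{k,\tau_1} - \delta^{comb}_{k,k+1} \\
                &= \delta^{skew}_{k,k+1} + \tau_1 - CQ_{k,\tau_1} - \delta^{comb}_{k,k+1} \\
DC_{k+1,\tau_2} &= T^{clk}_{k+1,\tau_2} - T^{data}_{k+1,\tau_2} \\
                &= T^{clk}_{k+1,\tau_2} - T^{clk}_{k,\tau_2} - CQ_{k,\tau_2} - \delta^{comb}_{k,k+1} \\
                &= \delta^{skew}_{k,k+1} + \tau_2 - CQ_{k,\tau_2} - \delta^{comb}_{k,k+1}
\end{aligned}
\tag{5.5}
\]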
For the $(k+1)$-th stage, since our approach identifies a valid $CQ_{k+1,\tau_1}$, $DC_{k+1,\tau_1}$
must be larger than the hard setup time of $cell_{k+1}$. Because $CQ_{k,\tau_2}$ is smaller than or
equal to $CQ_{k,\tau_1}$ and $\tau_2$ is larger than $\tau_1$, $DC_{k+1,\tau_2}$ is larger than
$DC_{k+1,\tau_1}$. Therefore, $DC_{k+1,\tau_2}$ also satisfies the hard setup time constraint of
$cell_{k+1}$. Furthermore, due to the monotone decreasing property of clock-to-Q delay with respect
to the timing interval, $CQ_{k+1,\tau_2}$ is smaller than or equal to $CQ_{k+1,\tau_1}$.
Hence, by induction, for each cell $cell_i$ in the circuit, if our approach finds that a valid
$CQ_{i,\tau_1}$ exists for $\tau_1$, then $CQ_{i,\tau_2}$ can also be computed for that cell and
$CQ_{i,\tau_2}$ is smaller than or equal to $CQ_{i,\tau_1}$.
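The key inequality of the induction step can likewise be made explicit. As a sketch under the same notational assumptions ($\tau_1 < \tau_2$, identical skew and interconnect delay terms for both clock periods), subtracting the two expressions of Eq. 5.5 gives:

```latex
\begin{aligned}
DC_{k+1,\tau_2} - DC_{k+1,\tau_1}
  &= \underbrace{(\tau_2 - \tau_1)}_{>\,0}
   + \underbrace{\left( CQ_{k,\tau_1} - CQ_{k,\tau_2} \right)}_{\ge\,0
     \text{ by the induction hypothesis}} \;>\; 0
\end{aligned}
```

A larger timing interval at $cell_{k+1}$ under $\tau_2$ then yields, by the monotonicity of the clock-to-Q delay, a clock-to-Q delay that is no larger than the one obtained under $\tau_1$.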
Theorem 3 For a given RSFQ circuit module $c$ and a given clock period $\tau$, if $\tau$ passes
our timing analysis then $c$ is guaranteed to produce correct logic responses for all possible input
patterns without timing failures when clocked at any clock period $\tau_1$ greater than or equal
to $\tau$.
Recall that in our timing analysis approach, in order to determine whether a given clock
period works for a certain circuit, we first check if there is any logic error at the output of any
cell. If our timing analysis finishes without detecting any logic error, we then check if there is any
conventional setup time violation for the cells in the last pipeline stage. This is because in our
circuit model described in Section 5.2.1, the cells in the last pipeline stage are the architectural
registers and the conventional setup time constraint is used at architectural registers. Therefore,
for each cell that belongs to the last pipeline stage (i.e., cells with the highest level), we not only
check whether there is a logic error or not, we also check if the conventional setup time of that cell
is violated.
Proof: We consider two separate cases. The first case covers the cells within the block of
logic (i.e., the internal cells); the second case covers the boundary conditions, i.e., the
architectural registers driven by the output of the block of logic.
Internal cells: Our timing analysis approach proceeds level by level. Whenever a logic error
is detected, our analysis stops. Therefore, if the arrival time of all the primary output signals can
be computed for a certain circuit $c$ under the given clock period $\tau$, then according to Lemma 1,
there would be no logic error for any output signal for all possible input patterns. In addition,
according to Lemma 2, when circuit $c$ is clocked at any clock period $\tau_1$ that is larger than
$\tau$, there would also be no logic error for any output signal for all possible input patterns.
According to Lemma 1, the arrival time of the signal computed by our approach is the largest
across all possible patterns. If the arrival time identified by our approach using clock period $\tau$
satisfies the conventional setup time of the cells in the last pipeline stage, then those cells are
guaranteed to meet the conventional setup time constraint for all possible patterns when the
circuit is clocked at $\tau$.
Primary output cells: In addition, according to Lemma 2, if $\tau_1$ is greater than $\tau$, then
for each cell, the clock-to-Q delay identified using $\tau_1$ (i.e., $CQ_{i,\tau_1}$) is smaller than
or equal to the one identified using $\tau$ (i.e., $CQ_{i,\tau}$). As a result, the arrival time of the
signals computed using $\tau_1$ is smaller than or equal to the ones computed using $\tau$.
Therefore, if no conventional setup time violation is detected for any primary output cell for
$\tau$, then $\tau_1$ should also not cause a conventional setup time violation for those cells.
Hence, for a certain circuit $c$, if our approach finishes the timing analysis for clock period $\tau$
without detecting any logic error at the output of any cell or detecting any conventional setup
time violation at any primary output cell, then $c$ is guaranteed to produce correct logic responses
without any timing violations for all possible input patterns when clocked at any clock period
that is greater than or equal to $\tau$.
5.4.2 Optimality of our approach
If a given circuit design has a target clock period $\tau$ and the goal is to verify whether the RSFQ
circuit module operates correctly at $\tau$, then our timing analysis approach can perform timing
verification using $\tau$. However, if the goal is to identify the minimum clock period $\tau_{min}$
at which a given circuit would operate correctly, then we need to use an appropriate search method
to find $\tau_{min}$. In order to use fast search methods such as binary search, the following
Theorem must be proven.
Theorem 4 For a given RSFQ circuit module $c$, if our timing analysis approach detects logic
error or timing failure for clock period $\tau$, then our approach would detect logic error or timing
failure for any clock period that is smaller than $\tau$.
Proof: We prove this Theorem by contradiction. Assume that for circuit $c$, our approach
identifies a logic error for clock period $\tau$ but does not find any logic error for clock period
$\tau_1$ which is smaller than $\tau$. Then, there must exist at least one cell in the circuit $c$ for
which our approach cannot find a valid clock-to-Q delay for $\tau$ but can compute a clock-to-Q
delay for $\tau_1$. However, this contradicts Lemma 2. Therefore, if our approach detects logic
error or timing failure for clock period $\tau$, then any clock period that is smaller than $\tau$
will also fail in our approach.
According to Theorem 3 and Theorem 4, binary search can be used to identify the minimum
clock period for a particular circuit. We then show the optimality of the minimum clock period
identified by our approach by comparing it with two different STA approaches, namely the
conventional approach and any other STA approach with timing bleed. Recall that the conventional
approach is the one that only uses conventional setup time and a fixed CQ.
Theorem 5 For a given RSFQ circuit module and a certain clock period $\tau$ which passes the
conventional STA approach, if the same cell characterization result is used, $\tau$ would also pass
our STA approach and all the arrival times computed by our approach are guaranteed to be smaller
than or equal to the ones computed by the conventional STA approach.
Proof: Since the clock signal is assumed to be arbitrary, but fixed and independent of the data
signal, the arrival time of the signal at the output of a cell is equivalent to its clock-to-Q delay.
Therefore, we prove this Theorem in terms of clock-to-Q delay of the cells. We first show that
for a certain clock period $\tau$, if the conventional approach can compute a valid clock-to-Q delay
for each cell in the circuit, then our approach would also be able to compute a clock-to-Q delay
for all cells in the circuit. Recall that in Section 5.3.1, we show that the conventional setup time
is always larger than or equal to our hard setup time. Because the same cell characterization
result is used for both approaches, if $\tau$ passes the conventional STA approach, then for any cell
in the circuit, the DC between its clock input and data input is larger than the conventional
setup time, which guarantees that the DC is also larger than the hard setup time of that cell.
Therefore, if $\tau$ passes the conventional STA approach and a valid clock-to-Q delay is computed
for each cell in the circuit, our approach would also be able to compute a clock-to-Q delay for
each cell.
We then show that the clock-to-Q delay computed by our approach is always smaller than or
equal to the one identied by conventional STA approach. If a signal arrives early, namely if DC
is smaller than the conventional setup time, then according to the monotonicity of the clock-to-Q
delay with respect to timing interval, based on the same cell characterization result, our approach
would use a clock-to-Q delay that is smaller than or equal to the clock-to-Q delay used in the
conventional approach. Therefore, for each signal, the arrival time computed by our approach is
guaranteed to be smaller than or equal to the one identied by any conventional approach.
Finally, since the boundary condition for all the primary output cells used in our approach
is the same as the one used in the conventional approach (i.e., the signal should arrive before the
conventional setup time of the cell), if a clock period causes no timing failure at the primary output
cells in the conventional approach, then it will also pass the boundary condition check shown in
Algorithm 2 of our approach.
Theorem 5 shows that our STA approach can always identify a minimum clock period that is
smaller than or equal to the one computed by any conventional STA approach.
Theorem 6 If the same cell characterization result and the same definition of soft and hard setup
time are used, then no other STA approach with timing bleed can be constructed that identifies a
minimum clock period which is smaller than the one computed by our approach.
Proof: We prove this theorem by contradiction. Assume that there exists another STA approach
with timing bleed that identifies a minimum clock period (the optimal period) that is smaller than
the one identified by our approach. Then, when the optimal period is analyzed in our
approach, it either fails due to a logic error at some cell or violates the conventional setup time
constraint at some primary output cell. In both cases, this means that in the optimal approach, the
DC identified for that cell (e.g., cell $k+1$) is larger than the one computed by our approach. As
shown in Eq. 5.3, the timing interval $DC_{k+1}$ is computed based on $T^{data}_{k+1}$ and
$T^{clk}_{k+1}$. Since the clock signal is assumed to be arbitrary but fixed and independent of the
data signal, the arrival time of the data signal must be earlier, i.e., $T^{data}_{k+1}$ is smaller.
According to Eq. 5.3, a smaller $T^{data}_{k+1}$ is the result of a smaller $CQ_k$. If $CQ_k$ is
already the nominal value (i.e., $CQ^{norm}_k$), then it cannot be smaller. Therefore, $CQ_k$ must be
an adjusted clock-to-Q delay (i.e., $\widetilde{CQ}_k$). Due to the monotonicity of the clock-to-Q
delay, in order for $\widetilde{CQ}_k$ to be smaller, the timing interval of the previous cell needs
to be larger, i.e., $DC_k$ is larger. The same analysis keeps propagating back
along the delay path until it reaches a primary input cell or an inverting cell. Since the CQ of
the primary input cell is always $CQ^{conv}$ and the CQ of the inverting cell is always $CQ^{norm}$,
the clock-to-Q delay of that cell cannot be further reduced. Hence, no other STA approach with
timing bleed can be constructed that identifies a minimum clock period which is smaller than the
one computed by our approach.
5.5 Experimental results
To evaluate our approach, we developed a prototype STA tool that computes the minimum
clock period while accounting for timing bleed. We evaluated the benchmark circuits with our
latest cell characterization results. The same cell characterization results are also used to evaluate
the conventional setup time method, in which timing bleed is allowed only up to a
10% increase in the clock-to-Q delay. Table 5.1 shows the minimum clock period (ps) identified
by both timing analysis methods for 18 RSFQ benchmark circuits [41]. Eight benchmark circuits
are re-synthesized to optimize the number of combinational logic blocks between internal pipeline
stages; those circuits are denoted with a `_sp' suffix (e.g., C499_sp is functionally identical to
C499 but synthesized in a different way).
The result shows that compared to the conventional STA approach, our new STA tool can
certify circuits to operate at a higher frequency.
The experimental results on benchmark circuits also reveal another important characteristic
of highly pipelined circuits. The clock period is determined by the worst-case pipeline stage,
which means that in highly pipelined circuits the clock period remains relatively independent
of the circuit size. For example, as shown in Table 5.1, a 16-bit multiplier has a similar clock
period compared to a 4-bit Kogge-Stone adder (KSA) even though the 16-bit multiplier is 27
times larger than the 4-bit KSA in terms of number of cells in the circuit. The maximum number
of combinational logic cells in a pipeline stage is an important determinant of the clock period.
Table 5.1 also shows that as the maximum number of combinational logic cells increases, the
benefit of our new STA decreases. In other words, our approach offers lower benefits when there
is deep multi-stage combinational logic in the worst-case path between pipeline stages.
However, the depth of the combinational logic can be optimized during the synthesis process.
For example, in Table 5.1, circuits with the `_sp' suffix are functionally identical to their
counterparts but re-synthesized by redesigning the splitter tree. The results show that when the
maximum depth of combinational logic is reduced, the benefit of our approach increases.
We then compared the above two different approaches using Monte Carlo simulations on
these benchmark circuits. For each circuit, we created 500 Monte Carlo instances under process
variations and computed the minimum clock period for each instance using the conventional
CMOS STA tool and our new prototype STA tool.
Figures 5.6, 5.7, 5.8, and 5.9 show the distribution of the minimum clock periods for the 500
instances for each benchmark circuit. The gray bars show the clock periods computed using the
conventional CMOS setup time approach. The green bars show that much lower clock periods
are identified by our new prototype STA tool, which allows timing bleed at internal cells.
Table 5.1: Clock period for benchmark circuits under different timing bleed conditions
Circuit name | Number of cells in circuit | Maximum number of combinational logic between pipeline stages | Clock period (ps), allowing timing bleed | Clock period (ps), conventional setup time (10% timing bleed) | Improvement of allowing timing bleed compared to conventional setup time
FA | 39 | 2 | 13.3 | 15.5 | 16.54%
c432 | 1,193 | 14 | 82.8 | 86.5 | 4.47%
c432_sp | 1,224 | 5 | 35.7 | 40.9 | 14.57%
c499 | 907 | 15 | 93.1 | 98.1 | 5.37%
c499_sp | 1,019 | 7 | 40.0 | 45.4 | 13.50%
c880 | 1,495 | 7 | 42.9 | 46.6 | 8.62%
c880_sp | 1,533 | 4 | 28.6 | 32.7 | 14.34%
c1355 | 954 | 15 | 93.1 | 98.1 | 5.37%
c1355_sp | 1,059 | 7 | 40.0 | 45.4 | 13.50%
c1908 | 1,541 | 13 | 84.2 | 87.3 | 3.68%
c1908_sp | 1,600 | 5 | 36.7 | 41.9 | 14.17%
c3540 | 3,433 | 25 | 146.4 | 150.2 | 2.60%
c6288 | 7,268 | 12 | 65.5 | 68.6 | 4.73%
KSA4 | 92 | 2 | 21.0 | 26.0 | 23.81%
KSA8 | 239 | 3 | 21.0 | 26.0 | 23.81%
KSA16 | 609 | 3 | 21.0 | 26.0 | 23.81%
KSA32 | 1,519 | 3 | 25.7 | 30.7 | 19.46%
KSA32_sp | 1,625 | 2 | 21.0 | 26.0 | 23.81%
MULT4 | 262 | 3 | 21.0 | 24.8 | 18.10%
MULT8 | 1,390 | 2 | 17.6 | 22.5 | 27.84%
MULT16 | 6,238 | 2 | 17.6 | 22.5 | 27.84%
DIV4 | 535 | 4 | 33.0 | 38.4 | 16.36%
DIV4_sp | 561 | 3 | 27.2 | 32.6 | 19.85%
DIV8 | 3,208 | 8 | 52.8 | 57.9 | 9.66%
DIV8_sp | 3,225 | 3 | 27.4 | 32.6 | 18.98%
DIV16 | 19,207 | 15 | 95.7 | 101.0 | 5.54%
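The improvement column of Table 5.1 appears to be the difference between the two clock periods taken relative to the timing-bleed clock period. The short sketch below recomputes it for a few rows copied from the table; the function name `improvement_percent` and the choice of rows are illustrative assumptions, not part of the STA prototype.

```python
# Clock periods (ps) copied from selected rows of Table 5.1:
# name -> (period allowing timing bleed, period with conventional setup time)
ROWS = {
    "FA": (13.3, 15.5),
    "c432": (82.8, 86.5),
    "c432_sp": (35.7, 40.9),
    "KSA4": (21.0, 26.0),
    "MULT16": (17.6, 22.5),
}


def improvement_percent(bleed_ps: float, conventional_ps: float) -> float:
    """Relative reduction of the clock period, normalized by the timing-bleed period.

    This normalization is inferred from the table values (an assumption).
    """
    return 100.0 * (conventional_ps - bleed_ps) / bleed_ps


for name, (bleed, conventional) in ROWS.items():
    print(f"{name:8s} {improvement_percent(bleed, conventional):6.2f}%")
```

With this normalization the figure equals the relative increase in the certified clock frequency, since conventional/bleed - 1 is the ratio of the two frequencies minus one.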
For each benchmark circuit, the comparison of the two distributions clearly shows that by
allowing timing bleed at internal cells, we can greatly improve the performance of the circuit
while guaranteeing the correctness of the circuit.
Figure 5.6: Distribution of minimum clock periods for different benchmark circuits.
Figure 5.7: Distribution of minimum clock periods for different benchmark circuits.
Figure 5.8: Distribution of minimum clock periods for different benchmark circuits.
Figure 5.9: Distribution of minimum clock periods for different benchmark circuits.
5.6 Conclusion
In this chapter, we propose a new STA approach along with a new method for characterizing
timing behaviors of RSFQ logic cells. By relaxing the setup time constraints on internal registers
of the logic cells and properly accounting for their effects on timing at the subsequent cell
using the concept of timing bleed, our STA is able to certify much higher performance for RSFQ
logic circuits, and hence preserve the key advantage of RSFQ, namely high speed. We prove the
correctness of our approach for STA of RSFQ circuit modules as well as its optimality. Experimental
simulation results show that for a given circuit, our STA prototype identifies a minimum
clock period that can dramatically improve the performance of the circuit while guaranteeing its
correctness.
Chapter 6
Timing Independent ATPG
Our goal is to ensure that designs and fabricated chips provide the desired performance. To achieve
this goal, we propose new methods and tools for timing verification and delay testing of RSFQ
logic.
We address several radically new phenomena in RSFQ technology, especially the existence of
single-pattern delay tests and the need to propagate delayed values via multiple pipeline stages.
Then, based on the results of cell characterization under process variations, we identify delay
excitation conditions, sensitization conditions, and conditions for propagation of the logic errors
caused by timing violations due to process variations. We then propose a completely new paradigm
for automatic test pattern generation (ATPG) which utilizes these new phenomena to select
multi-cycle paths as targets and to generate test patterns that are guaranteed to excite the worst-case
delay along each target multi-cycle path. Finally, we present theoretical proofs and Monte Carlo
simulation results for benchmark circuits under process variations to demonstrate that the patterns
generated by our new ATPG are effective (invoke maximum delays of target multi-cycle paths)
and efficient (require small numbers of patterns).
6.1 Introduction
As its combination of high performance and low switching energy is a key reason for adopting
RSFQ, to move the technology beyond early prototyping it is important to develop tools to
ensure performance. We propose the first paradigm and tools for two specific tasks for RSFQ
logic. First, we target timing verification, i.e., ensuring that an RSFQ design meets the desired
performance specifications before the design is moved into fabrication. Second, we target delay
testing, i.e., identifying copies of fabricated chips that meet the desired performance specifications.
In the wider context of testing fabricated chips, delay testing is particularly important for RSFQ
logic, since these chips are fabricated using processes with large feature sizes and have defect
densities dramatically lower than those of their CMOS counterparts. Also, due to quantized pulse-based
operation, in RSFQ logic even highly distorted pulses are interpreted logically correctly by cells,
but the timing is affected.
More specifically, we propose a new paradigm and a tool for automatically generating patterns
for these two tasks. For timing verification, the patterns we generate for a given design will be
simulated to characterize the timing of the design. This process, called dynamic timing verification
(DTV), computes the delay of the circuit with lower pessimism compared to static timing analysis
(STA) (a pattern-less approach that provides a bound on delays). Further, it also provides specific
patterns (if any) that violate the desired performance specification, and detailed simulation results
for these patterns provide direct guidance for circuit redesign to meet specifications. For delay
testing, the patterns we generate will be applied to each fabricated chip to determine whether the
chip passes or fails at the clock period corresponding to the desired performance specifications.
To serve their intended purposes, we develop our tools in a way that guarantees that the
patterns we generate invoke the maximum delay. We demonstrate this via theoretical proofs and
simulations of benchmark RSFQ circuits without and with process variations.
Test and testability of RSFQ circuits have been the subject of some research [42] [43]. These
researchers have developed an extensive set of test structures to identify dominant types of defects,
such as junction shorts and hard as well as resistive opens and shorts. Specific RSFQ cells are
fabricated and studied to develop fault models. The stuck-at fault model used in CMOS is also
adopted for RSFQ.
Test point insertion as well as scan techniques have been developed for RSFQ technology to enhance
the testability of RSFQ circuits [44] [45]. An on-chip high-speed test system has also been designed
for RSFQ circuits to resolve the issues caused by the high speed and small amplitude of SFQ pulses
[46]. Some of the design for testability (DFT) techniques have been successfully applied to small
RSFQ circuits at high speed (i.e., 20 GHz) [47]. A variety of delay testing design approaches, such as
shift register latch (SRL), developed for CMOS technology also serve as design options for RSFQ
technology [48] [49] [50].
However, due to the fundamental differences between RSFQ technology and CMOS technology,
failure models and testing tools for CMOS may not be practically useful for RSFQ, especially in the
verification and testing domains. In CMOS, nano-scale feature sizes provide higher performance.
On the other hand, RSFQ fabrication technology is not focused on physical scaling, i.e., it achieves
high operating speed and low power consumption by using special characteristics of JJs and
superconducting operation instead of relying on very fine feature sizes. Hence RSFQ circuits are
fabricated using much larger feature sizes compared to CMOS technology [51]. One of the leading
fabs, MIT Lincoln Laboratory, recently demonstrated a process which has a minimum feature size
of 350 nm [52]. Because of the large feature sizes, defect density in these chips is low, with only
one or two defects expected per wafer. Due to the low defect density, stuck-at and other hard
faults become less important. On the other hand, other non-idealities such as process variations
and other RSFQ-specific issues (e.g., inductive coupling and bias current steering) have a
significant impact on the operation of the fabricated chips. For example, key parameters in an
RSFQ cell need to be fine-tuned to maximize the operating margin of that cell in the presence
of process variations [5]. Also, several researchers have designed and fabricated test chips and
identified that the dominant components of chip failure are process variations, including the variations
in key inductances, key resistances, and junction current with bias current adjustment [53] [54].
Therefore, developing fault models based on variations and other RSFQ-specific non-idealities
increases in importance.
In this context, the essential steps required to develop fault models useful for fault simulation
and ATPG have not yet been taken. For instance, researchers have not characterized the
conditions for exciting fault-effects as well as the conditions for sensitization and propagation of
fault-effects via RSFQ-specific cells to an observable output of the circuit, the necessary timing
conditions which must be taken into account while generating the test vectors, and so on.
In addition to process variations, many RSFQ-specific issues that cause functional and timing
failures, e.g., trapped flux, coupling via mutual inductances, and biasing current redistribution,
have not been studied. SCE in general, and RSFQ technology in particular, operate on controlled
creation and propagation of quantized pulses with quantized energy. Due to the nature of such
quantized pulse-based operation, in RSFQ technology even highly distorted pulses are still
interpreted as logic-1. Hence, even high levels of process variations and coupling, which significantly
distort the pulses, do not create logic errors. Instead, variations and coupling affect timing. One
of our collaborators who has fabricated several SCE chips and tested them has noted that many
of the chips pass at low speed, but fail at the designed speed. Hence, compared to CMOS, timing
failures are likely to constitute a significantly higher proportion of failures due to design
marginalities, process variations, and other RSFQ-specific non-idealities. Since speed is one of the major
advantages of this technology, the importance of timing verification, especially dynamic timing
verification (DTV, i.e., application of input vectors to, and evaluation of output responses from,
either a detailed simulation or a fabricated chip), greatly increases.
In this chapter, based on the cell characterization results in the presence of variations, we
develop models of delays as well as fault models, and propose a completely new ATPG paradigm
to generate patterns that invoke the maximum values of these delays and robustly test these types
of faults. The primary focus of this chapter is on variations; other RSFQ-specific issues are subjects
of our ongoing research.
The proposed ATPG paradigm is presented in the next section. In Section 6.3.2 we present
our theoretical proofs demonstrating that our patterns are guaranteed to invoke maximum delays
for target multi-cycle paths. Experimental results are shown in Section 6.4 and demonstrate the
effectiveness of our patterns, in terms of invoking maximum delays of target paths, as well as their
efficiency, in terms of requiring very small numbers of patterns.
6.2 Proposed ATPG paradigm for RSFQ logic
RSFQ logic is pulse based, hence delay is only associated with propagation and generation of
logic-1. Hence, unlike CMOS where two-pattern delay tests are necessary for combinational logic, in
RSFQ single patterns can be used for dynamic timing verification and single-pattern delay tests.
Also, the need to allow timing bleed and hence to consider multi-cycle paths is completely without
precedent in flip-flop-based CMOS logic circuits. We now present our test pattern generator for
RSFQ logic, which is the first to generate single patterns to invoke maximum delays and to consider
timing bleed and multi-cycle paths. As mentioned earlier, our approach generates patterns for
pre-fabrication dynamic timing verification of designs as well as for delay testing of every fabricated
chip.
6.2.1 Circuit model
The circuit model we used in our approach is the same as the one described in Section 5.2.1.
The key properties of the circuit model that are important for our new ATPG approach are
summarized here.
In RSFQ circuits, as in CMOS, circuit modules such as arithmetic logic units (ALUs) have
architecturally visible registers at their inputs and outputs. Further, in RSFQ, in contrast to
CMOS, the functional units within the ALU, such as adders and multipliers, are pipelined in
a very fine-grained manner, where each logic gate has a built-in pipeline flip-flop. Therefore,
each functional unit within a circuit module is a path-balanced acyclic logic. Further, typically,
architectural registers exist at inputs and outputs of each functional unit and all feedback loops
pass via architectural registers.
In our ATPG approach, for delay testing, we also assume that the architectural registers such
as those at the inputs and outputs of an ALU are scanned, but the gate-level pipeline flip-flops
within each logic gate in the logic modules are not scanned. As discussed in Section 4.2.2.1, this is
necessary for avoiding the very high performance overhead and astronomical area overhead that
would occur if we scanned every flip-flop, since every logic cell in RSFQ technology is a clocked
element.
Since every flip-flop in every architectural register is scanned for delay testing, we view these as
primary inputs and outputs when we generate delay test patterns for a functional unit. Further,
while we allow timing bleed at flip-flops within individual logic cells in a functional unit, we use
the conventional setup time approach at flip-flops in architectural registers. Equivalently, for dynamic
timing verification, we assume that generated patterns are applied at architectural registers, the
responses are captured at architectural registers, and the conventional setup time approach is used
at architectural registers. Hence, the maximum span of any multi-cycle path we target for test
pattern generation starts at an architectural register and ends at the subsequent architectural
register.
6.2.2 Target path selection
In RSFQ logic circuits, even though each cell is a clocked element, additional delay at cell $g_i$ can
bleed to cell $g_j$ in its fanout, affect its timing, and cause a logic error at $g_j$, or continue to bleed
and cause a logic error at a subsequent cell $g_k$, and so on. Hence, as mentioned above, the entire
path between architectural registers may be identified as a target path.
However, as described in Section 4.2.2.1, in RSFQ delay propagation paths pass via non-
inverting cells but terminate-and-restart at inverting cells. Therefore, a target path between
architectural registers is broken at inverting cells.
In our current cell library, in addition to the INV cell, the XOR2 cell may become an inverting cell
depending on the value applied at its off-path input. If a logic-0 is applied to the off-path input
of an XOR2 cell, from the point of view of the on-path input, the XOR2 cell is a non-inverting
cell, and timing bleed continues along the target path. In contrast, if a logic-1 is applied to the
off-path input of an XOR2 cell, the XOR2 cell becomes an inverting cell and hence terminates
the current delay propagation path and starts a new path at its output.
Our method for identifying multi-cycle target paths for test pattern generation implements the
above rules regarding continuation of timing bleed or its termination-and-restart (an example will
be presented in Section 6.2.6), in conjunction with the observations about architectural registers
at the end of Section 6.2.1.
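The continuation and termination-and-restart rules can be illustrated with a small sketch that breaks one register-to-register chain of cells into delay propagation paths. This is only an illustration under stated assumptions: the `Cell` record, the function names, and the convention that an inverting cell is listed both as the last cell of the path it terminates and as the first cell of the path it restarts are all hypothetical, chosen to mirror the running example in Section 6.2.6 rather than taken from the actual tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    name: str
    kind: str                       # "DFF", "INV", "AND2", "OR2" or "XOR2"
    off_path: Optional[int] = None  # off-path input value; only consulted for XOR2


def is_inverting(cell: Cell) -> bool:
    """INV always inverts; XOR2 inverts only when its off-path input is logic-1."""
    if cell.kind == "INV":
        return True
    if cell.kind == "XOR2":
        return cell.off_path == 1
    return False


def split_at_inverting_cells(path: List[Cell]) -> List[List[str]]:
    """Break a chain of cells into delay propagation paths.

    Timing bleed continues through non-inverting cells. At an inverting cell
    the current path terminates (at the cell's input) and a new path restarts
    (at the cell's output), so the cell appears in both paths.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for cell in path:
        current.append(cell.name)
        if is_inverting(cell) and len(current) > 1:
            segments.append(current)
            current = [cell.name]
    if current:
        segments.append(current)
    return segments


chain = [Cell("a", "DFF"), Cell("b", "AND2"), Cell("c", "XOR2", off_path=0),
         Cell("d", "INV"), Cell("e", "OR2")]
print(split_at_inverting_cells(chain))   # XOR2 acts as a non-inverting cell
chain[2].off_path = 1
print(split_at_inverting_cells(chain))   # XOR2 now terminates and restarts a path
```

Changing only the off-path value of the XOR2 cell changes how the same chain is partitioned, which is why target paths have to be enumerated separately for the two XOR2 configurations.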
6.2.3 Delay excitation conditions
To excite an additional delay caused by process variations, a logic-1 needs to be created at the
output of the cell. Table 6.1 summarizes the delay excitation conditions of all logic cells in the
current cell library.
As expected due to the pulse-based operation of RSFQ, every delay excitation condition is
defined in terms of a single pattern. This is in sharp contrast with CMOS, where delay excitation
conditions for combinational logic blocks are defined in terms of two-pattern sequences.
6.2.4 Delay sensitization conditions
Once the additional delay is generated, it can be sensitized via a path comprised of non-inverting
cells via the propagation of logic-1. For the first cell, there is no input timing bleed as the input
arrives on time, but at any of the subsequent cells in the multi-cycle target path, in addition
to sensitizing existing additional delay, new delay caused by variations should also be excited.
Hence, every cell along the path is a site of a combination of delay excitation and sensitization.
While propagating the logic-1 via the on-path inputs of the gates along the path under test, the
off-path inputs of these gates must be set to values that cannot accelerate the creation of logic-1
at the gate's output. For example, as reproduced in Figure 6.1(b), the path from $X_1$ to the output is
selected as the target path. However, the on-path delay is not sensitized because the creation of
logic-1 at the output of the OR cell ($g_j$) may be accelerated by the value applied at its off-path
input, $x_6$. We focus on robust propagation conditions which, by precluding off-path values that may
accelerate the creation of logic-1 at a gate's output, guarantee propagation of additional
delay at on-path lines independent of the timing of the logic driving off-path inputs [55,56].
Figure 6.1: (a) Timing bleed via non-inverting cells. (b) Timing bleed via non-inverting cells is
masked by off-path.
Table 6.1 summarizes the conditions for sensitizing a multi-cycle path delay fault. Since
XOR2 is both a delay sensitizing gate and a delay terminating gate, target paths need to be
enumerated separately for these two cases, and each target path (labelled "exciting, sensitizing"
and "terminating" in Table 6.1) must be configured in a manner consistent with the case for that
path.
As expected, every delay sensitization condition is defined in terms of a single pattern.
The theoretical proofs of the robustness of these conditions are presented in Section 6.3.2,
specifically Lemma 7 and Lemma 8.
Table 6.1: Delay excitation and sensitization conditions [1]
Cell name | Vector(s) for exciting delay fault at cell's output | Vector(s) for sensitizing delay fault: on-path | Vector(s) for sensitizing delay fault: off-path
DFF | 1 | 1 | N.A.
INV | 0 | N.A. | N.A.
AND2 | 11 | 1 | 1
OR2 | 01, 10, or 11 | 1 | 0
XOR2 (exciting, sensitizing) | 01 or 10 | 1 | 0
XOR2 (terminating) | N.A. | 1 | 1
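Since exciting a delay requires creating a logic-1 at the cell's output, the excitation vectors in Table 6.1 should coincide with the input vectors for which each cell's Boolean function evaluates to 1. The sketch below checks this by enumeration. It is an illustrative consistency check only: the cells are modelled as plain Boolean functions (clocking is abstracted away), and the dictionary and function names are assumptions made for this example.

```python
from itertools import product

# Plain Boolean models of the library cells (clocked behaviour abstracted away).
LOGIC = {
    "DFF": lambda a: a,
    "INV": lambda a: 1 - a,
    "AND2": lambda a, b: a & b,
    "OR2": lambda a, b: a | b,
    "XOR2": lambda a, b: a ^ b,
}
ARITY = {"DFF": 1, "INV": 1, "AND2": 2, "OR2": 2, "XOR2": 2}

# Excitation vectors transcribed from Table 6.1.
TABLE_6_1_EXCITE = {
    "DFF": {"1"},
    "INV": {"0"},
    "AND2": {"11"},
    "OR2": {"01", "10", "11"},
    "XOR2": {"01", "10"},
}


def vectors_producing_one(kind: str) -> set:
    """All input vectors for which the cell's output is logic-1."""
    return {
        "".join(str(bit) for bit in bits)
        for bits in product((0, 1), repeat=ARITY[kind])
        if LOGIC[kind](*bits) == 1
    }


for kind in LOGIC:
    matches = vectors_producing_one(kind) == TABLE_6_1_EXCITE[kind]
    print(f"{kind:5s} excitation vectors match logic-1 outputs: {matches}")
```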
6.2.5 Logic error propagation
Once the maximum delay is sensitized along the entire target path, the last cell in the path
will capture either an error-free logic value or an erroneous logic value. Hence, to capture the
potentially erroneous value at the last cell, we represent it as a $D$, in the same manner as we
capture the fault-effect in stuck-at fault ATPG [57] [58]. (We do not capture it as a $\bar{D}$ because
the propagation of delay is only associated with the propagation of logic-1.)
We use well-known methods for propagation of a logic error from the output of the target path
to an observable cell, namely an architectural register.
Note that the conditions for such propagation of logic error require one-pattern tests. Also,
every one of our conditions for excitation and sensitization of delays via multi-cycle target paths
in RSFQ logic also requires one-pattern test. Hence, our robust delay test pattern generation
approach for RSFQ logic is based on single-pattern tests. This is in contrast with all existing delay
test generation approaches and is particularly important as the space of all possible patterns for
our new approach for RSFQ is dramatically smaller than that for all previous delay test generation
approaches.
In addition to capturing the potentially erroneous value at the last cell, we also need to
consider that an erroneous value may occur at any cell along the target path, because a logic
error can occur at any cell along the path while the maximal timing bleed is being sensitized along
the entire target path. Therefore, we show that if there are no splitters along a target multi-cycle
path, then the conditions for sensitizing the timing bleed are necessary and sufficient for the
propagation of $D$ or $\bar{D}$.
For a particular target path, the off-path input of each XOR2 cell along the target path is
determined by the path selection process. During the target path selection process, if an XOR2
cell is selected as a sensitizing cell, then logic-0 is assigned to its off-path input. On the other
hand, if an XOR2 cell is configured as a terminating cell, then its off-path input is set to logic-1.
Therefore, for a specific target path, the off-path input values for XOR2 cells are chosen by the
path selection process, not the error propagation process. For all other cells, as shown in Table 6.2,
the conditions for propagating a logic error along the target path are identical to the conditions for
sensitizing timing bleed along the target path. Therefore, if there are no splitters along the target
path, then for each cell in the target path, the sensitization condition is the same as the logic error
propagation condition. In other words, even if a logic error occurs before reaching the last cell in
the target path, the sensitization conditions ensure that the erroneous value is propagated from
the site of error.
If there are splitters in the target path, then the additional branches created by the splitters
can be used as auxiliary paths to facilitate the propagation of any logic error that occurs before
or at the corresponding splitters. At the end of Section 6.2.6, we describe how our ATPG exploits
this.
Table 6.2: Delay sensitization conditions and logic error propagation conditions
Cell name | Sensitizing delay fault: on-path | Sensitizing delay fault: off-path | Propagating logic error: on-path | Propagating logic error: off-path
DFF | 1 | N.A. | $D$ or $\bar{D}$ | N.A.
INV | N.A. | N.A. | $D$ or $\bar{D}$ | N.A.
AND2 | 1 | 1 | $D$ or $\bar{D}$ | 1
OR2 | 1 | 0 | $D$ or $\bar{D}$ | 0
XOR2 (exciting, sensitizing) | 1 | 0 | $D$ or $\bar{D}$ | 0
XOR2 (terminating) | 1 | 1 | $D$ or $\bar{D}$ | 1
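The agreement between the off-path sensitization values and the logic error propagation values in Table 6.2 can be checked with a few lines of code: with the off-path input held at the tabulated value, the output of each two-input cell must depend on the on-path input, so that an erroneous on-path value changes the output. The sketch below is a minimal illustration using plain Boolean gate models; the role labels and function names are assumptions made for this example.

```python
from operator import and_, or_, xor

# role -> (Boolean gate model, off-path value from Table 6.2)
ROLES = {
    "AND2": (and_, 1),
    "OR2": (or_, 0),
    "XOR2 (sensitizing)": (xor, 0),
    "XOR2 (terminating)": (xor, 1),
}


def propagates_error(gate, off_path: int) -> bool:
    """True if flipping the on-path input flips the gate output."""
    return gate(0, off_path) != gate(1, off_path)


def inverts(gate, off_path: int) -> bool:
    """True if a logic-1 on the on-path input yields logic-0 at the output."""
    return gate(1, off_path) == 0


for role, (gate, off_path) in ROLES.items():
    print(f"{role:20s} propagates error: {propagates_error(gate, off_path)}, "
          f"inverting: {inverts(gate, off_path)}")

# The opposite off-path values block propagation for AND2 and OR2.
print(propagates_error(and_, 0), propagates_error(or_, 1))
```

Under this model only the terminating XOR2 configuration inverts the on-path value, which matches its role of ending one delay propagation path and starting another.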
6.2.6 ATPG paradigm
Algorithm 3 shows an outline of our new ATPG algorithm that embodies our new paradigm,
the first one to consider multi-cycle paths (to significantly reduce delay overheads as well as area
overheads of scan) and to generate single-pattern tests to invoke maximum delay (to significantly
reduce the worst-case computational complexity of test generation).
While our ATPG integrates path selection, delay excitation, sensitization, and logic error
propagation, let us examine two parts of our new paradigm. In the first part, it identifies the set
of multi-cycle paths to target as outlined in Sections 6.2.1 and 6.2.2. In the second part, it selects
as a target one multi-cycle path from the above set and generates a test for the target path.
During the second part, the target multi-cycle path passes via one or more gates, with a
typical target path passing via many gates. (In our experiments to date, target path lengths are
distributed between a minimum of one cell and a maximum of about 80 cells.) Our ATPG starts
by assigning the delay excitation and sensitization conditions, and hence assigns specific values at
the input of the target path as well as off-path inputs at gates along the target multi-cycle path.
Given the range of lengths of target paths, for most target paths, specific logic values are assigned
to several lines at the start of test generation for the target.
This characteristic of our ATPG has significantly influenced our selection of the base test
generation algorithm we use. Specifically, in D-algorithm [58], the search space can be dramatically
reduced by performing imply and check on multiple excitation and sensitization conditions. In
contrast, in PODEM [59], objectives are mapped to primary inputs and examined one at a time,
which leads to unnecessary search in our scenario. Hence, our ATPG uses D-algorithm as the base
test generation algorithm.
Once the path delay is sensitized along the selected target path to cause a logic error at the
output of the last gate in the target path (if the target path is slow), the error propagation and
justification are achieved using a standard approach for stuck-at faults used in D-algorithm [58].
The multi-cycle path of a full-adder shown in Figure 6.2 is used as a running example to
demonstrate our ATPG approach; a complete description can be found in [1] [60].
In this example, the green multi-cycle path including INV ($g_1$), AND2 ($g_2$), OR2 ($g_3$), and
INV ($g_4$) shown in Figure 6.2 is selected as the target path. Logic values determined by excitation
and sensitization are shown in green, values in red are applied to propagate the logic error to the
primary output, and blue values are set by the justification process. In order to excite the delay fault
at the beginning of the path, which is the output of $g_1$, logic-0 must be applied at the input of
INV $g_1$. According to the robust sensitization conditions shown in Table 6.1, both inputs of the
AND2 cell need to be logic-1 to sensitize and propagate the extra delay through the AND2 cell.
Similarly, the on-path value of the OR2 cell should be logic-1 while the off-path input needs to be set to
logic-0 so that the creation of the logic-1 at the output of the OR2 cell is not accelerated.
Finally, at the end of the path, the additional delay is converted to a logic error at the input of
INV ($g_4$) and is propagated to the primary output. After the justification process, input pattern
$A = 0$, $B = 1$, $C_{in} = 1$ is identified as a valid pattern to excite, sensitize, and propagate the effect
of unacceptably high delay for the selected multi-cycle target path.
For a certain target path, if the ATPG finishes by finding a test pattern, then that path
is covered. There are two reasons for the ATPG to finish without finding a test pattern for a
particular path: (1) it reaches the backtrack limit and is hence aborted (i.e., uncovered path); or
(2) it finishes the search by showing that no test pattern exists (i.e., untestable path).
Algorithm 3: An outline of our proposed ATPG algorithm
Path selection, delay excitation and sensitization phase:
  Identify starting cells of delay paths
  Start cells = all primary input cells, all INV cells, and all XOR cells
  Search for target paths
  foreach cell s in start cells do
    Set on-path and off-path value for cell s
    Depth First Search (s)
  end
  Function Depth First Search (cell s)
    Set output value of cell s = logic-1
    Append cell to target path
    if cell is a primary output cell then
      Set output value of cell s = D
      Add new target path
    else
      foreach downstream cell t do
        if cell t is INV then
          Set output value of cell s = D
          Add new target path
          Continue
        else if cell t is XOR then
          Set output value of cell s = D
          Set off-path value of cell s = logic-1
          Add new target path
          Set output value of cell s = logic-1
          Set off-path value of cell s = logic-0
        Depth First Search (cell t)
      end
Error propagation and primary input justification phase:
  foreach path p in the list of all target paths do
    Apply D-algorithm to p
    if a valid pattern is found then
      Target path p is covered
    else if search limit is reached then
      Target path p is uncovered
    else
      Target path p is untestable
    if target path p is untestable then
      if the length of path p is 1 then
        Target path p is untestable
      else
        Create two sub-paths of p (i.e., $p_1$ and $p_2$) by removing the first cell in p and
        the last cell in p, respectively
        Add $p_1$ and $p_2$ to the target paths list
  end
End Function
103
Figure 6.2: Full-Adder as a running example [1].
6.2.6.1 Uncovered path
A test pattern may exist for an uncovered path, but the ATPG cannot find it because of its
limited search capability. Heuristics such as the cone-based topological search algorithm
for ATPG [61] can be used to enhance the search capability of the ATPG and improve the coverage.
6.2.6.2 Untestable path
A target path is identified as an untestable path only if the ATPG finishes searching the entire
solution space defined by the constraints shown in Section 6.2.3 and Section 6.2.4. This may be
the result of a conflict between the excitation and sensitization conditions of the cells along the
target path. It may also occur because the logic error cannot be propagated to the primary outputs
or because the primary inputs cannot be justified.
In this case, since no test exists for the target multi-cycle path, our ATPG recursively generates
shorter sub-paths to cover the entire path. For example, if an n-stage multi-cycle path is identified
as an untestable path, then our ATPG creates two (n - 1)-stage sub-paths (i.e., one without the
first cell in the n-stage path and another without the last cell in the n-stage path) and searches
for tests for those newly created sub-paths. If any sub-path is identified as untestable, then our
ATPG further reduces the length of that sub-path and attempts to generate tests for it.
This approach continues recursively until all the sub-paths become testable or the length of the
sub-path is reduced to 1.
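
The recursive fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype used in this work: `run_atpg` is a hypothetical callback that stands in for the D-algorithm search and returns "covered", "uncovered", or "untestable", and a path is represented as a tuple of cell names.

```python
from collections import deque

def cover_with_subpaths(path, run_atpg):
    """Classify a target path, splitting untestable paths into shorter sub-paths.

    run_atpg is a hypothetical callback (an assumption of this sketch) that
    returns "covered", "uncovered", or "untestable" for a tuple of cells.
    """
    results = {}
    worklist = deque([tuple(path)])
    while worklist:
        p = worklist.popleft()
        if p in results:              # the same sub-path can be reached twice
            continue
        results[p] = run_atpg(p)
        if results[p] == "untestable" and len(p) > 1:
            worklist.append(p[1:])    # sub-path without the first cell
            worklist.append(p[:-1])   # sub-path without the last cell
    return results

# Toy oracle for demonstration: pretend only paths of at most two cells are testable.
toy_oracle = lambda p: "covered" if len(p) <= 2 else "untestable"
outcome = cover_with_subpaths(("g1", "g2", "g3", "g4"), toy_oracle)
```

With the toy oracle, the 4-stage path and both 3-stage sub-paths are reported as untestable, and the three distinct 2-stage sub-paths are covered. The dictionary avoids running the search twice on a sub-path that is produced by two different parents.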
In Section 6.2.5, we show that if a logic error occurs before reaching the last cell and there
are no splitters along the path, then sensitization conditions guarantee that the logic error is
propagated to the last cell. However, if there exists a splitter in the target path, then the ATPG
over-specifies the error propagation conditions by not utilizing the auxiliary path created by that
splitter. If our ATPG finds a test for that path under the over-specified conditions, then that path
is covered. If no test is found, then the auxiliary path created by the splitter is implicitly utilized
when we attempt to find tests for the untestable path by targeting shorter sub-paths.
6.3 Theoretical results
In this section we derive formal proofs of robustness of our test patterns and key properties of our
entire test suite.
6.3.1 Robustness of our test patterns
Process variations have a significant impact on the delay of the cells, may cause timing bleed, and
eventually lead to circuit failures. The variations include the variation of parameters of a circuit
element within the cell and the variation of biasing current, which is caused by variations in the
resistive network that distributes the current. While such variations are the primary focus here,
other types of non-idealities can affect the delay of a cell, such as the effects of mutual inductance,
the history of the circuit (i.e., effects of previous patterns), and the effects of biasing current
steering. The inductive coupling and state/history-dependent delays are not handled explicitly by
the approach presented here. However, simulation results show that phenomena such as inductive
coupling and history-dependent delays only affect cell delays by a small amount (i.e., less than
5%) [37]. Hence, as of now, we use guard-banding to account for such non-idealities. Coming up
with better guard-banding methods or explicit tests for these non-idealities is a subject of our
future research. Biasing current steering is reduced significantly because of the use of PTLs in all
cell-to-cell interconnects.
We now show that, in cases where complete path coverage for a given circuit is achieved, the
set of test patterns generated by our timing-independent approach is robust for exciting maximal
delays considering process variation during DTV as well as for testing delay faults caused by
process variations. A sketch of our overall proof is presented here and detailed proofs are shown
ahead. We first demonstrate the strengths of our excitation and sensitization conditions, namely
that our conditions invoke the maximum clock-to-Q delay at the output of each individual gate.
Then we develop an induction to show that, under the condition that the clock skew is arbitrary,
but fixed, our method is guaranteed to excite the maximum delay for the given target path.
Finally, we target all possible logical paths in a given circuit.
In other words, for every logical path in the circuit, the maximal delay is invoked through
on-path inputs along the target path, while the off-path inputs do not accelerate or invalidate any
delay values that are excited along the path.
6.3.2 Proof of the robustness of our test pattern
In our proof, we assume that our target RSFQ circuits and the cells have the following properties.
(1) The circuit is fully path balanced and can be levelized. (2) The clock skew is arbitrary, fixed,
and independent of the data values applied to any cell. (3) As shown in Figure 4.13 and Figure 5.3,
the clock-to-Q delay increases monotonically as the timing interval between the data and the clock
decreases. In other words, the clock-to-Q delay is maximized if the timing interval between data
signal and clock signal is minimized.
In addition to the above assumptions, we also adopt the assumptions made in all previous
timing analysis approaches for RSFQ. Specifically, while waveforms of SFQ pulses can take a wide
variety of shapes, two fundamental abstractions are used during cell characterization for timing
analysis. The first abstraction is used to compute the logic value of the pulse: if the area of the
pulse exceeds a certain threshold, then it is interpreted as a logic-1 [3]. The second abstraction is
used to compute the arrival time of the pulse. There are minor variations in how the arrival time
of the pulse is extracted from the waveform, i.e., either the arrival time is determined by the time
of the peak value [28] or it is determined by the time when the area of the pulse reaches 50% [3].
In either of these variations, all delay properties of RSFQ logic are defined only with respect to
the arrival time, i.e., no other property of the waveform (e.g., the width of the pulse, the slope of
the edges of the pulse, etc.) is carried into the timing analysis domain during cell characterization.
In other words, once the arrival time is computed, all other properties of the pulse are ignored in
timing analysis. Moreover, the later the arrival time at an input, the later the arrival time at the
output. The final assumption is that the delays are additive, which means that the delays of signals
along a path can be added together [39] [40].
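
The two pulse abstractions can be illustrated with a small Python sketch operating on a sampled waveform. The function names, the threshold value, and the toy triangular pulse are assumptions made for illustration; they are not taken from the cell characterization flows of [3] or [28].

```python
def pulse_area(times, volts):
    """Trapezoidal area under a sampled pulse (e.g., mV * ps)."""
    return sum(0.5 * (v0 + v1) * (t1 - t0)
               for t0, t1, v0, v1 in zip(times, times[1:], volts, volts[1:]))

def logic_value(times, volts, area_threshold):
    """First abstraction: logic-1 if the pulse area exceeds a threshold."""
    return 1 if pulse_area(times, volts) > area_threshold else 0

def arrival_by_peak(times, volts):
    """Second abstraction, variant 1: time of the peak sample."""
    peak_index = max(range(len(volts)), key=lambda i: volts[i])
    return times[peak_index]

def arrival_by_half_area(times, volts):
    """Second abstraction, variant 2: time at which 50% of the area is reached.

    Within a sampling interval the accumulated area is interpolated linearly,
    which is an approximation chosen for this sketch.
    """
    half = 0.5 * pulse_area(times, volts)
    accumulated = 0.0
    for t0, t1, v0, v1 in zip(times, times[1:], volts, volts[1:]):
        segment = 0.5 * (v0 + v1) * (t1 - t0)
        if segment > 0 and accumulated + segment >= half:
            return t0 + (half - accumulated) / segment * (t1 - t0)
        accumulated += segment
    return times[-1]

# Toy symmetric triangular pulse (hypothetical values).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v = [0.0, 1.0, 2.0, 1.0, 0.0]
```

For this symmetric toy pulse both variants give the same arrival time, which is why either one can serve as the single quantity carried into timing analysis; for asymmetric pulses the two variants would generally differ slightly.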
In our approach, a target path either starts at a primary input or an inverting gate and ends
at a primary output or an inverting gate. For a 2-input cell, only one input (say, input x_1) is
included in the target path (i.e., considered as the on-path input); the other input (say, input
x_2) is referred to as the off-path input. The goal of our approach is to excite the highest delay
along the target path. If the target path's delay is not being exposed, i.e., if another path (via
the off-path input) is dominating the delay, then it is not a valid condition for the target path.
(This is because that other delay would be identified when a different path, namely one that
passes via the off-path, is selected as the target path.) Therefore, our approach needs to identify
a pattern which excites the maximal delay for the target path but, at the same time, the delay of
the off-path should not dominate the delay.
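
The start and end rules for target paths can be sketched as a depth-first enumeration over a netlist. The sketch below is a simplification of the path selection phase of Algorithm 3: it ignores value assignment and splitters, assumes the netlist is acyclic, and uses a hypothetical dictionary-based netlist format (`kinds` maps each cell to its type, `fanout` maps each cell to its downstream cells).

```python
def enumerate_target_paths(kinds, fanout):
    """Enumerate target paths as tuples of cell names.

    kinds:  cell -> "PI", "PO", "INV", "XOR2", or any other cell type
    fanout: cell -> list of downstream cells
    Both formats are assumptions made for this illustration.
    """
    starts = [c for c, k in kinds.items() if k in ("PI", "INV", "XOR2")]
    found = set()

    def dfs(cell, prefix):
        path = prefix + (cell,)
        if kinds[cell] == "PO":
            found.add(path)           # path ends at a primary output
            return
        for nxt in fanout.get(cell, []):
            if kinds[nxt] == "INV":
                found.add(path)       # path ends at the input of an inverting cell
                continue
            if kinds[nxt] == "XOR2":
                found.add(path)       # XOR2 used as an inverter also ends the path
            dfs(nxt, path)            # otherwise keep extending through nxt

    for start in starts:
        dfs(start, ())
    return sorted(found)

# Toy netlist: a, b -> AND2 g1 -> INV g2 -> primary output z.
kinds = {"a": "PI", "b": "PI", "g1": "AND2", "g2": "INV", "z": "PO"}
fanout = {"a": ["g1"], "b": ["g1"], "g1": ["g2"], "g2": ["z"]}
paths = enumerate_target_paths(kinds, fanout)
```

In the toy netlist, the two paths that start at the primary inputs end at the input of the inverter, and a new path starts at the inverter and ends at the primary output.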
Lemma 7 Our excitation conditions invoke maximum clock-to-Q delay with respect to the on-path
input of a given cell along a given target path.
Proof: For a particular cell, only a logic-1 at the output excites delay. Therefore, for a single-
input cell (i.e., DFF and INV), the condition is trivial: the vector for exciting delay at the output
of a DFF is 1, while the vector for an INV is 0. For a two-input AND cell, since the only vector that
invokes a logic-1 at the output of the cell is 11, the excitation conditions for both the on-path
input and the off-path input are 1. For a two-input OR cell (i.e., an OR cell with two inputs
x_1 and x_2), only one input (e.g., x_1) is included in the target path; the other input (e.g., x_2) is
considered as the off-path input. The excitation condition for the on-path input is 1, but there
is a choice for the off-path input, which can be either 1 or 0. In the 11 case, the two
inputs can have many different skew values. However, as shown in Figure 5.3, for all possible
skew values between the on-path and the off-path input, the 10 condition always excites a higher
delay than the 11 case. Therefore, for the two-input OR cell, the excitation condition is
1 for the on-path input and 0 for the off-path input. For a two-input XOR cell, if the off-path
input is 0, then the XOR2 cell degenerates to a DFF; if the off-path input is 1, then it behaves
like an INV. More importantly, these two cases are handled separately as two different target
paths, and each has only one choice of value. In conclusion, for a given cell, the excitation
conditions shown in Table 6.1 maximize the clock-to-Q delay via the gate starting at the on-path input.
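
The excitation vectors named in the proof can be written down and checked against the Boolean function of each cell. The sketch below only restates the vectors from the proof text (Table 6.1 itself is not reproduced here), and the data layout is an assumption made for illustration.

```python
# Boolean function of each cell as a function of (on-path, off-path) inputs.
CELL_FUNCTIONS = {
    "DFF":  lambda on, off=None: on,
    "INV":  lambda on, off=None: 1 - on,
    "AND2": lambda on, off: on & off,
    "OR2":  lambda on, off: on | off,
    "XOR2": lambda on, off: on ^ off,
}

# (cell, on-path value, off-path value) as stated in the proof of Lemma 7.
EXCITATION = [
    ("DFF", 1, None),
    ("INV", 0, None),
    ("AND2", 1, 1),
    ("OR2", 1, 0),    # 10 is preferred over 11 because it excites a higher delay
    ("XOR2", 1, 0),   # XOR2 behaving like a DFF
    ("XOR2", 0, 1),   # XOR2 behaving like an INV
]

def excites_output(cell, on, off):
    """True if the vector produces logic-1 (an SFQ pulse) at the cell output."""
    return CELL_FUNCTIONS[cell](on, off) == 1

all_excite = all(excites_output(cell, on, off) for cell, on, off in EXCITATION)
```

Every listed vector produces a logic-1 at the output, which is the precondition for exciting delay. Note that the vector 11 also produces a logic-1 for OR2; it is excluded on timing grounds, not on logical grounds.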
Lemma 8 Our sensitization conditions preserve the accumulated delay along the path and excite
maximum clock-to-Q delay with respect to the on-path input of a given cell along a given target
path and ensure that the off-path input does not accelerate the clock-to-Q delay.
Proof: According to Lemma 7, the excitation conditions excite the maximal clock-to-Q delay
at the output of the cell. In order to accumulate the excited delay along the path, logic-1 needs to
be generated and propagated along the path. During the delay propagation, the off-path input of
the OR2 cell must be 0 so that it does not accelerate the on-path. In conclusion, for every cell,
the sensitization conditions shown in Table 6.1 ensure that the excited delay is accumulated through
the cell and a new maximal clock-to-Q delay is invoked with respect to the on-path input without
being accelerated by the off-path input.
Theorem 9 For every multi-cycle target path for which our ATPG finds a test, it is guaranteed
to excite the maximum delay for that path.
Proof: Because the circuit is fully path balanced and can be levelized, this theorem can be
proved via induction. The induction should be set up based on arrival times of signals. For a
given target path with (k + 1) levels of cells, if the arrival time of the signal at the output of cell k
is maximum, then we should show that the arrival time of the signal at the output of cell k + 1 is
also maximized. However, because RSFQ circuits are gate-level pipelined, every cell is a pipeline
stage. This implies that the arrival time at the output of cell k is determined by its clock-to-Q delay
(i.e., CQ_k) and the arrival time of the clock (i.e., T^{clk}_k). Since the arrival time of the clock
signal of a cell in the presence of skew is assumed to be arbitrary, but independent of the data
value, maximizing the arrival time of the signal at the output of the cell is equivalent to maximizing
its clock-to-Q delay. Therefore, we phrase the induction in terms of the clock-to-Q delay of the cells.
In our induction proof, we show that if the clock-to-Q delay of cell k is maximized, then the
clock-to-Q delay of cell k + 1 is also maximized.
Base step (excitation only): When the target path has only one cell, it is a degenerate
case, namely we only need to excite the maximal clock-to-Q delay for that cell. According to
Lemma 7, the excitation conditions ensure that if our ATPG finds a test, then it excites the
maximum clock-to-Q delay for that cell.
Base step (excitation and sensitization): If the target path consists of two levels of cells as
shown in Figure 6.3, then according to Lemma 7, CQ_1 is maximized. The on-path input data
arrival time for cell 2 is defined by Eq. 6.1.

\[
T^{\mathrm{onpathdata}}_{2} = T^{\mathrm{clk}}_{1} + CQ_{1} + \mathrm{comb}_{1,2} \tag{6.1}
\]
Figure 6.3: Timing diagram of a two-level pipeline stage.
Because the clock signal arrival time is assumed to be arbitrary, but independent of data value,
the clock skew between cell 1 and cell 2 (i.e., skew_{1,2}) and the clock period T_{Period} are fixed.
Therefore, as shown in Eq. 6.2, the timing interval DC_2 between T^{clk}_2 and T^{onpathdata}_2 is
minimized as CQ_1 is maximized.

\[
DC_{2} = T^{\mathrm{clk}}_{2} - T^{\mathrm{onpathdata}}_{2}
       = T^{\mathrm{clk}}_{2} - T^{\mathrm{clk}}_{1} - CQ_{1} - \mathrm{comb}_{1,2}
       = \mathrm{skew}_{1,2} + T_{\mathrm{Period}} - CQ_{1} - \mathrm{comb}_{1,2} \tag{6.2}
\]

According to Lemma 8, the off-path input signal T^{offpathdata}_2 will not accelerate the clock-
to-Q delay of cell 2 (i.e., CQ_2). Because the clock-to-Q delay increases monotonically as the
timing interval between clock signal and data signal decreases (see Figure 5.4), minimizing DC_2
will maximize CQ_2. In conclusion, if our ATPG finds a test, then it excites the maximum
clock-to-Q delay for the two-level target path.
Induction step: Assume this holds for a target path with k levels of cells. Then, for a target
path with (k + 1) levels of cells, the clock-to-Q delay of cell k (i.e., CQ_k) is maximized. Since
skew_{k,k+1} and T_{Period} are fixed, the timing interval DC_{k+1} is minimized as CQ_k is maximized,
as shown in Eq. 6.3.

\[
DC_{k+1} = T^{\mathrm{clk}}_{k+1} - T^{\mathrm{onpathdata}}_{k+1}
         = T^{\mathrm{clk}}_{k+1} - T^{\mathrm{clk}}_{k} - CQ_{k} - \mathrm{comb}_{k,k+1}
         = \mathrm{skew}_{k,k+1} + T_{\mathrm{Period}} - CQ_{k} - \mathrm{comb}_{k,k+1} \tag{6.3}
\]

According to Lemma 8, the off-path input signal T^{offpathdata}_{k+1} will not accelerate CQ_{k+1}.
Because CQ_{k+1} increases monotonically as DC_{k+1} decreases, the maximization of CQ_k
will minimize DC_{k+1} and subsequently maximize CQ_{k+1}. Hence, by induction, if our
ATPG finds a test, then it excites the maximum delay for that path.
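
The induction can be illustrated numerically. The sketch below applies the relation of Eq. 6.3 stage by stage, writing the fixed clock period as `period`, and uses a hypothetical clock-to-Q model `cq_model` that is non-increasing in the data-to-clock interval. The model and all numbers are assumptions chosen for illustration; they are not characterization data from this work.

```python
def cq_model(dc, base=5.0, extra=3.0, window=4.0):
    """Hypothetical clock-to-Q delay (ps) as a function of the interval dc (ps).

    The delay grows linearly from `base` to `base + extra` as dc shrinks from
    `window` to 0, so it never increases when dc increases.
    """
    closeness = max(0.0, min(1.0, (window - dc) / window))
    return base + extra * closeness

def propagate(cq_first, skews, combs, period):
    """Return the clock-to-Q delay of every cell along a path.

    For each stage, DC_{k+1} = skew_{k,k+1} + period - CQ_k - comb_{k,k+1}.
    """
    cqs = [cq_first]
    for skew, comb in zip(skews, combs):
        dc = skew + period - cqs[-1] - comb
        cqs.append(cq_model(dc))
    return cqs

# Arbitrary but fixed skews, interconnect delays, and clock period (ps).
skews, combs, period = [0.5, -0.3, 0.2], [1.0, 1.2, 0.8], 9.0
nominal = propagate(5.0, skews, combs, period)   # first cell at a lower delay
slowed = propagate(6.5, skews, combs, period)    # first cell at a higher delay
```

With the skews held fixed, raising the clock-to-Q delay of the first cell never lowers the clock-to-Q delay of any later cell, which is the monotonic behavior the induction step relies on.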
Figure 6.4: Timing diagram of a pipeline stage.
Theorem 10 If our ATPG is able to find a test for every multi-cycle target path in a circuit,
then our approach guarantees the excitation of maximum delay for the circuit.
Proof: For a given circuit, our approach considers the set of all possible target paths based
on our target path selection conditions. Each input of every cell is selected as on-path for some
target path. Then for each target path p_i, our approach attempts to find a test t_i which invokes
the maximal output delay for that path. If every target path p_i is testable and it invokes a delay
of d_i, then any other input pattern would invoke a delay no greater than max(d_i) for that fabricated
copy of the circuit. Therefore, our approach excites the maximum delay of the circuit.
Theorem 10 shows the key property of our entire test suite.
6.4 Experimental Results
To evaluate our approach, we developed a prototype of our new ATPG paradigm and applied it
to several RSFQ benchmarks obtained from [41] as shown in Table 6.3. The benchmarks include
full adder (FA), Kogge-Stone adder (KSA), integer multiplier (MULT), integer divider (DIV), and
several ISCAS85 benchmark circuits [62].
In this section, we first evaluate our patterns in terms of coverage to show that our ATPG can
successfully generate patterns for the target paths. Then we present two additional evaluations to
demonstrate the effectiveness and efficiency of our patterns. The effectiveness is defined in terms
of the ability of our patterns to invoke maximal delay. The efficiency is defined in terms of the number
of patterns generated by our ATPG, since the number of patterns required is a key determinant
of the cost of test application for dynamic timing verification as well as delay testing.
6.4.1 Coverage of our test sets
As described in Section 6.2.6, for a certain target path, if a test pattern is found by our ATPG,
then that path is covered. If our ATPG finishes the search without finding a test pattern, then
that path is an untestable path. If the search process is aborted due to backtrack limits, then
that path is an uncovered path. The coverage is defined as the total number of covered paths over
the sum of covered paths and uncovered paths and is presented in Table 6.3. While the results
for arithmetic circuits were presented in our previous work [1], since then we have enhanced our
ATPG prototype and here we present results for many more benchmark circuits.
Our ATPG achieves 100% coverage for most circuits. Our current ATPG prototype is based on
a straightforward search which guarantees correctness but not efficiency; therefore the coverage
is lower (66% to 89%) for c499, c1355, and c1908. We are currently incorporating heuristics into
our tool to make the search more efficient and to improve the coverage.
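
The two ratios reported in Table 6.3 follow directly from the path counts. A minimal sketch, using the c499 row of Table 6.3 as input (the function and key names are chosen for illustration):

```python
def coverage_metrics(total_paths, untestable, covered):
    """Compute the untestable ratio and the coverage from path counts.

    Coverage is covered / (covered + uncovered), i.e., untestable paths are
    excluded from the denominator, as defined in the text.
    """
    testable = total_paths - untestable      # covered + uncovered paths
    uncovered = testable - covered
    return {
        "untestable_ratio": untestable / total_paths,
        "coverage": covered / testable,
        "uncovered": uncovered,
    }

# c499 row of Table 6.3: 2310 of 3658 paths untestable, 890 paths covered.
c499 = coverage_metrics(total_paths=3658, untestable=2310, covered=890)
```

For c499 this gives 458 uncovered paths, an untestable ratio of about 63%, and a coverage of about 66%, in agreement with the table.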
Table 6.3: Number of test patterns and coverage for benchmark circuits

Circuit name   Untestable ratio    Number of patterns   Coverage
1-bit FA       0/19 (0%)           6                    19/19 (100%)
4-bit KSA      0/65 (0%)           42                   65/65 (100%)
8-bit KSA      0/193 (0%)          137                  193/193 (100%)
16-bit KSA     0/641 (0%)          463                  641/641 (100%)
32-bit KSA     0/2305 (0%)         1687                 2305/2305 (100%)
4-bit MULT     38/185 (20%)        49                   147/147 (100%)
8-bit MULT     598/1443 (41%)      483                  845/845 (100%)
4-bit DIV      323/413 (78%)       62                   90/90 (100%)
8-bit DIV      6892/7403 (93%)     282                  511/511 (100%)
c432           694/1025 (67%)      213                  331/331 (100%)
c499           2310/3658 (63%)     747                  890/1348 (66%)
c880           432/1111 (38%)      451                  679/679 (100%)
c1355          2266/3733 (60%)     942                  1083/1467 (73%)
c1908          1534/3063 (50%)     1019                 1361/1529 (89%)
c3540          5303/6949 (76%)     1075                 1646/1646 (100%)
6.4.2 Effectiveness of our test sets - invoking maximum delays
If we use our patterns for dynamic timing verication (DTV), the simulation cost is proportional
to the number of patterns. Alternatively, if we use the patterns to actually test the circuit after
fabrication, the number of patterns is an important determinant of testing cost. Therefore, in
the justification process for each target path, our ATPG attempts to mark the value of a primary
input as don't care (x) whenever possible. Each primary input with a don't care value can be
assigned either logic-0 or logic-1. Table 6.4 shows the patterns generated by our ATPG for a
full-adder.
A pattern with x values (e.g., 11x) can be expanded by enumerating all possible assignments
for the x values. A pattern with k x values can be expanded into 2^k fully-specified patterns.
Alternatively, a pattern with one or more x values may be compressed by combining it with one or
more of the other patterns that are compatible (two patterns are compatible if none of their
corresponding inputs have complementary logic values, i.e., 0 and 1), by assigning one specific
combination of x values. Our theorems are valid for both of these methods as well as many other
methods that can be used to convert the ATPG-generated patterns to fully specified patterns.
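
Both conversions can be sketched in Python on the full-adder patterns of Table 6.4. The greedy merge order used in `compress` is an assumption of this sketch (the text does not specify how compatible patterns are chosen), as is filling any remaining x with 0.

```python
from itertools import product

def expand(pattern):
    """All fully specified patterns obtained by assigning 0/1 to every x."""
    choices = [("0", "1") if bit == "x" else (bit,) for bit in pattern]
    return ["".join(bits) for bits in product(*choices)]

def compatible(a, b):
    """True if no input position holds complementary values (0 against 1)."""
    return all(p == q or "x" in (p, q) for p, q in zip(a, b))

def merge(a, b):
    """Combine two compatible patterns, keeping every specified bit."""
    return "".join(q if p == "x" else p for p, q in zip(a, b))

def compress(patterns):
    """Greedy compression: fold each pattern into the first compatible one kept."""
    kept = []
    for pattern in patterns:
        for i, other in enumerate(kept):
            if compatible(pattern, other):
                kept[i] = merge(pattern, other)
                break
        else:
            kept.append(pattern)
    return [p.replace("x", "0") for p in kept]   # arbitrary fill for leftover x

fa_patterns = ["010", "011", "100", "101", "11x", "111"]   # Table 6.4
expanded = sorted({q for p in fa_patterns for q in expand(p)})
compressed = compress(fa_patterns)
```

On the full-adder patterns, expansion yields six distinct fully specified patterns, while the greedy compression merges 11x into 111 and yields five patterns.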
We start by verifying that the expanded set of patterns can invoke the maximum delay of a
given circuit under process variations.
Table 6.4: Patterns generated for a Full-Adder

Pattern (A, B, C_in)
010
011
100
101
11x
111
We used Monte Carlo simulation to create versions of circuits with the eects of process
variations and applied patterns generated by our ATPG. Simulation results show that as the clock
period becomes smaller, timing bleed indeed occurs along many paths of the circuit. However,
some of these instances of timing bleeds are eliminated by the downstream gates (because they
have low delay values) without causing any logic errors. When the clock period becomes even
smaller, some of the bleeds accumulate along multi-cycle paths and eventually cause logic errors.
All our test patterns are able to excite and sensitize the path delay fault and propagate the logic
error to a primary output.
Because we targeted all possible multi-cycle paths in our approach, if one cell has a timing
bleed which its downstream cells are not able to compensate and hence cause a logic error, our
ATPG is guaranteed to create a high quality delay test pattern for that path that takes into
account the timing bleed.
The effectiveness of our patterns is verified by comparing the delays excited by our test patterns
with an exhaustive set of patterns. (Note that this step is only needed to validate our approach.
Hence, its scalability to circuits with large numbers of inputs is not necessary.) For a Monte Carlo
instance of a circuit, we generate an exhaustive set of patterns. For each pattern in the exhaustive
set, we perform simulations to identify the minimum clock period for which the Monte Carlo instance
of the circuit would produce the correct logic response for that input pattern. For example, Table 6.5
shows the minimum clock period required for each input pattern for a Full-Adder. For instance, as
shown in Table 6.5, in order to get a logically correct response for input pattern (010), the clock
period must be greater than or equal to 7.2 ps.
Figure 6.5: Minimum clock period for each input pattern for a Full-Adder with 100% coverage
and 0% untestable ratio. [Bar chart for circuit fa; x-axis: patterns (sorted by clock period, low to
high); y-axis: minimum clock period (ps).]
We sorted the set of exhaustive patterns based on their corresponding minimum clock period
and then marked the ones that are identified by our ATPG. As shown in Figure 6.5, the green bars
indicate the patterns that are generated by our ATPG while the gray ones are not. Figure 6.6
shows the simulation results for a 4-bit KSA. These figures clearly show that the patterns identified
by our ATPG indeed accomplish the goal of generating patterns which excite maximal delays for
a given circuit.
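
The comparison against the exhaustive set can be sketched with the full-adder data. The minimum clock periods below are the values of Table 6.5 and the ATPG set is the expanded pattern set of Table 6.4; the function name is chosen for illustration.

```python
# Minimum clock period (ps) per input pattern (A, B, C_in), from Table 6.5.
min_clock_ps = {
    "000": 2.7, "001": 7.0, "010": 7.2, "011": 7.2,
    "100": 7.6, "101": 7.6, "110": 7.9, "111": 7.9,
}

# Patterns of Table 6.4 with 11x expanded to 110 and 111.
atpg_expanded = {"010", "011", "100", "101", "110", "111"}

def excites_maximum(min_clock, selected):
    """True if the selected patterns include one that needs the largest clock period."""
    worst_overall = max(min_clock.values())
    worst_selected = max(min_clock[p] for p in selected)
    return worst_selected == worst_overall

# Exhaustive patterns sorted by minimum clock period, low to high.
ranked = sorted(min_clock_ps, key=min_clock_ps.get)
not_generated = [p for p in ranked if p not in atpg_expanded]
```

For this Monte Carlo instance, the two patterns that the ATPG does not generate (000 and 001) are the two with the smallest minimum clock period, and the ATPG set contains the patterns that require the largest clock period (7.9 ps).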
We also performed the same analysis for circuits which are partially testable (i.e., have some
untestable paths). As shown in Figure 6.7, even though the untestable ratio is larger than 0% for
these circuits, which means Theorem 10 does not apply, the empirical results show that our test
patterns can still identify the maximal delay for these circuits. This is because Theorem 9, which
is the test-pattern-level theorem, still holds.

Figure 6.6: Minimum clock period for each input pattern for a 4-bit KSA with 100% coverage
and 0% untestable ratio. [Bar chart for circuit KSA4; x-axis: patterns (sorted by clock period, low
to high); y-axis: minimum clock period (ps).]

Table 6.5: Minimum clock period for each input pattern for one particular Monte Carlo instance
of a Full-Adder

Pattern (A, B, C_in)   Minimum clock period (ps)
000                    2.7
001                    7.0
010                    7.2
011                    7.2
100                    7.6
101                    7.6
110                    7.9
111                    7.9

These simulation results clearly show that the patterns generated by our ATPG can excite
maximal delays for the testable paths.
If a path is identified as untestable by our ATPG, it may be because our ATPG
failed to propagate the logic error to the primary outputs. In this case, even though a logic error may
occur along the target path, the effect of that failure cannot be observed at the primary output;
therefore, the circuit can still generate correct logic responses. Another reason for our ATPG
to report a path as untestable is that the excitation and sensitization conditions for the cells along
the target path conflict with each other. In this case, the maximal delay cannot be excited by any
input pattern; therefore, the circuit can still produce correct logic values for all possible input
patterns.
Figure 6.7: Minimum clock period for each input pattern for a 4-bit multiplier with high untestable
ratio. [Bar chart for circuit arrmult4; x-axis: patterns (sorted by clock period, low to high);
y-axis: minimum clock period (ps).]
The effectiveness evaluation results shown in Figures 6.5, 6.6, 6.7, and 6.8 are based on expanded
sets of patterns. We now perform simulation analysis using compressed sets of patterns. As shown
in Figures 6.9, 6.10, 6.11, and 6.12, even with sets of compressed patterns, which are more practically
useful, our ATPG is still effective, i.e., our ATPG-generated patterns excite the maximal delay in
the circuit.

Figure 6.8: Minimum clock period for each input pattern for a 4-bit integer divider with high
untestable ratio. [Bar chart for circuit ID4s; x-axis: patterns (sorted by clock period, low to high);
y-axis: minimum clock period (ps).]
6.4.3 Efficiency of our test sets - numbers of patterns
As described in Section 6.4.2, a pattern with x values can either be expanded or be compressed.
Table 6.6 shows the number of patterns generated for benchmark circuits for the above two
methods. This shows that our ATPG can dramatically reduce the number of patterns needed to
test a given circuit to a practical level that is a very small fraction of the size of an exhaustive set
of patterns.
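
The size of the exhaustive set grows as 2^n with the number of primary inputs n, which is why the comparison in Table 6.6 becomes so lopsided for the larger circuits. A minimal sketch, in which the input counts are inferred from the exhaustive-pattern counts in Table 6.6 (an assumption, since the table does not list them):

```python
import math

def exhaustive_patterns(num_inputs):
    """Number of fully specified input patterns for a circuit with num_inputs inputs."""
    return 2 ** num_inputs

def fraction_of_exhaustive(atpg_patterns, exhaustive):
    """Share of the exhaustive set that the ATPG pattern set represents."""
    return atpg_patterns / exhaustive

# 4-bit KSA: Table 6.6 lists 512 exhaustive patterns, i.e., 9 inputs (inferred).
ksa4_inputs = int(math.log2(512))
ksa4_fraction = fraction_of_exhaustive(35, exhaustive_patterns(ksa4_inputs))
```

For the 4-bit KSA, the 35 compressed patterns are about 7% of the exhaustive set; for the 32-bit KSA, whose exhaustive count in Table 6.6 equals 2^65, the fraction is vanishingly small.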
Additional heuristics may be used to further reduce the number of patterns by specifically
targeting paths that are more susceptible to timing bleed. For instance, due to the different setup
time values and clock-to-Q delay values for different cells in the given RSFQ cell library, pipeline
stages constructed with certain cell combinations in consecutive stages of a path are more likely
to propagate and accumulate timing bleed while some other combinations are more likely to block
the timing bleed. Also, a pipeline stage with many splitter cells between gates is more likely to
cause timing issues. Hence, any path which includes such pipeline stages should be chosen as a target
path for timing verification and delay testing. Therefore, additional criteria can be included in our
target path selection to reduce the number of target paths and subsequently reduce the number
of vectors generated by the ATPG.

Figure 6.9: Minimum clock period for each input pattern for a Full-Adder with 100% coverage
and 0% untestable ratio using compressed patterns. [Bar chart for circuit fa; x-axis: patterns
(sorted by clock period, low to high); y-axis: minimum clock period (ps).]
Figure 6.10: Minimum clock period for each input pattern for a 4-bit KSA with 100% coverage
and 0% untestable ratio using compressed patterns. [Bar chart for circuit KSA4; x-axis: patterns
(sorted by clock period, low to high); y-axis: minimum clock period (ps).]
6.5 Conclusion and future work
We characterized the behavior of RSFQ cells under process variations and addressed these
characteristics by developing a completely new ATPG paradigm for generating test patterns for
dynamic timing verification and path delay fault testing of RSFQ circuits. By utilizing phenomena
identified via cell characterization, especially the existence of single-pattern delay tests and
characteristics of timing bleed along multi-cycle paths, our ATPG is able to identify all multi-cycle
target path delay faults and generate test patterns that excite and sensitize the worst-case timing
bleeds along each target path. We develop theoretical proofs to demonstrate that our patterns
invoke maximum delays. Simulation results show that our patterns excite the maximal delays in the
benchmark circuits. Our approach also enables high-quality delay testing and dynamic timing
verification without the astronomical overhead of full scan.
Figure 6.11: Minimum clock period for each input pattern for a 4-bit multiplier with high
untestable ratio using compressed patterns. [Bar chart for circuit arrmult4; x-axis: patterns
(sorted by clock period, low to high); y-axis: minimum clock period (ps).]
In our future research, we will extend the scope of our ATPG approach. This includes improving
the speed of our ATPG, utilizing two-pattern tests to further improve coverage, and generating
patterns that can cover other non-idealities. Furthermore, we will also take into consideration the
hold time issues by improving and integrating the method proposed in this chapter.
Figure 6.12: Minimum clock period for each input pattern for a 4-bit integer divider with high
untestable ratio using compressed patterns. [Bar chart for circuit ID4s; x-axis: patterns (sorted
by clock period, low to high); y-axis: minimum clock period (ps).]
Table 6.6: Number of patterns for benchmark circuits

Circuit name   Number of exhaustive patterns   Number of ATPG        Number of ATPG patterns
                                               patterns generated    with compression
1-bit FA       8                               6                     5
4-bit KSA      512                             42                    35
8-bit KSA      131,072                         137                   123
16-bit KSA     8,589,934,592                   463                   433
32-bit KSA     36,893,488,147,419,103,232      1,687                 1,625
4-bit MULT     256                             49                    35
8-bit MULT     65,536                          483                   365
4-bit DIV      256                             62                    44
8-bit DIV      65,536                          282                   237
c432           68,719,476,736                  213                   128
c499           2,199,023,255,552               747                   727
c880           1,152,921,504,606,846,976       451                   420
c1355          2,199,023,255,552               942                   915
c1908          8,589,934,592                   1,019                 1,018
c3540          1,125,899,906,842,624           1,075                 954
Reference List
[1] Fangzhou Wang and Sandeep Gupta. Automatic test pattern generation for timing verification and delay testing of RSFQ circuits. In 2019 IEEE 37th VLSI Test Symposium (VTS), pages 1-6. IEEE, 2019.
[2] Konstantin K Likharev and Vasilii K Semenov. RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems. IEEE Transactions on Applied Superconductivity, 1(1):3-28, 1991.
[3] Coenrad J Fourie. Extraction of DC-biased SFQ circuit Verilog models. IEEE Transactions on Applied Superconductivity, 28(6):1-11, 2018.
[4] Lieze Schindler and Coenrad Fourie. Generic parameterized cell library for logic synthesis with DC-biased RSFQ logic, 2018.
[5] Soheil Nazar Shahsavani, Bo Zhang, and Massoud Pedram. Accurate margin calculation for single flux quantum logic cells. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 509-514. IEEE, 2018.
[6] WC Stewart. Current-voltage characteristics of Josephson junctions. Applied Physics Letters, 12(8):277-280, 1968.
[7] K Nakajima, Y Onodera, and Y Ogawa. Logic design of Josephson network. Journal of Applied Physics, 47(4):1620-1627, 1976.
[8] PI Bunyk, A Oliva, VK Semenov, M Bhushan, KK Likharev, JE Lukens, MB Ketchen, and WH Mallison. High-speed single-flux-quantum circuit using planarized niobium-trilayer Josephson junction technology. Applied Physics Letters, 66(5):646-648, 1995.
[9] Akira Fujimaki, Masamitsu Tanaka, Takahiro Yamada, Yuki Yamanashi, Heejoung Park, and Nobuyuki Yoshikawa. Bit-serial single flux quantum microprocessor core. IEICE Transactions on Electronics, 91(3):342-349, 2008.
[10] HYPRES. The premier commercial foundry for superconducting ICs and custom fabrication, 2018.
[11] MIT Lincoln Laboratory. Lincoln Laboratory Supercomputing Center, 2018.
[12] Yuki Ando, Ryo Sato, Masamitsu Tanaka, Kazuyoshi Takagi, Naofumi Takagi, and Akira Fujimaki. Design and demonstration of an 8-bit bit-serial RSFQ microprocessor: CORE e4. IEEE Transactions on Applied Superconductivity, 26(5):1-5, 2016.
[13] S Nagasawa, T Satoh, K Hinode, Y Kitagawa, M Hidaka, H Akaike, A Fujimaki, K Takagi, N Takagi, and Nobuyuki Yoshikawa. New Nb multi-layer fabrication process for large-scale SFQ circuits. Physica C: Superconductivity, 469(15-20):1578-1584, 2009.
[14] Gen Konno, Yuki Yamanashi, and Nobuyuki Yoshikawa. Fully functional operation of low-power 64-kb Josephson-CMOS hybrid memories. IEEE Transactions on Applied Superconductivity, 27(4):1-7, 2017.
[15] IARPA. IARPA launches program to develop a superconducting computer, 2014.
[16] IARPA. Electronic design automation tools for superconducting electronics EDA for SCE, 2015.
[17] IARPA. SuperTools, 2016.
[18] D Scott Holmes, Andrew L Ripple, and Marc A Manheimer. Energy-efficient superconducting computing - power budgets and requirements. IEEE Transactions on Applied Superconductivity, 23(3):1701610, 2013.
[19] Kris Gaj, Quentin P Herr, Victor Adler, Andy Krasniewski, Eby G Friedman, and Marc J Feldman. Tools for the computer-aided design of multigigahertz superconducting digital circuits. IEEE Transactions on Applied Superconductivity, 9(1):18-38, 1999.
[20] Coenrad J Fourie and Mark H Volkmann. Status of superconductor electronic circuit design software. IEEE Transactions on Applied Superconductivity, 23(3):1300205, 2012.
[21] P. Le Roux, L. Schindler, and C. J. Fourie. Optimization of passive transmission lines to minimize reflections between RSFQ logic cells. IEEE Trans. Appl. Supercond., submitted for publication.
[22] Stephen R. Whiteley. WRspice circuit simulator, 1996.
[23] M Tanaka, T Kawamoto, Y Yamanashi, Y Kamiya, A Akimoto, K Fujiwara, A Fujimaki, N Yoshikawa, H Terai, and S Yorozu. Design of a pipelined 8-bit-serial single-flux-quantum microprocessor with multiple ALUs. Superconductor Science and Technology, 19(5):S344, 2006.
[24] Naveen Katam, Alireza Shafaei, and Massoud Pedram. Design of multiple fanout clock distribution network for rapid single flux quantum technology. In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pages 384-389. IEEE, 2017.
[25] S Yorozu, Y Kameda, H Terai, A Fujimaki, T Yamada, and S Tahara. A single flux quantum standard logic cell library. Physica C: Superconductivity, 378:1471-1474, 2002.
[26] Masaaki Maezawa, Fuminori Hirayama, and Motohiro Suzuki. Design and fabrication of RSFQ cell library for middle-scale applications. Physica C: Superconductivity, 412:1591-1596, 2004.
[27] P. Bunyk, D. Zinoviev, A. Rylyakov, K. Likharev, and P. Litskevitch. SUNY RSFQ cell library, 2019.
[28] Naveen Kumar Katam and Massoud Pedram. Timing characterization for static timing analysis of single flux quantum circuits. IEEE Transactions on Applied Superconductivity, 29(6):1-8, 2019.
[29] MS Hrishikesh, Norman P Jouppi, Keith I Farkas, Doug Burger, Stephen W Keckler, and Premkishore Shivakumar. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays. In Proceedings 29th Annual International Symposium on Computer Architecture, pages 14-24. IEEE, 2002.
[30] Ning Chen, Bing Li, and Ulf Schlichtmann. Iterative timing analysis based on nonlinear and
interdependent
ip
op modelling. IET Circuits, Devices & Systems, 6(5):330{337, 2012.
124
[31] Andrew B Kanng and Hyein Lee. Timing margin recovery with
exible
ip-
op timing model.
In Fifteenth International Symposium on Quality Electronic Design, pages 496{503. IEEE,
2014.
[32] Shweta Srivastava and Jaijeet Roychowdhury. Interdependent latch setup/hold time charac-
terization via euler-newton curve tracing on state-transition equations. In Proceedings of the
44th annual Design Automation Conference, pages 136{141. ACM, 2007.
[33] Jao-Ching Lin and VK Semenov. Timing circuits for rsfq digital systems. IEEE transactions
on applied superconductivity, 5(3):3472{3477, 1995.
[34] Cesar A Mancini, Nada Vukovic, Andrea M Herr, Kris Gaj, Mark F Bocko, and Marc J Feld-
man. Rsfq circular shift registers. IEEE transactions on applied superconductivity, 7(2):2832{
2835, 1997.
[35] Andrzej Krasniewski. Logic simulation of rsfq circuits. IEEE transactions on applied super-
conductivity, 3(1):33{38, 1993.
[36] Ghasem Pasandi and Massoud Pedram. Pbmap: A path balancing technology mapping algo-
rithm for single
ux quantum logic circuits. IEEE Transactions on Applied Superconductivity,
29(4):1{14, 2018.
[37] Fangzhou Wang and Sandeep Gupta. Timing verication for rapid single-
ux-quantum (rsfq)
logic: New paradigm and models. In to be publised in 2019 18th International Superconductive
Electronics Conference (ISEC), pages 1{3. IEEE, 2019.
[38] Fangzhou Wang and Sandeep Gupta. Static timing analysis (sta) with timing bleed: Cer-
tifying much higher performance for rapid single
ux quantum (rsfq) logic. In submitted
to 14th European Conference on Applied Superconductivity (EUCAS). Journal of Physics:
Conference Series (JPCS), 2019.
[39] Johannes A Delport and Coenrad J Fourie. A static timing analysis tool for rsfq and ersfq su-
perconducting digital circuit applications. IEEE Transactions on Applied Superconductivity,
28(5):1{5, 2018.
[40] Amol Inamdar, Denis Amparo, Bibhu Sahoo, Jie Ren, and Anubhav Sahu. Rsfq/ersfq cell
library with improved circuit optimization, timing verication, and test characterization.
IEEE Transactions on Applied Superconductivity, 27(4):1{9, 2017.
[41] Naveen Katam, Soheil Nazar Shahsavani, Ting-Ru Lin, Ghasem Pasandi, Alireza Shafaei, and
Massoud Pedram. Sport lab sfq logic circuit benchmark suite. Univ. Southern California,
Los Angeles, CA, USA, Tech. Rep, 2017.
[42] K Gaj, QP Herr, and MJ Feldman. Parameter variations and synchronization of rsfq circuits.
In Conference Series-Institute of Physics, volume 148, pages 1733{1736. IOP PUBLISHING
LTD, 1995.
[43] Igor V Vernik, Quentin P Herr, K Gaij, and Marc J Feldman. Experimental investigation
of local timing parameter variations in rsfq circuits. IEEE transactions on applied supercon-
ductivity, 9(2):4341{4344, 1999.
[44] A Erik Lehmann, Timur V Filippov, Saad M Sarwana, Dmitri E Kirichenko, Vladimir V
Dotsenko, Anubhav Sahu, and Deepnarayan Gupta. Embedded rsfq pseudorandom binary
sequence generator for multichannel high-speed digital data link testing and synchronization.
IEEE Transactions on Applied Superconductivity, 27(4):1{6, 2017.
125
[45] Gleb Krylov and Eby G Friedman. Test point insertion for rsfq circuits. In Circuits and
Systems (ISCAS), 2017 IEEE International Symposium on, pages 1{4. IEEE, 2017.
[46] Zhong John Deng, Nobuyuki Yoshikawa, Stephen R Whiteley, and Theodore Van Duzer.
Data-driven self-timed rsfq high-speed test system. IEEE transactions on applied supercon-
ductivity, 7(4):3830{3833, 1997.
[47] Arun A Joseph, Marcel HH Weusthof, and Hans G Kerkho. Application of dft techniques
to a 20 ghz superconductor delta adc. Microelectronics journal, 33(10):791{798, 2002.
[48] Jacob Savir. Scan latch design for delay test. In Test Conference, 1997. Proceedings., Inter-
national, pages 446{453. IEEE, 1997.
[49] Kamran Zarrineh, Shambhu J Upadhyaya, and Vivek Chickermane. System-on-chip testa-
bility using lssd scan structures. IEEE Design & Test of Computers, 18(3):83{97, 2001.
[50] ER Hsieh, Robert A Rasmussen, LJ Vidunas, and WT Davis. Delay test generation. In
Proceedings of the 14th Design Automation Conference, pages 486{491. IEEE Press, 1977.
[51] HYPRES. Niobium integrated circuit fabrication, 2015.
[52] Sergey K Tolpygo. Superconductor digital electronics: Scalability and energy eciency issues.
Low Temperature Physics, 42(5):361{379, 2016.
[53] Arun A Joseph, Javier Sese, Jakob Flokstra, and Hans G Kerkho. Structural testing of
the hypres niobium process. IEEE transactions on applied superconductivity, 15(2):106{109,
2005.
[54] Arun A Joseph, Sander Heuvelmans, Gerrit J Gerritsma, and Hans G Kerkho. The detection
of defects in a niobium tri-layer process. IEEE transactions on applied superconductivity,
13(2):95{98, 2003.
[55] Gordon L Smith. Model for delay faults based upon paths. In ITC, pages 342{351. Citeseer,
1985.
[56] Chin Jen Lin and Sudhakar M Reddy. On delay fault testing in logic circuits. IEEE Trans-
actions on Computer-Aided Design of Integrated Circuits and Systems, 6(5):694{703, 1987.
[57] Niraj K Jha and Sandeep Gupta. Testing of digital systems. Cambridge University Press,
2003.
[58] J Paul Roth. Diagnosis of automata failures: A calculus and a method. IBM journal of
Research and Development, 10(4), 1966.
[59] Prabhakar Goel. An implicit enumeration algorithm to generate tests for combinational logic
circuits. IEEE transactions on Computers, 1981.
[60] Fangzhou Wang and Gupta. An eective and ecient automatic test pattern generation
(atpg) paradigm for certifying performance of rsfq circuits. to be publised in IEEE Transac-
tions on Applied Superconductivity, 2019.
[61] Tom Kirkland and M Ray Mercer. A topological search algorithm for atpg. In Proceedings
of the 24th ACM/IEEE Design Automation Conference, pages 502{508. ACM, 1987.
[62] Franc Brglez. A neural netlist of 10 combinational benchmark circuits. Proc. IEEE ISCAS:
Special Session on ATPG and Fault Simulation, pages 151{158, 1985.
126
Abstract
Rapid single-flux-quantum (RSFQ) logic, based on Josephson junctions (JJs), is seeing a resurgence as a way to provide high performance in the era beyond the end of physical scaling of CMOS.
However, new characteristics of RSFQ technology necessitate the development of new paradigms, models, and methods for characterization, verification, and testing, all of which are essential for harnessing the benefits of RSFQ.
In the first part of our research, we present a new method for characterizing RSFQ cells that exposes a much larger set of vulnerabilities, a systematic approach for identifying the root causes of these vulnerabilities to guide the refinement of cell designs, and a new way to extend test generation approaches to perform design verification at the circuit level. We demonstrate that our new methods and tools expose a large number of vulnerabilities and help identify root causes, leading to refined cell designs that completely eliminate these vulnerabilities. Finally, we verify that our refined cells can indeed be composed to create error-free circuits.
In the second part of our research, we systematically study the effect of process variations and other RSFQ-specific non-idealities. We show that, because of the quantized, pulse-based operation of RSFQ, even highly distorted pulses are interpreted logically correctly by cells, but the timing is affected. Therefore, timing verification and delay testing increase in importance in RSFQ. We also show that, because RSFQ circuits are pipelined at the gate level, imposing a guard band to resolve the setup-time issue would dramatically reduce performance, a key benefit of RSFQ. More importantly, inserting scan logic at every pipelined gate in RSFQ would cause astronomical area overheads. Therefore, increased clock-to-Q delay (i.e., timing bleed) must be allowed through multi-cycle paths. We develop a new static timing analysis method that allows larger increases in clock-to-Q delay, i.e., timing bleed, whenever the data input arrives late. We present results of simulations for benchmark circuits with process variations to demonstrate that our new method certifies much higher speeds for RSFQ logic.
Furthermore, we develop a completely new paradigm for automatic test pattern generation (ATPG) to address these new phenomena in RSFQ technology and to ensure that designs and fabricated chips provide the desired performance. We identify delay excitation conditions, sensitization conditions, and conditions for propagation of the logic errors caused by timing violations due to process variations. Our ATPG utilizes these new phenomena to select multi-cycle paths as targets and to generate test patterns that are guaranteed to excite the worst-case delay along each target multi-cycle path. Finally, we present theoretical proofs and Monte Carlo simulation results for benchmark circuits under process variations to demonstrate that the patterns generated by our new ATPG are effective (they invoke the maximum delays of target multi-cycle paths) and efficient (they require small numbers of patterns).
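The timing-bleed idea summarized in the abstract can be illustrated with a toy model. The sketch below is not the dissertation's analysis method: the linear clock-to-Q model, the function names, the `window` of tolerated lateness, and all delay values (arbitrary time units) are assumptions chosen only for illustration. It compares the minimum clock period certified for a chain of clocked gates when every stage must meet setup (a guard-band check) with the period certified when a late data arrival is allowed to increase the clock-to-Q delay of that gate, so the lateness is absorbed over the following cycles.

```python
# Toy comparison of guard-band timing checks vs. bleed-aware checks.
# Every model and number here is a hypothetical illustration, not the thesis method.

def clk_to_q(slack, nominal=4.0, setup=2.0, slope=1.0):
    """Assumed clock-to-Q model: nominal when setup is met, growing
    linearly with how far the data arrival intrudes into the setup window."""
    if slack >= setup:
        return nominal
    return nominal + slope * (setup - slack)


def path_passes(period, wire_delays, allow_bleed,
                nominal=4.0, setup=2.0, slope=1.0, window=2.0):
    """Check one chain of clocked gates driven by an ideal clock.

    wire_delays[i] is the assumed interconnect delay between gate i and
    gate i+1. Without bleed, every stage must meet setup. With bleed, a
    stage may be late by up to `window`, at the cost of a larger
    clock-to-Q delay that is carried into the next cycle.
    """
    c2q = nominal  # the first gate is assumed to launch on time
    for wire in wire_delays:
        # Time by which the data pulse precedes the next clock pulse.
        slack = period - (c2q + wire)
        if slack >= setup:
            c2q = nominal
        elif allow_bleed and slack >= setup - window:
            c2q = clk_to_q(slack, nominal, setup, slope)
        else:
            return False
    return True


def min_period(wire_delays, allow_bleed, lo=1, hi=100):
    """Smallest integer clock period in [lo, hi] for which the path passes."""
    for period in range(lo, hi + 1):
        if path_passes(float(period), wire_delays, allow_bleed):
            return period
    return None


if __name__ == "__main__":
    wires = [6.0, 10.0, 6.0, 6.0]  # assumed interconnect delays
    print("guard-band period:", min_period(wires, allow_bleed=False))
    print("bleed-aware period:", min_period(wires, allow_bleed=True))
```

With these assumed delays, the guard-band check needs a period of 16 units because of the single slow stage, while the bleed-aware check accepts 14 units: the slow stage launches late, and the faster stage that follows has enough slack to absorb the lateness.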
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Electronic design automation algorithms for physical design and optimization of single flux quantum logic circuits
Advanced cell design and reconfigurable circuits for single flux quantum technology
Trustworthiness of integrated circuits: a new testing framework for hardware Trojans
Multi-phase clocking and hold time fixing for single flux quantum circuits
Design and testing of SRAMs resilient to bias temperature instability (BTI) aging
Designing efficient algorithms and developing suitable software tools to support logic synthesis of superconducting single flux quantum circuits
A variation aware resilient framework for post-silicon delay validation of high performance circuits
Formal equivalence checking and logic re-synthesis for asynchronous VLSI designs
Towards a cross-layer framework for wearout monitoring and mitigation
Asset Metadata
Creator
Wang, Fangzhou (author)
Core Title
Verification and testing of rapid single-flux-quantum (RSFQ) circuit for certifying logical correctness and performance
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
01/10/2020
Defense Date
12/10/2019
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
automatic test pattern generation (ATPG), delay testing, Josephson junctions (JJs), multi-cell characterization, OAI-PMH Harvest, pattern-based dynamic timing verification, process variation, rapid single-flux-quantum (RSFQ), single-pattern delay test, static timing analysis (STA), superconducting devices, timing bleed
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Gupta, Sandeep (committee chair), Beerel, Peter (committee member), Nakano, Aiichiro (committee member), Pedram, Massoud (committee member)
Creator Email
fangzhou.usc@gmail.com,fwangusc@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-260004
Unique identifier
UC11673007
Identifier
etd-WangFangzh-8115.pdf (filename),usctheses-c89-260004 (legacy record id)
Legacy Identifier
etd-WangFangzh-8115.pdf
Dmrecord
260004
Document Type
Dissertation
Rights
Wang, Fangzhou
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA