Towards High-Performance Low-cost AMS Designs: Time-Domain Conversion
and ML-based Design Automation
by
Juzheng Liu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2023
Copyright 2023 Juzheng Liu
Dedication
To my dear parents and my beloved wife.
Acknowledgements
I would like to express my heartfelt gratitude to all those who have supported and guided me
throughout this incredible journey of pursuing my Ph.D. This thesis would not have been possible
without their unwavering support and encouragement.
First and foremost, I extend my deepest appreciation to my Ph.D. advisor, Professor Mike
Shuo-Wei Chen, for his exceptional mentorship, invaluable insights, and continuous encouragement. I joined the group without any circuit background, and I barely had the confidence to finish
this Ph.D. journey. Fortunately, Professor Chen trusted me with valuable projects and spared no
effort supporting me throughout the years. His dedication to my academic growth and commitment to excellence have been instrumental in shaping the direction of my research.
I would like to extend my gratitude to Professor Hossein Hashemi, Professor Tony Levi, and
Professor Sandeep Gupta, for all the patient discussion and technical support in multiple projects
and programs.
I am grateful to the members of my thesis committee, Prof. Mike Chen, Prof. Hossein
Hashemi, and Prof. Yong Chen for their valuable feedback, constructive criticism, and expertise in their respective fields. Their thorough reviews and thoughtful suggestions have greatly
enhanced the quality of this thesis.
I am indebted to my colleagues and friends, both within and outside the research group, for
their camaraderie, stimulating discussions, and unwavering support during challenging times.
I’ve learned a lot from my seniors, Shiyu Su, Tzufan Wu, Aoyang Zhang, Qiaochu Zhang, Mohsen
Hassanpourghad, Ce Yang, and Rezwan A Rasul. I still remember those old days when I was tried
by Shiyu and Qiaochu in the gym before Covid hit the world and those late-night discussions
with Aoyang during the tapeout times. I would like to also thank all the other group mates
including Mostafa Ayesh, Soumya Mahapatra, Maysara Hamada, Hsiang Chun Cheng, Mayank
Palaria, Khaled Hassan, and Mostafa Toubar. It has been a great time working with them. Their
friendship has made this academic journey much more enjoyable and rewarding.
I would also like to express my gratitude to MediaTek and the support from Dr. Gabriele
Manganaro, Dr. Ayman Shabra, and Stacy Ho for providing the research opportunity and the
technical support. It has been great working with them on the direct RF sampling ADC project.
Last but not least, I want to thank my family for their boundless love, encouragement, and
belief in my abilities. This thesis is dedicated to my parents Zhongwen Liu and Ping Zhou, who
have been loving me unconditionally even when I’m so far away from them, and my beloved wife
Emily Wanrou Wu, who has always been my sunshine and my haven. Their unwavering support
and understanding during the ups and downs of this journey have been a source of strength and
motivation.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Thesis Organization
1.2 Background
1.3 Existing high-speed ADC architectures and limitations
1.4 AMS design automation and challenges
1.5 Research Contribution
Chapter 2: Time-domain ADC Architecture with Pipelined-SAR TDC
2.1 Time-domain ADC advantages and design challenges
2.2 TDC architecture, operation, and noise analysis
2.2.1 Overview of sub-gate-delay time step TDC architectures
2.2.2 TDC jitter contribution analysis
2.2.3 Proposed Two-step Flash-SAR TDC
2.3 Delay-tracking technique for SAR TDC throughput enhancement
2.3.1 Conventional synchronous pipeline TDC
2.3.2 Proposed delay-tracking pipelining
2.4 ADC prototype and implementation
2.4.1 Top-level ADC structure
2.4.2 SAR TDC reference time generation and subtraction
2.4.3 Common mode ramp generation for VTC
2.4.4 Flash TDC and residue time generation
2.4.5 Time comparator
2.5 Measurement results
2.6 Summary
Chapter 3: Direct-RF Sampling Time-domain ADC, SAR TDC redundancy, and Background Delay Offset Calibration
3.1 SAR TDC Delay Variabilities and compensation
3.2 Pipelined-SAR TDC Structure with Background Delay Offset Calibration
3.3 ADC Implementation
3.4 Measurement results
3.5 Summary
Chapter 4: Machine-Learning-Based AMS Circuit Modeling
4.1 Transfer Learning for Efficient Post-Layout and Silicon-Level Modeling
4.1.1 Circuit modeling using MLP and supervised training
4.1.2 Transfer learning scheme for training sample reduction
4.1.3 Mathematical analysis for TL
4.1.4 Automatic circuit sizing with post-layout/silicon-level model
4.1.5 Transfer Learning experimental results
4.2 Graph Neural Network-based Circuit Modeling
4.2.1 Graph construction from a circuit topology
4.2.1.1 Direct Mapping
4.2.1.2 Proposed Transistor-Pin-Specified Graph Construction
4.2.1.3 Redundant Edge Removal
4.2.1.4 Node Input Feature Definition
4.2.2 Circuit modeling using graph convolution
4.2.2.1 GCN Structure
4.2.2.2 Intuitive Explanation of GCN Circuit Modeling Advantages
4.2.3 Experimental results
4.2.3.1 Voltage to Time Converter
4.2.3.2 Three-Stage Amplifier
4.3 Summary
Chapter 5: Conclusion
5.1 Summary of existing works
5.2 Future works
Bibliography
List of Tables
2.1 Performance Summary and Comparison with State-of-the-Art ADCs with Similar Speed and Resolution
3.1 Performance summary and comparison with the state-of-the-art direct RF sampling ADCs
4.1 Parameter list of the delta-sigma DAC example
4.2 VCO sizing results comparison (VCO 1 and 2 used for model training)
4.3 GCN and FCNN detail for the VTC modeling
4.4 GCN, FCNN, and CCINN detail for the amplifier modeling
List of Figures
2.1 Conceptual diagram of a time-domain ADC
2.2 Existing fine resolution TDC architecture comparison. (a) Vernier TDC; (b) SAR TDC
2.3 Accumulated jitter comparison between different TDC architectures (normalized to single-inverter jitter)
2.4 Two-step TDC conceptual block diagram
2.5 (a) SAR TDC conversion chain; (b) timing diagram without pipelining; (c) timing error caused by directly applying higher clock rate; (d) timing diagram of synchronous pipelining
2.6 The proposed delay tracking pipelining structure and the corresponding timing diagram
2.7 Top-level diagram of the 10GS/s 8-bit time-domain ADC
2.8 1-bit stage in the SAR TDC using the proposed selective delay tuning cell for reference time subtraction
2.9 Monte-Carlo simulation results of the reference time variation
2.10 (a) Conventional ramp generation; (b) proposed common-mode ramp generation implementation
2.11 Schematic of the first-step Flash TDC and the residue time generation
2.12 Schematic and operation timing diagram of the time comparator
2.13 Chip micro-photograph and layout detail
2.14 Measured output spectrum of a single ADC channel at Nyquist (2.5GHz) input frequency
2.15 Measured SNDR and SFDR of a single ADC channel over the input frequency from 100MHz to 5GHz
2.16 Measured output spectrum of the 2X TI ADC at Nyquist (5GHz) input frequency
2.17 Measured SNDR and SFDR of the 2X TI ADC over the input frequency from 100MHz to 5GHz
2.18 Measured SNDR and SFDR of the 2X TI ADC over ambient temperature variation
2.19 Measured SNDR and SFDR of the 2X TI ADC over supply voltage variation
2.20 Measured DNL and INL (7-bit precision level)
3.1 SAR TDC delay variabilities
3.2 Equivalent SAR TDC error models
3.3 Delay tracking pipelined-SAR TDC structure
3.4 Delay offset calibration algorithm
3.5 Redundant stage block diagram
3.6 Overall ADC block diagram
3.7 Bottom-plate sampling with top-plate ramp injection VTC, and the corresponding timing diagram
3.8 Chip microphotograph and the zoomed-in layout of a single channel
3.9 Measured ADC output (decimated by 1025X) spectrum at low (100MHz, fold back to 6MHz) and Nyquist (7.618GHz, fold back to 1MHz because of decimation) input frequency, with and without the background delay offset calibration in redundant stages
3.10 Measured ADC SNDR/SFDR versus input signal frequency with or without the background delay offset calibration in redundant stages
3.11 Measured ADC SNDR/SFDR versus sampling frequency with or without the background delay offset calibration in redundant stages
3.12 Measured ADC DNL and INL with or without the background delay offset calibration in redundant stages
4.1 Surrogate-model-based circuit design automation and NN-based circuit modeling
4.2 Proposed transfer learning scheme
4.3 Schematic of the delta-sigma DAC example
4.4 Schematic, layout, and die photo of the VCO example
4.5 Testing MSE comparison of the Sigma-Delta DAC example
4.6 Relative prediction error comparison of the Sigma-Delta DAC example
4.7 Training and testing MSE loss comparison
4.8 Layout NN model prediction, post-layout simulation, and silicon testing results comparison: (a) Fosc vs. Vctrl for VCO 1 (b) power vs. Vctrl for VCO 1 (c) Fosc vs. Vctrl for VCO 10 (d) power vs. Vctrl for VCO 10
4.9 (a) Prediction errors of silicon result using schematic-, layout- and silicon-level NN model (b) Fosc vs. Vctrl for VCO 10 (c) power vs. Vctrl for VCO 10 from post-layout simulation, silicon-level model prediction and silicon testing
4.10 Different graph construction options for a two-stage common-source amplifier. (a) Circuit schematic. (b) Direct mapping with devices as nodes and wire connections as edges. (c) Graph with transistor pin specified as individual nodes. (d) Graph with redundant constant voltage connections removed. (e) Graph node input feature definition
4.11 Circuit modeling using graph convolution followed by fully connected layers
4.12 Intuitive explanation of circuit topology information embedded in the graph and GCN circuit modeling
4.13 Voltage to time converter schematic
4.14 (a) Testing MSE, (b) average SNDR prediction error, (c) average power prediction error of the GCN and FCNN in the VTC interpolation test
4.15 (a) Testing MSE, (b) average SNDR extrapolation error, (c) average power extrapolation error of the GCN and FCNN in the VTC extrapolation test
4.16 3-stage nested Miller compensated amplifier schematic
4.17 (a) Testing MSE, (b) average gain prediction error, (c) average UGB prediction error of the GCN, FCNN, and CCINN in the amplifier interpolation test
4.18 (a) Testing MSE, (b) average gain extrapolation error, (c) average UGB extrapolation error of the GCN, FCNN, and CCINN in the amplifier extrapolation test
Abstract
This thesis presents a time-domain ADC architecture for high-speed high-efficiency conversion
between analog and digital signals, together with an AMS design automation algorithm to reduce
the AMS circuit design time and cost.
For the time-domain ADC, we propose a selective delay tuning technique in the SAR TDC to
achieve high-efficiency time-to-digital conversion. We further propose a delay-tracking pipelining technique that raises the SAR TDC conversion speed without
significant power or noise overhead. The first ADC prototype was fabricated in 14nm FinFET
CMOS technology, achieving 10GS/s 8-bit conversion with merely 2 time-interleaved channels
and 14.8mW power consumption. To apply the time-domain ADC to direct-RF sampling wireless communication systems, the ADC resolution needs to be further improved. Therefore, we
propose adding several redundant stages towards the end of the SAR TDC conversion chain and
implement a background delay-offset calibration scheme for the redundant stages to ensure
their accuracy. The second ADC prototype was fabricated in 4nm FinFET CMOS technology and
achieved a 16GS/s conversion speed with 44.5dB SNDR and a state-of-the-art 153.8dB
Schreier power efficiency.
Besides the circuit structure innovation, this thesis also describes a transfer learning algorithm
for efficient and accurate circuit modeling. We propose to leverage the well-trained schematic-level circuit NN model and transfer the information to a post-layout/silicon-level circuit model
with few training samples. Modeling and circuit optimization examples are demonstrated to
verify the efficiency and effectiveness of the proposed algorithm.
Chapter 1
Introduction
1.1 Thesis Organization
This thesis is organized as follows:
Chapter 1 discusses the application, motivation, and challenges of Analog Mixed Signal circuits, and the existing circuit architectures and design automation algorithms to improve the
circuit efficiency and reduce the design cost.
Chapter 2 describes the proposed time-domain ADC using delay-tracking pipelined SAR TDC
to achieve state-of-the-art single-channel conversion speed and power/area efficiency.
Chapter 3 describes a direct-RF sampling time-domain ADC with the proposed SAR TDC
calibration scheme and bottom plate sampling voltage-to-time converter for further improved
signal-to-noise-and-distortion-ratio and power efficiency.
Chapter 4 describes several proposed sampling and modeling algorithms to improve the AMS
circuit modeling accuracy and efficiency, making AMS design automation more feasible.
Chapter 5 summarizes the results of this dissertation and discusses the future research direction.
1.2 Background
Analog Mixed Signal (AMS) circuits such as analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) serve as the interface between the physical world and digital systems.
High-performance AMS circuits are in wide demand in applications such as wireline data links,
5G/6G wireless communication, autonomous vehicles, and instrumentation [32, 20, 25, 14, 18,
19]. On the other hand, with the ever-increasing requirement for higher performance and lower
cost, AMS circuit design is facing various challenges. From the overall system budget perspective, AMS circuits are usually the most power- and area-hungry blocks. For instance, the ADC can
consume up to 75% of the overall system power in a wireline receiver [13, 22]. Meanwhile, AMS
circuits usually contain passive components such as capacitors, resistors, and/or inductors, which
reduce the benefit from technology scaling, and vastly increase the fabrication cost in advanced
technology nodes. Furthermore, unlike digital circuits that can be synthesized by sophisticated
Very Large Scale Integration (VLSI) tools, AMS circuits require rigorous manual design, layout,
and simulation flows, especially in advanced technology nodes where design rules can be stringent and complicated.
1.3 Existing high-speed ADC architectures and limitations
Of all AMS circuits, high-speed (>10GS/s) ADCs are among the most representative yet
most challenging circuit blocks. In the past decades, several architectures have been developed to
improve ADC efficiency. Successive approximation register (SAR) ADCs are one of the most
power-efficient ADC architectures thanks to the binary conversion scheme. However, even when
asynchronous SAR logic [21, 33] is implemented, the SAR loop limits the conversion speed.
Massive time-interleaving (>32 channels) [36, 42] is often required to achieve high conversion
speed, which can vastly increase the overall area consumption, introduce heavy capacitive
loading on the ADC input driver, and make the ADC performance vulnerable to clock skew
and inter-channel mismatch.
To minimize these disadvantages of extensive time-interleaving, increasing the single-channel
conversion speed can be an effective solution. Existing design approaches involve Flash ADCs
and pipeline ADCs. A Flash ADC achieves a high conversion speed by simultaneously comparing
the input to all the reference voltage levels, at the cost of high power consumption and limited
resolution (<6 bits) because of the exponentially scaled resistor and comparator array [4, 41, 43].
Pipeline ADCs [38, 39] improve the conversion speed by a multi-stage pipeline structure. The
residue voltage of each stage is amplified and propagates to the following stage for finer conversion. The interstage residue amplification consumes significant power and may suffer from
insufficient headroom, especially in advanced technology nodes. In summary, each of the conventional high-speed ADC architectures has its unique limitations.
1.4 AMS design automation and challenges
Apart from the effort spent on circuit structure development, AMS design automation has also been a
prevailing research topic in the past decades, targeting a much-reduced design cost and time-to-market. Compared to digital circuits, AMS circuits have many more design parameters, a larger
design space (parameter range), and non-intuitive parameter-to-metric (P2M) functions. To tackle
these issues, different directions have been tried, including fully synthesized AMS circuits [35], black-box optimization algorithms such as Bayesian Optimization [29] and Reinforcement Learning
[37], and surrogate-model-based circuit design automation [8, 24]. Nevertheless, these design
approaches are not heavily used by AMS designers for research or industrial design because of
practical drawbacks. Fully synthesized AMS circuits can be created with VLSI CAD tools but
at the expense of limited design space and performance. Black-box optimization requires a large
number of real-time circuit simulations for each different design target, which makes the optimization very time-consuming.
To mitigate the heavy computational cost, surrogate models are utilized to efficiently evaluate
circuit performance while performing the optimization. Various algorithms have been tried for
the circuit P2M function modeling including polynomial regression [10], support vector machine
(SVM) [12], and fully connected neural network (FCNN) [40, 44]. However, all the existing modeling algorithms treat the circuit under modeling as a black-box function without incorporating the
circuit topology information. As a result, the surrogate model only performs interpolation with
limited accuracy in the given design space of the training dataset. When the model is extrapolated
to unseen design space, the modeling accuracy can be much worse. Improving the modeling accuracy requires an excessive number of training sample simulations, which makes the modeling
impractical, especially when layout parasitic extraction information is considered.
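The gap between interpolation and extrapolation accuracy can be illustrated with a toy sketch. Everything in it is hypothetical: `true_metric` is a made-up one-parameter function standing in for a P2M function, and the surrogate is a plain least-squares line rather than any of the cited modeling algorithms.

```python
# Toy illustration only: the "circuit" is a made-up one-parameter function,
# not a model of any circuit from this thesis.

def true_metric(x):
    """Hypothetical parameter-to-metric (P2M) function."""
    return x * x

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (a deliberately simple surrogate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Training samples cover the design space [0, 1] only.
train_x = [i / 10 for i in range(11)]
train_y = [true_metric(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def surrogate(x):
    """Black-box surrogate: knows nothing about the structure of true_metric."""
    return a * x + b

interp_error = abs(surrogate(0.5) - true_metric(0.5))  # inside the training range
extrap_error = abs(surrogate(3.0) - true_metric(3.0))  # far outside it
print(f"interpolation error: {interp_error:.3f}")
print(f"extrapolation error: {extrap_error:.3f}")
```

The surrogate is adequate inside the sampled range and degrades sharply outside it, which is the behavior the paragraph attributes to black-box models.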
1.5 Research Contribution
Facing the aforementioned challenges, we made efforts in both directions, to develop high-speed
high-efficiency ADC architectures and high-accuracy low-cost circuit modeling algorithms. On
the ADC architecture side, a time-domain ADC using a pipelined-SAR TDC was proposed to vastly
increase the single-channel conversion speed and power/area efficiency. To further improve the
signal-to-noise-and-distortion ratio (SNDR) of the time-domain ADC, a lumped delay calibration
scheme and a bottom plate sampling voltage-to-time converter (VTC) structure were proposed.
The ADC prototypes achieve superior power/area efficiency compared to other state-of-the-art
ADCs with similar speed and resolution. On the design automation side, Bayesian Optimization
Aided Sampling was proposed to determine the optimal sampling region. Transfer Learning was
proposed to reduce the post-layout simulation cost by reusing the well-trained schematic-level
circuit model. To further improve the modeling efficiency and accuracy, a Graph Neural Network
based circuit modeling algorithm was proposed to utilize the circuit topology information.
Chapter 2
Time-domain ADC Architecture with Pipelined-SAR TDC
2.1 Time-domain ADC advantages and design challenges
Different from the prevailing pipeline or SAR ADCs that finish the conversion purely in the
voltage domain, a time-domain ADC first converts the sampled voltage into time-domain pulse
width and further quantizes the pulse width using a time-to-digital converter (TDC), as shown in
Fig. 2.1. Unlike voltage-domain conversions that require large arrays of passive components, a
TDC is typically constructed of dynamic delay cells, which occupy a much smaller area and can
scale with technology. On the other hand, VTC and TDC design in a high-speed time-domain
ADC also presents unique challenges. A time-domain ADC requires the extra step of voltage-to-time conversion, which can introduce extra noise and nonlinearity. Furthermore, as discussed
in [47], the maximum time-domain signal amplitude generated by the VTC is limited by the ADC
clock cycle, which requires the TDC to achieve sub-gate-delay time steps and even lower jitter
noise to achieve >6-bit quantization at a high single-channel sampling rate.
Existing TDC architectures, such as Vernier TDC [45], passive pulse shrinking (PPS) TDC [7],
and phase interpolation TDC [47], are capable of achieving sub-gate-delay time steps. However,
these TDCs are implemented under thermometer-coded conversion schemes, which require the
input signal to propagate through a long conversion chain, resulting in high power consumption
and substantial accumulated jitter noise. To increase the single-channel conversion speed without sacrificing the ADC SNR, we utilized the SAR TDC [5] architecture using the binary search
scheme. It reduced the number of total delay stages and achieved a power-efficient and low-noise
time-to-digital conversion.
Figure 2.1: Conceptual diagram of a time-domain ADC
2.2 TDC architecture, operation, and noise analysis
As a unique challenge of the time-domain ADC design, the full swing of the TDC input time-domain signal (from the VTC) is limited by the clock cycle. As an explanation of the VTC operation, if half of the clock cycle is assigned to input voltage tracking and the other half of the cycle
is used to generate the time-domain signal pulses, then the full time range $T_{FR}$ generated by the
VTC is limited to
$$T_{FR} = \frac{1}{2} T_{CLK} = \frac{1}{2F_s}, \tag{2.1}$$
where $F_s$ is the ADC sampling frequency. From here we can derive the SNR of the TDC as
$$\mathrm{SNR} = \frac{\frac{1}{2} T_{FR}^2}{\sigma_{j,TDC}^2 + \frac{t_{LSB}^2}{12}} = \frac{1}{8 F_s^2 \left(\sigma_{j,TDC}^2 + \frac{t_{LSB}^2}{12}\right)}, \tag{2.2}$$
where $\sigma_{j,TDC}$ is the RMS jitter generated by the TDC, and $t_{LSB}$ is the minimum time step of the
TDC. As a realistic example, to design a single-channel time-domain ADC with 5GS/s speed and
an effective number of bits (ENOB) of 7, the requirement on the jitter and quantization noise is
$$\sqrt{\sigma_{j,TDC}^2 + \frac{t_{LSB}^2}{12}} < 350\,\mathrm{fs}. \tag{2.3}$$
If we estimate the noise contribution from the jitter and quantization noise to be equal, the above
requirement can be specified as $\sigma_{j,TDC} < 250\,\mathrm{fs}$ and $t_{LSB} < 0.87\,\mathrm{ps}$. To achieve such stringent
jitter and time step requirements, in this section, we will analyze the jitter contribution of different
TDC architectures that can realize sub-gate-delay time steps and seek reasonable TDC design
guidelines to minimize the jitter contribution.
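The budget above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the ENOB conversion uses the standard relation SNR(dB) = 6.02 ENOB + 1.76, which is assumed here rather than taken from the text. It evaluates Eq. (2.2) for the quoted jitter and time-step values and splits a 350fs total budget equally in power between jitter and quantization noise.

```python
import math

def tdc_snr_db(fs_hz, sigma_jitter_s, t_lsb_s):
    """SNR from Eq. (2.2): signal power T_FR^2 / 2 over jitter plus quantization noise."""
    t_fr = 1.0 / (2.0 * fs_hz)  # full time range, Eq. (2.1)
    noise_power = sigma_jitter_s ** 2 + t_lsb_s ** 2 / 12.0
    return 10.0 * math.log10(0.5 * t_fr ** 2 / noise_power)

def enob_from_snr_db(snr_db):
    """Standard SNR-to-ENOB conversion (an assumption of this sketch)."""
    return (snr_db - 1.76) / 6.02

def equal_split(total_rms_s):
    """Split a total RMS budget equally (in power) between jitter and quantization noise."""
    sigma_jitter = total_rms_s / math.sqrt(2.0)
    t_lsb = sigma_jitter * math.sqrt(12.0)  # quantization RMS is t_LSB / sqrt(12)
    return sigma_jitter, t_lsb

sigma_j, t_lsb = equal_split(350e-15)
snr_db = tdc_snr_db(5e9, 250e-15, 0.87e-12)
enob = enob_from_snr_db(snr_db)
print(f"equal split of 350 fs: sigma_j = {sigma_j * 1e15:.1f} fs, t_LSB = {t_lsb * 1e12:.2f} ps")
print(f"SNR = {snr_db:.1f} dB, ENOB = {enob:.2f}")
```

The equal split lands close to the rounded 250fs and 0.87ps figures quoted in the text, and Eq. (2.2) evaluated at those figures gives an ENOB above the 7-bit target.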
2.2.1 Overview of sub-gate-delay time step TDC architectures
Unlike the voltage domain conversion, where the voltage signal can be stored on a capacitor and
held for manipulation, the signal invariably propagates in the TDC conversion chain in the time
domain. Therefore, most of the previous designs chose to generate all of the time quantization
levels simultaneously in a thermometer-coded manner. Existing thermometer-coded TDC architectures such as Vernier TDC [45], PPS TDC [7], and interpolation-based TDC [47] are capable of
achieving sub-gate-delay time steps. Besides the drawback of exponentially scaling hardware and
limited power/area efficiency, each architecture has its own unique limitation. A Vernier TDC involves a long conversion chain that can limit the throughput of the TDC. For PPS TDCs, the pulse
shrinking process is non-linear and can cause distortion to the TDC output. For interpolation-based TDCs, the phase interpolator requires relatively slow input rising/falling edges to achieve
accurate interpolation, which can significantly increase the TDC jitter. As an example, we consider a case where a 5-bit interpolation-based TDC in 14nm CMOS technology provides a quantization range of 16ps. To achieve accurate interpolation (interpolation error < $0.5\,t_{LSB}$) between
two rising edges with a 16ps time difference, simulations show that the input rising edge slope of
a phase interpolator needs to be five times slower than the sharpest possible slope for a fanout of
one, and the jitter power of a phase interpolator is 22 times higher than that of an inverter with
the sharpest input slope.
Figure 2.2: Existing fine resolution TDC architecture comparison. (a) Vernier TDC; (b) SAR TDC
Taking the conventional Vernier TDC as an example, the conversion chain is illustrated in Fig.
2.2(a). The input of the TDC is the rising edge time difference of two pseudo-differential input
pulses. The time step $t_{LSB}$ is realized as the delay difference of the unit delay cell (two inverters)
on the positive and negative side, where $t_{inv}$ is the delay of an inverter without extra loading. For
an N-bit conversion, the whole conversion chain needs $2 \times (2^N - 1)$ delay cells to generate $2^N$
quantization levels, and the input rising edge time difference is quantized into thermometer codes
$T\langle 0 \rangle$ to $T\langle 2^N - 1 \rangle$. With $t_{LSB} \ll t_{inv}$, the rising/falling edge at each inverter input/output can
be kept sharp to minimize the jitter of each unit delay cell. However, since the number of delay
stages scales exponentially with the design resolution, the total jitter can still be significant.
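A short sketch makes the exponential scaling concrete. The cell count follows the expression in the text; the jitter accumulation is an assumption of this sketch (independent, identical per-cell jitter adding in power, with half of the cells on each side of the chain), not the analysis carried out later in this chapter.

```python
import math

def vernier_delay_cells(n_bits):
    """Delay cells in an N-bit Vernier chain: 2 * (2^N - 1), as stated in the text."""
    return 2 * (2 ** n_bits - 1)

def accumulated_rms_jitter(n_stages, sigma_stage):
    """RMS jitter after n_stages, ASSUMING independent, identical per-stage jitter."""
    return math.sqrt(n_stages) * sigma_stage

for n_bits in (4, 6, 8):
    cells = vernier_delay_cells(n_bits)
    # Assumption: the positive and negative sides each carry half of the cells.
    per_side = cells // 2
    jitter = accumulated_rms_jitter(per_side, 1.0)  # normalized to one cell's jitter
    print(f"{n_bits}-bit: {cells} cells, ~{jitter:.1f}x single-cell RMS jitter per side")
```

Under these assumptions, each extra bit of resolution roughly doubles the cell count and raises the accumulated RMS jitter by about a factor of the square root of two.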
To reduce the number of delay stages in the conversion chain, there was an effort to incorporate the successive approximation search scheme into the TDC [5], as illustrated in Fig. 2.2(b).
Similar to the voltage comparison and binary reference voltage subtraction in a conventional SAR
ADC, a SAR TDC performs rising edge comparison and binary reference time subtraction in each
bit conversion. Since the time-domain signal cannot be held, an extra delay τcomp is inserted in the
signal paths to wait for the time comparator to resolve. The following reference time subtraction
is realized by the selection of different delays on the positive and negative sides according to the
comparison results, and the binary reference time steps are realized by the proper sizing or loading of the unit delay cells. Compared with thermometer-coded TDCs, the binary search iterations
can reduce the number of conversion stages, and require fewer delay cells and comparators. Under process, supply voltage and temperature (PVT) variations, active-delay-cell-based TDCs have
similar behaviors, regardless of the TDC architecture. The time steps of one TDC vary across different PVT corners with similar ratios, so the TDC full scale varies with PVT without significant
changes in TDC radix. As for the TDC jitter, although fewer delay cells are needed for SAR TDCs,
the slow rising/falling edge slopes for large reference time steps also significantly contribute to
the total jitter.
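As a hedged illustration of the binary search described above, the following sketch models an ideal SAR TDC with N − 1 comparison-and-subtraction stages followed by a last comparator. It ignores τcomp, jitter, and mismatch, and the names `sar_tdc` and `bits_to_int` are hypothetical.

```python
def sar_tdc(dt, t_lsb, n_bits):
    """Ideal N-bit SAR TDC model.

    dt is the signed rising-edge time difference (positive when the
    positive-side edge leads). Each stage compares the two edges and then
    shifts the residue by a binary-scaled reference time.
    """
    bits = []
    residue = dt
    for n in range(n_bits - 1, 0, -1):       # N-1 stages with subtraction
        bit = 1 if residue > 0 else 0
        bits.append(bit)
        t_ref = (2 ** (n - 1)) * t_lsb       # binary-scaled reference time
        residue += -t_ref if bit else t_ref  # subtract toward zero
    bits.append(1 if residue > 0 else 0)     # last comparator only
    return bits


def bits_to_int(bits):
    """Interpret the MSB-first bit list as an unsigned integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


if __name__ == "__main__":
    # Hypothetical example: 5-bit conversion with a 0.5 ps LSB.
    print(sar_tdc(1.6e-12, 0.5e-12, 5))
```

Only N comparators are used here, compared with the exponentially growing chain of the thermometer-coded approach.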
2.2.2 TDC jitter contribution analysis
Since different TDC architectures each have their own noise contribution disadvantages, it is
important to analyze the noise contribution mathematically in the high-speed time-domain ADC
scenario to determine the optimal design architecture. Considering that each inverter has an input-referred
trip voltage root mean square (RMS) thermal noise Vn, the RMS jitter of the inverter delay
can be estimated as
$$\sigma_{j,\mathrm{inv}} = \frac{V_n}{S}, \tag{2.4}$$
where S is the input rising/falling edge slope of the inverter. As a simplified estimation, we
assume that the inverter is balanced with trip voltage Vtrip = VDD/2 and a constant pull-up/pull-down current. The propagation delay of an inverter in the conversion chain can then be estimated
as [1]
$$t_{\mathrm{inv}} = \frac{V_{DD} C_{\mathrm{load}}}{2 I_{\mathrm{pull}}} = \frac{V_{DD}}{2S}, \tag{2.5}$$
and equation (2.4) can be reformulated into
$$\sigma_{j,\mathrm{inv}} = \frac{2 V_n t_{\mathrm{inv}}}{V_{DD}}. \tag{2.6}$$
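A short numeric check can confirm that equations (2.4) and (2.6) describe the same quantity once equation (2.5) links the delay to the edge slope. The noise voltage and edge slope below are hypothetical values chosen only for illustration.

```python
def inverter_delay(vdd, slope):
    """Propagation delay of a balanced inverter, equation (2.5): VDD / (2 S)."""
    return vdd / (2.0 * slope)


def jitter_from_slope(v_n, slope):
    """RMS jitter from equation (2.4): V_n / S."""
    return v_n / slope


def jitter_from_delay(v_n, t_inv, vdd):
    """RMS jitter from equation (2.6): 2 V_n t_inv / VDD."""
    return 2.0 * v_n * t_inv / vdd


if __name__ == "__main__":
    vdd = 0.8                 # supply voltage in volts
    v_n = 1e-3                # hypothetical 1 mV RMS input-referred noise
    slope = vdd / 5e-12       # hypothetical edge: full swing in 5 ps
    t_inv = inverter_delay(vdd, slope)
    # Both expressions should print the same jitter value in seconds.
    print(jitter_from_slope(v_n, slope), jitter_from_delay(v_n, t_inv, vdd))
```

Because the jitter is inversely proportional to the slope, slowing an edge by some factor raises the jitter power by the square of that factor in this first-order model.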
Assuming all of the inverters are equally sized with the same Vn (which means the delay cell in
the Vernier TDC is identical to the delay cell in the LSB stage of the SAR TDC), the accumulated
jitter in an N-bit Vernier TDC, as shown in Fig. 2.2(a), can be estimated as
$$\sigma_{j,\mathrm{Ver}}^2 = (2^N - 1)\left[3t_{\mathrm{inv}}^2 + (t_{\mathrm{inv}} + t_{\mathrm{LSB}})^2\right]\frac{4V_n^2}{V_{DD}^2}. \tag{2.7}$$
In the SAR TDC, if we define k as the number of extra cascaded inverters in the signal paths
to compensate the comparator delay (τcomp), then the total jitter contribution (ignoring the MUX
jitter) for an N-bit SAR TDC, as shown in Fig. 2.2(b), is
$$\sigma_{j,\mathrm{SAR}}^2 = \sum_{n=1}^{N-1}\left[(2k+3)t_{\mathrm{inv}}^2 + (t_{\mathrm{inv}} + 2^{n-1}t_{\mathrm{LSB}})^2\right]\frac{4V_n^2}{V_{DD}^2}. \tag{2.8}$$
To intuitively compare the jitter contribution of different TDC structures, we can take tLSB =
tinv/8 and plot the accumulated jitter (normalized to single-inverter jitter noise power) of the
Vernier TDC and SAR TDC (with different k value) over the TDC resolution, as is presented in
Fig. 2.3. For a low-resolution (N < 4) scenario, a Vernier TDC is less noisy, since there is no
need to insert extra comparator delays in the signal path. For a medium-resolution (4 ≤ N ≤ 6)
TDC, SAR TDC is a better choice, as its number of delay stages does not scale exponentially.
However, it is not reasonable to use a sub-gate-delay time step SAR TDC to cover an arbitrarily
large input range for N > 6. The large reference time steps are generated by exceptionally slow
rising/falling edges, which can severely increase the jitter contribution.
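The comparison can be reproduced numerically by evaluating equations (2.7) and (2.8) normalized to the single-inverter jitter power. This is a sketch under the same simplifying assumptions as the equations; the k values in the loop are hypothetical, and where the two curves cross depends on the k that is assumed.

```python
def vernier_jitter_power(n_bits, r):
    """Equation (2.7) divided by the single-inverter jitter power.

    r is the ratio t_LSB / t_inv.
    """
    return (2 ** n_bits - 1) * (3 + (1 + r) ** 2)


def sar_jitter_power(n_bits, r, k):
    """Equation (2.8) with the same normalization.

    k is the number of extra cascaded inverters that cover the
    comparator delay.
    """
    return sum((2 * k + 3) + (1 + 2 ** (n - 1) * r) ** 2
               for n in range(1, n_bits))


if __name__ == "__main__":
    r = 1 / 8                          # t_LSB = t_inv / 8, as in the text
    for n_bits in range(2, 9):
        ver = vernier_jitter_power(n_bits, r)
        sar = [sar_jitter_power(n_bits, r, k) for k in (2, 4, 6)]  # hypothetical k
        print(n_bits, f"{ver:.1f}", [f"{v:.1f}" for v in sar])
```

The Vernier term grows with 2^N, while the SAR term grows linearly in N until the squared reference-step term starts to dominate at high resolution.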
Figure 2.3: Accumulated jitter comparison between different TDC architectures (normalized to
single-inverter jitter)
Figure 2.4: Two-step TDC conceptual block diagram
2.2.3 Proposed Two-step Flash-SAR TDC
To further minimize the jitter of the TDC but still achieve 8-bit resolution, it is essential to limit
the time range of the sub-gate-delay time step TDC by dividing the 8-bit time quantization into
two steps [7, 47]. In the two-step TDC, the large input time range is covered by a first-step TDC
with coarse time steps. The residue time is then generated and further quantized by the fine SAR
TDC, as shown in Fig. 2.4. With the two-step architecture, here we define the coarse TDC as the
TDC whose time steps are generated by the absolute delay of cascaded (≥ 2) inverters rather than
the relative delay difference of two delay cells, and the fine TDC as the sub-gate-delay time step
TDC. In this way, we can maintain the sharpness of the rising/falling edges and reduce the overall
jitter. For instance, to realize a 2tinv time step with the relative delay difference, the generated
jitter power is $12\sigma_{j,\mathrm{inv}}^2$, according to equation (2.8) with $2^{n-1}t_{\mathrm{LSB}} = 2t_{\mathrm{inv}}$ and excluding k, which
is much higher than the $2\sigma_{j,\mathrm{inv}}^2$ jitter power when the time step is generated by the absolute delay
of two cascaded inverters.
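The factor of 12 follows from a direct substitution into one summand of equation (2.8), as the worked step below shows. It assumes k = 0 and uses the single-inverter jitter power from equation (2.6), $\sigma_{j,\mathrm{inv}}^2 = 4V_n^2 t_{\mathrm{inv}}^2 / V_{DD}^2$.

$$\left[3t_{\mathrm{inv}}^2 + (t_{\mathrm{inv}} + 2t_{\mathrm{inv}})^2\right]\frac{4V_n^2}{V_{DD}^2} = (3 + 9)\,t_{\mathrm{inv}}^2\,\frac{4V_n^2}{V_{DD}^2} = 12\sigma_{j,\mathrm{inv}}^2$$

By contrast, two cascaded inverters with sharp edges each contribute one unit of jitter power, giving $2\sigma_{j,\mathrm{inv}}^2$.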
Following the above guideline, a proper TDC topology is needed for the coarse TDC to minimize the jitter. If the coarse TDC adopts Flash topology with Nc bits to cover an input full time
range TFR, the number of inverters needed for a single time step is
$$N_{\mathrm{inv,step}} = \frac{T_{FR}}{2^{N_c}\,t_{\mathrm{inv}}}, \tag{2.9}$$
and the accumulated jitter, which is proportional to the total number of inverters in the conversion chain, can be estimated as
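$$\sigma_{j,c,\mathrm{Flash}}^2 = (2^{N_c} - 1)\,N_{\mathrm{inv,step}}\,\sigma_{j,\mathrm{inv}}^2 \simeq \frac{T_{FR}}{t_{\mathrm{inv}}}\,\sigma_{j,\mathrm{inv}}^2. \tag{2.10}$$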
If the coarse TDC adopts the SAR topology, the number of inverters needed for each binary scaled
time step also increases exponentially, and the accumulated jitter is
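$$\begin{aligned}\sigma_{j,c,\mathrm{SAR}}^2 &= \sum_{n=0}^{N_c-1}\left[2^n N_{\mathrm{inv,step}} + 2k\right]\sigma_{j,\mathrm{inv}}^2 \\ &= \left[(2^{N_c} - 1)\frac{T_{FR}}{2^{N_c}\,t_{\mathrm{inv}}} + 2kN_c\right]\sigma_{j,\mathrm{inv}}^2 > \sigma_{j,c,\mathrm{Flash}}^2.\end{aligned} \tag{2.11}$$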
The comparison between equation (2.10) and (2.11) shows that the SAR topology is less efficient when applied to the coarse TDC. Intuitively speaking, in the fine TDC, the number of
inverters needed for each binary scaled time step is constant (= 4 in this work); while in the
coarse TDC, the number of inverters needed for each binary time step increases exponentially.
Therefore, compared to the thermometer-coded conversion scheme, the binary search scheme
can help to reduce the number of inverters needed in the fine TDC, but not in the coarse TDC.
Accordingly, the Flash topology is a better choice for the coarse TDC, since the same number of
inverters is needed to cover the time range for both topologies, but the SAR topology requires an
extra delay in the signal path to hold the signal and wait for the comparator to resolve. So, the
whole TDC should be constructed into two steps as a coarse Flash TDC followed by a fine SAR
TDC.
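The comparison between equations (2.10) and (2.11) can be evaluated with a few lines of code. This sketch expresses jitter power in units of the single-inverter jitter power; the function names and the example value of k are hypothetical.

```python
def flash_coarse_jitter(t_fr_over_tinv, n_c):
    """Equation (2.10): accumulated jitter power of a coarse Flash TDC."""
    n_inv_step = t_fr_over_tinv / 2 ** n_c          # equation (2.9)
    return (2 ** n_c - 1) * n_inv_step


def sar_coarse_jitter(t_fr_over_tinv, n_c, k):
    """Equation (2.11): accumulated jitter power of a coarse SAR TDC.

    k is the number of extra inverters per path that cover the
    comparator delay.
    """
    n_inv_step = t_fr_over_tinv / 2 ** n_c
    return sum(2 ** n * n_inv_step + 2 * k for n in range(n_c))


if __name__ == "__main__":
    # Full time range of 16 inverter delays split into 3 coarse bits.
    print(flash_coarse_jitter(16, 3))
    print(sar_coarse_jitter(16, 3, k=2))            # hypothetical k
```

Both topologies need the same inverters to span the range; the SAR version only adds the 2kNc term, which is why it cannot win for the coarse step.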
After determining the coarse TDC topology as Flash, the number of bits assigned to the first-step TDC also needs to be deliberately selected to minimize the overall jitter. Equation (2.10)
indicates that the total accumulated jitter of the coarse Flash TDC mainly depends on the maximum input time range and the single-inverter delay, regardless of the number of TDC time steps.
Intuitively speaking, if TF R is divided into more time steps, fewer inverters are needed for each
time step. Therefore, according to Fig. 2.3, the overall jitter can be minimized by reducing the
number of bits in the second step and pushing more bits to the first-step Flash TDC until the
coarse time step shrinks to the minimum gate (two-inverter) delay.
As a realistic example, in this work, with the tLSB limited by equation (2.2) and (2.3), the
second-step TDC needs to reach a 5-bit resolution to make the coarse time step reach the minimum gate delay. Thus, we have implemented the 8-bit TDC as a first-step 3-bit Flash TDC
followed by a second-step 5-bit SAR TDC. Assuming all the inverters in the two-step TDC are
sized equally, with the full time range equal to 16 times the single-inverter delay, the first-step TDC only accumulates $16\sigma_{j,\mathrm{inv}}^2$ jitter power. Accordingly, the total jitter noise power of this
two-step TDC is significantly lower than a one-step 8-bit SAR TDC, as is illustrated in Fig. 2.3.
2.3 Delay-tracking technique for SAR TDC throughput enhancement
Although the two-step (Flash TDC followed by SAR TDC) time-domain ADC reduces the accumulated jitter and power consumption, the conversion speed bottleneck lies in the successive
approximation iterations of the SAR TDC. While the first-step coarse Flash TDC can generate all
the quantization levels and perform the comparison simultaneously, the second-step SAR TDC
requires the input signal to propagate through a much longer delay chain to complete the conversion. In the SAR TDC, if we combine each comparison and the following reference time subtraction shown in Fig. 2.2(b) as a one-bit stage, the pseudo-differential pulses of each sample need to
propagate through four stages besides the last comparator to finish a 5-bit conversion, as shown
in Fig. 2.5(a). If the comparator in each stage is controlled by the same clock, the comparator must
be enabled before the input sample arrives and can only be reset after the sample propagates to
the last comparator and finishes the last bit conversion, as shown in Fig. 2.5(b). Therefore, the
minimum clock cycle of the SAR TDC is limited to
$$T_{CLK} > 4\tau_{\mathrm{stage}} + t_{\mathrm{conv}}, \tag{2.12}$$
where τstage is the signal propagation delay of each stage, and tconv is the time required for the
time comparison and output bit storage. Meanwhile, after the sample propagates through the first
stage and the first bit conversion is finished, the first stage remains idle for the rest of the clock
cycle. It allows us to pipeline the SAR TDC by sending a second sample into the conversion chain
before the first one propagates to the last stage, thereby improving the throughput. Nevertheless,
directly applying a higher clock/sample rate to the SAR TDC to achieve pipelining can cause
timing errors in the TDC operation. For instance, as demonstrated in Fig. 2.5(c), the second
comparator is reset before the second stage conversion is finished.
2.3.1 Conventional synchronous pipeline TDC
To eliminate the timing error, the existing pipeline TDC is realized in a synchronous fashion [15],
similar to the voltage-domain pipeline ADC. Rather than performing the second-bit conversion in
the same clock cycle with insufficient time, the synchronous pipeline TDC pushes the second-bit
conversion to the next clock cycle and restores the conversion time. Unfortunately, the time-domain signal cannot be held, so an extra delay needs to be inserted in each stage to match the
signal delay with the clock cycle, as shown in Fig. 2.5(d). The extra delay cells can incur extra
jitter and power/area consumption. Furthermore, under PVT variations, the propagation delay
of each stage can vary significantly, leading to timing violation of TDC logic.
Figure 2.5: (a) SAR TDC conversion chain; (b) timing diagram without pipelining; (c) timing error
caused by directly applying higher clock rate; (d) timing diagram of synchronous pipelining
Figure 2.6: The proposed delay tracking pipelining structure and the corresponding timing diagram
2.3.2 Proposed delay-tracking pipelining
To achieve pipelining without the aforementioned drawbacks, we propose the delay-tracking
pipelining to increase the throughput of the SAR TDC, as shown in Fig. 2.6. Rather than applying a global clock signal to the comparators in all of the stages, the clock of each stage is delayed
by τstage from the previous stage’s clock. The key objective of the delay-tracking pipelining is
to match the clock delay with the propagation delay of the corresponding stage. In one embodiment, this can be done via replica circuits. This ensures that the time duration allowed for each
bit conversion is the same, thus allowing pipelining without timing errors. Accordingly, each
comparator only needs to be enabled for its corresponding 1-bit conversion (instead of the entire
5-bit conversion) before a reset. In comparison to equation (2.12), the clock cycle limitation is
relaxed to
$$T_{CLK} > t_{\mathrm{conv}}. \tag{2.13}$$
In practical implementations, an extra timing margin needs to be added to the clock cycle before
and after the actual conversion to avoid timing violation caused by delay mismatch between the
clock delay path and the signal path.
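To make the relaxation from equation (2.12) to equation (2.13) concrete, the following sketch compares the two clock-period bounds. The stage delay, conversion time, and margin are hypothetical numbers, not values from the prototype.

```python
def min_period_shared_clock(tau_stage, t_conv, n_stages=4):
    """Equation (2.12): all comparators share one clock."""
    return n_stages * tau_stage + t_conv


def min_period_delay_tracking(t_conv, margin=0.0):
    """Equation (2.13), plus an optional timing margin (an assumption)."""
    return t_conv + margin


if __name__ == "__main__":
    tau_stage = 40e-12        # hypothetical per-stage propagation delay
    t_conv = 100e-12          # hypothetical comparison and storage time
    shared = min_period_shared_clock(tau_stage, t_conv)
    tracked = min_period_delay_tracking(t_conv, margin=20e-12)
    # Maximum sample rates in GS/s implied by each bound.
    print(1e-9 / shared, 1e-9 / tracked)
```

The stage delay drops out of the delay-tracking bound entirely, so the achievable clock rate no longer depends on the number of SAR stages.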
This pipelining approach has two main advantages compared with the conventional synchronous
pipeline structure. First of all, the inserted delay cells for pipelining are in the clock path; thus,
its associated jitter does not contribute to the noise floor of the SAR TDC. Therefore, the inserted
delay cells can use minimum transistor sizes and minimize the power and area consumption
overhead of pipelining. Secondly, the TDC operation is robust against timing violations caused
by PVT variation. With the clock delay path laid out next to the signal path, the delay variations of the two paths are approximately the same under different PVT corners. Regardless of
the absolute delay value, as long as the clock delay tracks the signal propagation delay of the
corresponding stage, the time duration allowed for each bit conversion is constant. Therefore,
the delay tracking pipelining scheme can properly operate under PVT variations.
Figure 2.7: Top-level diagram of the 10GS/s 8-bit time-domain ADC
2.4 ADC prototype and implementation
2.4.1 Top-level ADC structure
The proposed delay-tracking pipelined-SAR TDC is incorporated as part of the two-step time-domain ADC and prototyped in 14nm CMOS technology achieving 8-bit design resolution and
10GS/s sample rate. As shown in Fig. 2.7, the entire ADC consists of two time-interleaved sub-ADC channels, with each channel running at 5GS/s. For each sub-ADC, the VTC first converts
the sampled differential input voltage into the rising edge time difference between two pseudo-differential pulses P and N. Subsequently, the generated pulses go through a two-step TDC. The
coarse quantizer (i.e., the first step) uses a Flash TDC to cover the full time range of ±64ps and
quantize it into 3 bits. After the Flash TDC, the residue time pulses (RP and RN ) are created based
on the coarse decision results, with the rising edge time difference within ±8ps. They are then
further quantized by the following 5-bit delay-tracking pipelined-SAR TDC, serving as the fine
quantizer of the two-step TDC. The thermometer-coded quantization result from the first-step
TDC is encoded into binary and aligned with the second-step TDC output. For testing purposes,
the ADC output is decimated by 729 times.
Figure 2.8: 1-bit stage in the SAR TDC using the proposed selective delay tuning cell for reference
time subtraction
The following sub-sections discuss the implementation and design consideration for critical
building blocks, including SAR TDC reference time generation, VTC and Flash TDC.
2.4.2 SAR TDC reference time generation and subtraction
To perform the successive approximation algorithm in the time domain, it is necessary to subtract
the input rising edge time difference with a certain reference time after each bit comparison. This
can be realized by varying the delay of the pseudo-differential pulses based on the comparator
output. The design consideration for this reference time subtraction is the low implementation
cost and robustness against variability. As shown in Fig. 2.2(b), the conventional approach is to
multiplex different fixed delays on the positive and negative side according to the comparison
result (which will be referred to as Vernier-based reference time generation hereafter). However,
this approach requires four delay cells and two multiplexers for each bit conversion, which can
cause extra jitter and power consumption.
To mitigate the drawbacks of conventional reference time subtraction, we propose a selective
delay tuning (SDT) cell that directly tunes the signal delay on the positive and negative side according
to the comparison result, as shown in Fig. 2.8. For instance, when the pulse on the positive side
has an earlier rising edge and the comparator’s decision result is ONE, extra pull-down current
at the MID node will be enabled for the negative side, resulting in a faster falling edge and a
shorter delay on the negative side. The delay difference between the positive and negative side
serves as the reference time and is subtracted from the input rising edge time difference once the
pseudo-differential pulses propagate through the SDT cell. To create the binary scaled reference
time between SDT cells used from the MSB to LSB stages, we progressively reduce the tunable
pull-down current at the MID node via proper transistor sizing. Compared to the Vernier-based
reference time generation, this approach only requires two delay cells for each SAR stage and
eliminates the multiplexers, which reduces the power and area cost of the SAR TDC. Furthermore,
this SDT approach yields a much smaller delay variation when device mismatch is considered.
The detailed analysis is as follows.
According to the SDT cell structure, the reference time generated by the SDT cell is the delay
difference caused by different falling edge slopes at the MID node, which can be expressed as
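$$\begin{aligned}T_{\mathrm{ref,SDT}} &= V_{t2}C_{\mathrm{mid}}\left(\frac{1}{I_{\mathrm{mid}}} - \frac{1}{I_{\mathrm{mid}} + I_{\mathrm{tune}}}\right) \\ &\simeq V_{t2}C_{\mathrm{mid}}\frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}^2}, \quad (I_{\mathrm{tune}} \ll I_{\mathrm{mid}}),\end{aligned} \tag{2.14}$$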
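The quality of the small-signal approximation in equation (2.14) can be checked numerically. In this sketch the trip voltage, node capacitance, and pull-down current are hypothetical values; only the current ratios are taken from the design description.

```python
def t_ref_exact(vt2, c_mid, i_mid, i_tune):
    """First line of equation (2.14)."""
    return vt2 * c_mid * (1.0 / i_mid - 1.0 / (i_mid + i_tune))


def t_ref_approx(vt2, c_mid, i_mid, i_tune):
    """Second line of equation (2.14), valid for i_tune << i_mid."""
    return vt2 * c_mid * i_tune / i_mid ** 2


if __name__ == "__main__":
    vt2, c_mid, i_mid = 0.4, 1e-15, 100e-6      # hypothetical V, F, A
    for ratio in (1 / 8, 1 / 4, 1 / 2, 1.0):
        exact = t_ref_exact(vt2, c_mid, i_mid, ratio * i_mid)
        approx = t_ref_approx(vt2, c_mid, i_mid, ratio * i_mid)
        # The exact value is smaller by the factor 1 / (1 + ratio).
        print(ratio, exact, approx, exact / approx)
```

The ratio between the two expressions is 1/(1 + Itune/Imid), so the approximation is tightest for the LSB stage and loosest when the tuning current approaches the main pull-down current.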
where Vt2 represents the trip voltage of the second inverter, Cmid is the capacitance at the MID
node, Imid is the MID node pull-down current from the first inverter, and Itune is the tunable
pull-down current according to the selection signal. For derivation purposes, if we treat the
variation of each term as infinitesimal and ignore higher-order terms, we can further derive
the reference time variation as
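$$\begin{aligned}\delta T_{\mathrm{ref,SDT}} &= \delta\!\left(\frac{V_{t2}C_{\mathrm{mid}}}{I_{\mathrm{mid}}}\right)\frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}} + \frac{V_{t2}C_{\mathrm{mid}}}{I_{\mathrm{mid}}}\,\delta\!\left(\frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}}\right) \\ &= \frac{\delta(V_{t2}C_{\mathrm{mid}})\,I_{\mathrm{tune}}}{I_{\mathrm{mid}}^2} - \frac{2\,\delta(I_{\mathrm{mid}})\,V_{t2}C_{\mathrm{mid}}I_{\mathrm{tune}}}{I_{\mathrm{mid}}^3} + \frac{\delta(I_{\mathrm{tune}})\,V_{t2}C_{\mathrm{mid}}}{I_{\mathrm{mid}}^2}.\end{aligned} \tag{2.15}$$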
Assuming δ(Vt2Cmid), δ(Imid), and δ(Itune) are independent and follow normal distribution,
statistically we can get
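$$\delta^2(T_{\mathrm{ref,SDT}}) = \frac{\delta^2(V_{t2}C_{\mathrm{mid}})\,I_{\mathrm{tune}}^2}{I_{\mathrm{mid}}^4} + \frac{\delta^2(I_{\mathrm{tune}})\,(V_{t2}C_{\mathrm{mid}})^2}{I_{\mathrm{mid}}^4} + \frac{4\,\delta^2(I_{\mathrm{mid}})\,(V_{t2}C_{\mathrm{mid}}I_{\mathrm{tune}})^2}{I_{\mathrm{mid}}^6}. \tag{2.16}$$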
Considering the fact that the pull-down currents Imid and Itune are sums of unit currents
from unit NMOS transistors, statistically we can estimate that
$$\delta^2(I) \propto I, \quad \text{so} \quad \delta^2(I_{\mathrm{tune}}) = \frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}}\,\delta^2(I_{\mathrm{mid}}). \tag{2.17}$$
Therefore,
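$$\begin{aligned}\delta^2(T_{\mathrm{ref,SDT}}) &= \frac{\delta^2(V_{t2}C_{\mathrm{mid}})\,I_{\mathrm{tune}}^2}{I_{\mathrm{mid}}^4} + \frac{\delta^2(I_{\mathrm{mid}})\,(V_{t2}C_{\mathrm{mid}}I_{\mathrm{tune}})^2}{I_{\mathrm{mid}}^6}\left(4 + \frac{I_{\mathrm{mid}}}{I_{\mathrm{tune}}}\right) \\ &< \left[\frac{\delta^2(V_{t2}C_{\mathrm{mid}})}{I_{\mathrm{mid}}^2} + \frac{\delta^2(I_{\mathrm{mid}})\,(V_{t2}C_{\mathrm{mid}})^2}{I_{\mathrm{mid}}^4}\right]\left(4 + \frac{I_{\mathrm{mid}}}{I_{\mathrm{tune}}}\right)\frac{I_{\mathrm{tune}}^2}{I_{\mathrm{mid}}^2} \\ &= \left(4 + \frac{I_{\mathrm{mid}}}{I_{\mathrm{tune}}}\right)\left(\frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}}\right)^2\delta^2\!\left(\frac{V_{t2}C_{\mathrm{mid}}}{I_{\mathrm{mid}}}\right).\end{aligned} \tag{2.18}$$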
Taking the standard deviation from both sides, we can get
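$$\delta_{\mathrm{std}}(T_{\mathrm{ref,SDT}}) < \sqrt{4 + \frac{I_{\mathrm{mid}}}{I_{\mathrm{tune}}}}\left(\frac{I_{\mathrm{tune}}}{I_{\mathrm{mid}}}\right)\delta_{\mathrm{std}}\!\left(\frac{V_{t2}C_{\mathrm{mid}}}{I_{\mathrm{mid}}}\right). \tag{2.19}$$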
For each of the four binary-scaled reference time steps used in this design with Itune/Imid ≃
1/8, 1/4, 1/2, or 1, we can calculate the standard deviation upper bound as 0.43, 0.71, 1.22, or 2.3
times the delay variation of a single inverter. As for a conventional Vernier-based reference
time generation cell where the reference time is created by the delay difference of two fixed delay
cells (each has two inverters), we can estimate its reference time variation as
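$$\delta^2(T_{\mathrm{ref,Vernier}}) = 4\,\delta^2\!\left(\frac{V_t C}{I_{\mathrm{pull}}}\right), \quad \text{or} \quad \delta_{\mathrm{std}}(T_{\mathrm{ref,Vernier}}) = 2\,\delta_{\mathrm{std}}\!\left(\frac{V_t C}{I_{\mathrm{pull}}}\right). \tag{2.20}$$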
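The upper bounds quoted for the four reference time steps can be reproduced by evaluating the right-hand side of equation (2.19). This sketch reports each bound in units of the single-inverter delay variation; the function name is hypothetical. The first three ratios give roughly 0.43, 0.71, and 1.22, and the last evaluates to about 2.24.

```python
import math


def sdt_variation_bound(ratio):
    """Right-hand side of equation (2.19).

    ratio is I_tune / I_mid; the result is in units of the standard
    deviation of a single inverter delay.
    """
    return math.sqrt(4.0 + 1.0 / ratio) * ratio


# Equation (2.20): the Vernier-based cell varies by twice the
# single-inverter delay variation, independent of the step size.
VERNIER_FACTOR = 2.0


if __name__ == "__main__":
    for ratio in (1 / 8, 1 / 4, 1 / 2, 1.0):
        print(ratio, round(sdt_variation_bound(ratio), 3), VERNIER_FACTOR)
```

The bound shrinks with the tuning-current ratio, so the SDT cell is most advantageous for the small LSB-side reference steps, where the Vernier-based cell still carries the full variation of four inverters.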
For comparison purposes, we assume that the inverters in the SDT cell and the Vernier-based
reference time generation cell are equally sized with the same delay variation standard deviation. Taking the smallest (LSB) reference time step as an example, we can reach a conservative reference time variation comparison between the proposed SDT cell and the conventional
Vernier-based reference time generation cell as
$$\delta_{\mathrm{std}}(T_{\mathrm{ref,SDT,LSB}}) < \frac{1}{5}\,\delta_{\mathrm{std}}(T_{\mathrm{Vernier,LSB}}). \tag{2.21}$$
Figure 2.9: Monte-Carlo simulation results of the reference time variation
The theoretical prediction and Monte-Carlo simulation results of reference time variation are
shown in Fig. 2.9. For each binary scaled reference time step used in this design, the proposed
SDT can achieve a much smaller reference time variation compared to the conventional Vernier-based approach.
2.4.3 Common mode ramp generation for VTC
To linearly encode the voltage domain information into the rising edge time difference of two
pseudo-differential pulses, one approach involves ramping up the sampled differential voltages
and comparing them to a certain voltage threshold in the VTC. In conventional time-domain ADC
Figure 2.10: (a) Conventional ramp generation; (b) proposed common-mode ramp generation implementation
designs, the ramps are generated by directly charging the sampling capacitors with two separated
current sources [47][7], as shown in Fig. 2.10(a). However, this approach is vulnerable to the
current source mismatch. Additionally, the charge stored on the sampling capacitor is kicked back
to the input voltage source at the beginning of each sampling phase and can affect the following
sample. This memory effect can be especially severe at high sampling speeds. To overcome these
drawbacks, we propose the common mode ramp generation scheme, in which a common mode
capacitor (Ccm) is attached to the center node of the two sampling capacitors, as depicted in Fig.
2.10(b). Ccm will be charged by a single current source during the hold phase and push up the
sampled voltages as ramps without mismatch. Meanwhile, instead of discharging through the
input voltage source only, the charge on Ccm is discharged to the ground at the beginning of the
next sampling phase. By sizing up the Ccm discharging switch, we can significantly alleviate the
charge kickback problem and improve the linearity of the entire time-domain ADC.
2.4.4 Flash TDC and residue time generation
As an important component of the two-step TDC, the first-step Flash TDC along with the residue
time generation block are shown in Fig. 2.11. The input pulses from the VTC are first delayed
by three coarse delay cells (each with a nominal delay of 16ps). Those delayed pulses are cross-compared with the input pulses via time comparators, which effectively create eight quantization
levels (i.e., 3b Flash TDC). Afterward, the residue time is generated by selecting the closest two
rising edges between the input pulses and the coarsely delayed pulses. Notably, the residue time
range is re-adjusted from [0, 16ps] to [-8ps, 8ps] by subtracting an 8ps time step via an SDT cell
to properly interface with the following-stage pseudo-differential SAR TDC operation.
Figure 2.11: Schematic of the first-step Flash TDC and the residue time generation
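The coarse quantization and residue re-centering can be summarized by a purely arithmetic sketch. It abstracts away the cross-comparison and edge selection of the actual circuit and only reproduces the input-output mapping, using the ±64ps range and 16ps step stated above; the function name is hypothetical.

```python
import math


def flash_coarse(dt_ps, full_scale_ps=64.0, step_ps=16.0):
    """Behavioural 3-bit coarse quantizer with residue generation.

    Returns (code, residue_ps). The residue is re-centred by half a
    coarse step so that it lies within [-step/2, step/2).
    """
    n_levels = int(2 * full_scale_ps / step_ps)
    code = int(math.floor((dt_ps + full_scale_ps) / step_ps))
    code = min(max(code, 0), n_levels - 1)          # clamp out-of-range inputs
    residue = dt_ps + full_scale_ps - code * step_ps - step_ps / 2
    return code, residue


if __name__ == "__main__":
    for dt in (-64.0, -3.0, 21.5, 63.0):
        print(dt, flash_coarse(dt))
```

The half-step subtraction plays the role of the 8ps shift applied by the SDT cell, so the fine SAR TDC always sees a signed residue.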
2.4.5 Time comparator
The schematic and the corresponding timing diagram of the time comparator used in the two-step TDC are shown in Fig. 2.12. The latch-based time comparator is controlled by the CLK signal.
When the CLK signal is HIGH, the comparator is enabled, and the first input rising edge (PP or
PN) will trigger the latch. For instance, as shown in Fig. 2.12, if the input rising edge on the
positive side (PP) arrives first, the B̄ node is pulled to ground, and the comparator output is
latched to B = 1, B̄ = 0. After the comparison is finished, the CLK signal turns to LOW and
resets the comparator to the initial state for the next comparison. In this design, the duty cycle
of the 5GHz CLK signal is adjusted to 80% to create a sufficient comparison window.
Figure 2.12: Schematic and operation timing diagram of the time comparator
Comparator metastability is one key problem for ADC designs. For time-domain ADCs, the
metastability state can occur when the two input rising edges are too close to each other. Although time-domain ADCs are not immune to this issue, they show a significantly lower probability
of entering a metastability state compared to voltage-domain ADCs. As shown in Fig. 13, when
the input time is small, the equivalent input voltage can be regarded as:
$$\Delta V_{eq} = \Delta T_{in}\,S_{rise} \tag{2.22}$$
where Srise is the rising edge slope of the comparator input. ∆Tin is generated from the
input voltage by the VTC as:
$$\Delta T_{in} = \Delta V_{in}/S_{VTC} \tag{2.23}$$
where SVTC is the VTC ramp slope. Combining the two equations above, we get:
$$\Delta V_{eq} = \Delta V_{in}\,S_{rise}/S_{VTC} \tag{2.24}$$
Accordingly, the VTC can be regarded as a pre-amplifier of the time comparator, with the gain
determined by the slope ratio between the VTC ramp and the comparator input rising edge. Typically, the ratio is 40 to 80. From post-layout simulations, the time comparator input metastability
region is smaller than ±10fs.
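To give a feel for these numbers, the sketch below maps the ±10fs metastability region back to an input voltage using equation (2.23). It assumes that the 0.48V differential peak-to-peak input range maps linearly onto the ±64ps TDC range, which is an idealization, and the slope ratio of 60 is a hypothetical value inside the 40 to 80 range quoted above.

```python
def vtc_slope(v_range, t_range):
    """Average VTC ramp slope, assuming a linear voltage-to-time mapping."""
    return v_range / t_range


def input_referred_window(t_meta, s_vtc):
    """Equation (2.23) rearranged: dV_in = dT_in * S_VTC."""
    return t_meta * s_vtc


def comparator_equivalent_voltage(dv_in, s_rise, s_vtc):
    """Equation (2.24): dV_eq = dV_in * S_rise / S_VTC."""
    return dv_in * s_rise / s_vtc


if __name__ == "__main__":
    s_vtc = vtc_slope(0.48, 128e-12)            # assumed linear mapping
    window = input_referred_window(10e-15, s_vtc)
    lsb = 0.48 / 2 ** 8                         # 8-bit LSB in volts
    s_rise = 60 * s_vtc                         # hypothetical slope ratio
    print(window, lsb, comparator_equivalent_voltage(lsb, s_rise, s_vtc))
```

Under these assumptions the metastability window corresponds to tens of microvolts at the ADC input, far below one LSB.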
2.5 Measurement results
Figure 2.13: Chip micro-photograph and layout detail
The chip micro-photograph of the ADC prototype is shown in Fig. 2.13. The active area of
each ADC channel is 1425µm². The nominal power supply voltage of this ADC is 0.8V, with a
0.48V differential p-p input voltage range. The total power consumption of the ADC is 14.8mW,
with 32% consumed by the first-step Flash TDC, 41% consumed by the second-step SAR TDC,
19% consumed by the local clock buffer, and 8% consumed by the VTC.
Figure 2.14: Measured output spectrum of a single ADC channel at Nyquist (2.5GHz) input frequency
Figure 2.15: Measured SNDR and SFDR of a single ADC channel over the input frequency from
100MHz to 5GHz
Figure 2.16: Measured output spectrum of the 2X TI ADC at Nyquist (5GHz) input frequency
Figure 2.17: Measured SNDR and SFDR of the 2X TI ADC over the input frequency from 100MHz
to 5GHz
During the measurement of the ADC, the radix error between the first-step Flash TDC and
the second-step SAR TDC is calibrated in the digital domain, and the sampling clock timing skew
between the two ADC channels is manually calibrated on-chip. Fig. 2.14 shows the measured
4096-point output spectrum of a single ADC channel at Nyquist input frequency. The 5GS/s
ADC achieves a 40.78dB SNDR and a 53.65dB SFDR. The significant high-order harmonic tones
might be caused by the delay mismatch of the extra τcomp delay on the positive and negative path
in each SAR TDC stage. Fig. 2.15 shows the measured SNDR and SFDR of the single channel ADC
over different input frequencies, and the ADC achieves a >38dB SNDR and a >51dB SFDR up to
5GHz input. For the 2X TI ADC running at 10GS/s, the output spectrum and the input frequency
sweep test results are shown in Fig. 2.16 and Fig. 2.17, achieving a 37.24dB SNDR and a 50.69dB
SFDR at Nyquist input frequency. While measuring the time-interleaved ADC, we observed a
small gain/offset error between the two channels. To avoid saturation of the two ADC channels,
we slightly reduced the input signal swing, resulting in a small SNDR degradation compared with the
single-channel case.
Fig. 2.18 shows the measured ADC SNDR and SFDR over temperatures between 20°C and
80°C, and the ADC achieves a <3dB SFDR variation and a <2dB SNDR variation. Fig. 2.19 shows
the measured ADC SNDR and SFDR with a 10% supply voltage variation around the nominal
value, and the ADC achieves a <6dB SFDR variation and a <3dB SNDR variation. The significant
SFDR drop at the lower supply voltage is caused by the insufficient headroom of the VTC. For
the measurement performed under temperature and supply voltage variation, the ADC radix is
only calibrated once under 20°C and 0.8V supply voltage, and the input signal swing is adjusted
to keep the ADC output at full-scale under different conditions. Fig. 2.20 shows the measured
differential non-linearity (DNL) and integral non-linearity (INL) of the ADC. At a 7-bit precision
level, the DNL is within +0.82/-0.89, and the INL is within +0.69/-0.70. Although the Monte Carlo
simulation shown in Section 2.4.2 reveals that the variation of each reference time step is less
than 0.6TLSB, the delay mismatch of the extra comparator delay cell (τcomp) inserted in the
signal path is not calibrated. The accumulated delay mismatch of the whole conversion chain
might saturate the LSB comparator and limit the static linearity of the TDC. Nevertheless, this
static non-linearity is not a limiting factor for this prototype, since the SNDR is mainly limited
by the TDC jitter to a 6-bit level. Note that this static non-linearity issue can be solved by adding
redundancy to the TDC.
Figure 2.18: Measured SNDR and SFDR of the 2X TI ADC over ambient temperature variation
Table 2.1 summarizes the ADC performance and compares it with other state-of-the-art ADCs
that have similar resolutions and sampling speeds. This ADC achieves a 5GS/s single-channel
conversion speed, which is the highest among ADCs with a ≥8-bit resolution. At the Nyquist
input frequency, this ADC achieves a leading Walden figure of merit of 16.6 and 24.8fJ/conv-step
for the single-channel and two-channel cases, respectively. Thanks to the power/area efficient
time-domain SAR conversion, and the small number of TI channels, the total active area consumption of this ADC is only 2850µm², which is the smallest among ADCs with ≥10GS/s speed
[31].
Table 2.1: Performance Summary and Comparison with State-of-the-Art ADCs with Similar Speed and Resolution
| | This work (2 ch.) | This work (1 ch.) | ISSCC 20 [47] | CICC 19 [7] | JSSC 19 [34] | JSSC 18 [49] | VLSI 21 [30] | ISSCC 20 [48] | VLSI 19 [6] | ISSCC 17 [11] |
|---|---|---|---|---|---|---|---|---|---|---|
| Domain | Time | Time | Time | Time | Time | Time | Voltage | Voltage | Voltage | Voltage |
| Architecture | Pipelined-SAR TDC | Pipelined-SAR TDC | Interpolation TDC | Flash TDC | Interpolation TDC | Flash TDC | TI SAR | Dynamic Pipeline | TI-SAR | TI-SAR |
| Technology (nm) | 14 | 14 | 65 | 65 | 65 | 65 | 16 | 28 | 28 | 28 |
| Supply (V) | 0.8 | 0.8 | 1 | 1 | 1 | 1 | 0.85 | 0.9 | 1 | 0.9 |
| Resolution (bits) | 8 | 8 | 8 | 7.3 | 6 | 8 | 8 | 6 | 10 | 10 |
| # of TI channels | 2 | 1 | 4 | 2 | 1 | 1 | 8 | 1 | 16 | 16 |
| Sample rate (GS/s) | 10 | 5 | 10 | 10 | 2.5 | 2 | 8 | 3.3 | 5 | 8 |
| Power (mW) | 14.8 | 7.4 | 50.8 | 29.7 | 7.5 | 21 | 26 | 5.5 | 29 | 300 |
| SFDR@Nyq (dB) | 50.69 | 53.12 | 52.8 | 40.7 | 45.07 | 48.36 | 46 | 45.5 | 59.6 | 60.3 |
| SNDR@Nyq (dB) | 37.2 | 40.8 | 40.1 | 32.5 | 33.84 | 40.68 | 42.4 | 34.2 | 48.5 | 49.0 |
| FOM Walden @Nyq (fJ/conv-step) | 24.8 | 16.6 | 61.5 | 86.0 | 74.7 | 119 | 30.2 | 40.0 | 26.7 | 162.9 |
| FOM Schreier @Nyq (dB) | 152.5 | 156.1 | 150.0 | 144.8 | 146.1 | 147.5 | 154.3 | 149.0 | 157.9 | 150.2 |
| Active area (µm²) | 2850 | 1425 | 95000 | 15000 | 120000 | 80000 | 23000 | 16600 | 103000 | 184000 |
Figure 2.19: Measured SNDR and SFDR of the 2X TI ADC over supply voltage variation
Figure 2.20: Measured DNL and INL (7-bit precision level)
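The figures of merit reported for this work can be cross-checked from the measured power, sample rate, and SNDR. The sketch below uses the commonly used Walden and Schreier definitions, which are assumed here to be the ones behind Table 2.1; small differences in the last digit can arise from rounding of the SNDR.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, fs_hz, sndr_db):
    """Walden FoM in fJ per conversion step: P / (2^ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


def schreier_fom_db(power_w, fs_hz, sndr_db):
    """Schreier FoM in dB: SNDR + 10 log10((fs / 2) / P)."""
    return sndr_db + 10 * math.log10(fs_hz / 2 / power_w)


if __name__ == "__main__":
    # Single channel: 7.4 mW, 5 GS/s, 40.78 dB SNDR at Nyquist.
    print(walden_fom_fj(7.4e-3, 5e9, 40.78), schreier_fom_db(7.4e-3, 5e9, 40.78))
    # Two channels: 14.8 mW, 10 GS/s, 37.24 dB SNDR at Nyquist.
    print(walden_fom_fj(14.8e-3, 10e9, 37.24), schreier_fom_db(14.8e-3, 10e9, 37.24))
```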
2.6 Summary
This chapter describes an 8-bit time-domain ADC design achieving a 10GS/s conversion speed
with only two TI channels. A two-step Flash-SAR TDC is implemented to realize low-noise and
high-efficiency time quantization. The SAR TDC throughput is further improved by the delay
tracking pipelining with minimal power and noise overhead. On the circuit level, selective delay
tuning enables efficient and low-variation SAR TDC reference time generation, and the common
mode ramp generation scheme facilitates high-speed high-linearity voltage-to-time conversion.
The high-speed and power/area-efficient characteristics of this ADC can significantly reduce the
time-interleaving effort when pushing for an even higher sample rate, and the time-domain conversion based on dynamic delay cells ensures that the ADC can benefit from future technology
scaling.
Chapter 3
Direct-RF Sampling Time-domain ADC, SAR TDC Redundancy, and Background Delay Offset Calibration
With the work demonstrated in Chapter 2, we would like to further push the performance of
time-domain ADCs for cutting-edge communication applications. Direct-RF sampling has gained
increasing interest for advanced wireless and wireline communications in recent years due to its
ability to simplify the receiver front end. Instead of using a mixer to down-convert the RF signal
to the baseband, the direct RF sampling scheme directly quantizes the RF signal and performs
the equalization and filtering using digital signal processing (DSP). This approach necessitates
the ADC to achieve a high sampling speed (tens of GS/s) and a high effective number of bits
(ENOB) (>7b) to support large signal bandwidth and high order modulation scheme simultaneously. Most existing designs implemented voltage-domain ADCs using time-interleaved (TI) SAR
ADCs or pipeline ADCs [2, 23, 46]. However, the limited speed of single-channel voltage-domain
ADCs inevitably requires excessive time-interleaving, resulting in high power consumption and
complexity, and a large active area. Recent developments in time-domain ADCs have showcased high-speed, high-efficiency ADC channels that are particularly advantageous for advanced
technology nodes due to their digital-like delay-cell-based quantization scheme as discussed in
Chapter 2. However, existing time-domain ADCs are generally limited in ENOB (<6 bits) due to
TDC static non-linearity caused by the variability of delay stages and the limited SFDR caused
by the voltage-to-time conversion. To enhance ENOB while maintaining a high sampling speed,
we propose adding redundant stages to the end of a binary pipelined-SAR TDC to compensate
for the delay variabilities of SAR TDC stages. To ensure the accuracy of the redundant stages,
we further propose a background delay offset calibration scheme in the redundant stages. To
improve the dynamic linearity of the VTC, we further propose a bottom-plate sampling and top-plate ramp injection scheme. Our ADC prototype, fabricated in 4nm CMOS technology, achieves
a sampling rate of 16GS/s with an SNDR of 44.48dB at Nyquist. The ADC consumes 94.2mW
of power and occupies an active area of 8000 µm², leading to a Schreier FoM of 153.8dB. To the best
of the authors’ knowledge, these results demonstrate the highest power efficiency and the lowest
area consumption among published ADCs operating over 16GS/s.
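The headline figures above can be cross-checked with a short calculation. The sketch below assumes the usual Schreier definition, FoM = SNDR + 10·log10(BW/P) with BW taken as half the sampling rate, and the standard sine-wave relation ENOB = (SNDR − 1.76)/6.02; the function names are illustrative only.

```python
import math

def schreier_fom_db(sndr_db, fs_hz, power_w):
    # Schreier FoM with the signal bandwidth taken as the Nyquist bandwidth fs/2
    return sndr_db + 10.0 * math.log10((fs_hz / 2.0) / power_w)

def enob_bits(sndr_db):
    # Standard sine-wave relation between SNDR and effective number of bits
    return (sndr_db - 1.76) / 6.02

fom = schreier_fom_db(sndr_db=44.48, fs_hz=16e9, power_w=94.2e-3)
enob = enob_bits(44.48)
print(f"Schreier FoM = {fom:.1f} dB, ENOB = {enob:.2f} bits")
```

With the reported 44.48dB SNDR, 16GS/s, and 94.2mW, this reproduces the quoted 153.8dB FoM and an ENOB slightly above 7 bits.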
3.1 SAR TDC Delay Variabilities and compensation
As discussed in Chapter 2, the operation of a SAR TDC is similar to a voltage-domain SAR ADC
except that the signal is represented in time. Hence, each SAR TDC stage should add or subtract
a binary scaled reference time based on the bit trial result. In our implementation, the signal is
encoded as the rising edge time difference of two pseudo-differential pulses throughout the TDC
stages. In each SAR TDC stage, a time comparator recognizes which input rising edge arrives
earlier, and its output determines whether reference time (Tref ) subtraction is applied on the
positive or negative path. An extra delay (τcomp) is inserted into both signal paths to hold the time
Figure 3.1: SAR TDC delay variabilities
signal while the comparator is resolving. As shown in Fig. 3.1, due to process variation, the delay
of all the delay cells can vary, and the comparator can have an input-referred offset. For illustration
purposes, we model the delay variabilities as extra delays (δcm,N, δcm,P, δref,P, δref,N, δ0,P, and
δ0,N) following the corresponding delay cells, and the comparator offset as an extra delay δcomp at
one side of the comparator input. Without calibration, these variabilities can significantly limit the
TDC static linearity. For instance, as shown in Fig. 2.20, the DNL/INL of the previous work is
limited to a 7-bit level.
Although it is possible to add delay measurement and calibration to each of the delay cells, this
approach can cause significant design complexity, jitter, and power/area overhead. To simplify
the calibration scheme, we propose to lump all the variabilities into two equivalent error models.
Mathematically speaking, the comparator input-referred delay offset equals an extra delay in
the signal path before the comparator input plus an inverted delay with the same value after
the comparator input. For the SAR conversion, only the delay difference between the positive
and negative paths matters. Taking all the variability sources into account, the delay difference
between the two paths when the comparator output is ’1’ can be expressed as
$$\Delta t_1 = \delta_{comp}^{(n+1)} - \delta_{comp}^{n} + \delta_{cm,P}^{n} - \delta_{cm,N}^{n} + T_{ref}/2 + \delta_{ref,P}^{n} - \delta_{0,N}^{n}, \tag{3.1}$$
and the delay difference between the two paths when the comparator output is ’0’ can be expressed as
$$\Delta t_0 = \delta_{comp}^{(n+1)} - \delta_{comp}^{n} + \delta_{cm,P}^{n} - \delta_{cm,N}^{n} - T_{ref}/2 - \delta_{ref,N}^{n} + \delta_{0,P}^{n}. \tag{3.2}$$
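To simplify the calibration scheme, we can rewrite equations (3.1) and (3.2) into
$$\begin{aligned} \Delta t_1 ={}& \delta_{comp}^{(n+1)} - \delta_{comp}^{n} + \delta_{cm,P}^{n} - \delta_{cm,N}^{n} + \frac{\delta_{ref,P}^{n} + \delta_{0,P}^{n} - \delta_{ref,N}^{n} - \delta_{0,N}^{n}}{2} \\ &+ \left(T_{ref}/2 + \frac{\delta_{ref,P}^{n} - \delta_{0,P}^{n} + \delta_{ref,N}^{n} - \delta_{0,N}^{n}}{2}\right), \end{aligned} \tag{3.3}$$
$$\begin{aligned} \Delta t_0 ={}& \delta_{comp}^{(n+1)} - \delta_{comp}^{n} + \delta_{cm,P}^{n} - \delta_{cm,N}^{n} + \frac{\delta_{ref,P}^{n} + \delta_{0,P}^{n} - \delta_{ref,N}^{n} - \delta_{0,N}^{n}}{2} \\ &- \left(T_{ref}/2 + \frac{\delta_{ref,P}^{n} - \delta_{0,P}^{n} + \delta_{ref,N}^{n} - \delta_{0,N}^{n}}{2}\right). \end{aligned} \tag{3.4}$$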
As we can see from equation (3.3) and (3.4), the term in the first row is a code-independent term
that has the same value when the comparator output is ’0’ or ’1’. The term in the second row
inside the parenthesis is the code-dependent term, with the sign determined by the comparator
output. Accordingly, we can derive the two equivalent error models, as shown in Fig. 3.2.
Figure 3.2: Equivalent SAR TDC error models
The first model is the equivalent time radix error δref,eq which includes all the delay variabilities that effectively change the reference time (Tref) of the corresponding SAR TDC stage,
resulting in a larger-than-expected output time residue. The value of the radix error is
$$\delta_{ref,eq} = (\delta_{ref,P} + \delta_{ref,N} - \delta_{0,P} - \delta_{0,N})/2 \tag{3.5}$$
The second model is the equivalent delay offset between the two signal paths (δcm,eq), which
corresponds to a quantization offset in the TDC stage. The value of the delay offset is
$$\delta_{cm,eq} = (\delta_{ref,P} - \delta_{ref,N} + \delta_{0,P} - \delta_{0,N})/2 \tag{3.6}$$
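As a numerical sanity check of the lumping step, the sketch below evaluates (3.1) and (3.2) for one stage and confirms that their half-difference equals Tref/2 plus the radix error of (3.5), while their mean equals the comparator and common-mode terms plus the offset of (3.6). All delay values (in femtoseconds) and the dictionary keys are hypothetical and chosen only for illustration.

```python
def stage_delay_differences(t_ref, d):
    """Evaluate (3.1) and (3.2) for one SAR TDC stage; d holds the delay errors."""
    common = d["comp_next"] - d["comp"] + d["cm_p"] - d["cm_n"]
    dt1 = common + t_ref / 2 + d["ref_p"] - d["zero_n"]   # comparator output '1'
    dt0 = common - t_ref / 2 - d["ref_n"] + d["zero_p"]   # comparator output '0'
    return dt1, dt0, common

def equivalent_errors(d):
    radix = (d["ref_p"] + d["ref_n"] - d["zero_p"] - d["zero_n"]) / 2   # (3.5)
    offset = (d["ref_p"] - d["ref_n"] + d["zero_p"] - d["zero_n"]) / 2  # (3.6)
    return radix, offset

# Hypothetical delay errors in femtoseconds
errors = {"comp_next": 40.0, "comp": -30.0, "cm_p": 25.0, "cm_n": 10.0,
          "ref_p": 60.0, "ref_n": -20.0, "zero_p": 15.0, "zero_n": 35.0}
t_ref = 10000.0

dt1, dt0, common = stage_delay_differences(t_ref, errors)
radix, offset = equivalent_errors(errors)

code_dependent = (dt1 - dt0) / 2      # sign flips with the comparator output
code_independent = (dt1 + dt0) / 2    # same for both comparator outputs
print(code_dependent, t_ref / 2 + radix)
print(code_independent, common + offset)
```

The code-dependent part acts as a modified reference time, and the code-independent part acts as a path offset, which is exactly the split shown in Fig. 3.2.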
Figure 3.3: Delay tracking pipelined-SAR TDC structure
Both radix error and delay offset can saturate the following stages and cause larger static nonlinearity. To compensate for those errors, we propose to add a few redundant stages at the end of the
SAR TDC conversion chain. As long as the additional quantization range provided by redundant
stages covers the variability-induced errors, the final quantization noise can be reduced within 1
LSB of the target specification.
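The saturation effect and its compensation can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the implemented circuit: weights are expressed in units of the last binary stage, a hypothetical 3-unit delay offset is injected in the second stage, and the two redundant stages are assumed to have weights of 2 and 1. The final residue is the quantization error left after all bit decisions.

```python
def sar_tdc_residue(t_in, weights, offsets):
    """Return the final time residue of a behavioural SAR TDC.

    weights[k] is the reference time applied by stage k and offsets[k] is a
    code-independent delay offset added by stage k (both in the same unit).
    """
    r = t_in
    for w, off in zip(weights, offsets):
        d = 1 if r >= 0 else -1   # time comparator decision
        r = r - d * w + off       # reference-time subtraction plus stage offset
    return r

def worst_residue(weights, offsets, full_scale=512, step=0.25):
    # Sweep the input over the full range and record the largest residue
    n = int(2 * full_scale / step)
    return max(abs(sar_tdc_residue(-full_scale + i * step, weights, offsets))
               for i in range(n + 1))

binary = [256, 128, 64, 32, 16, 8, 4, 2, 1]   # 9 binary stages
offsets = [0, 3, 0, 0, 0, 0, 0, 0, 0]         # hypothetical offset in stage 2
redundant = [2, 1]                            # assumed redundant-stage weights

worst_plain = worst_residue(binary, offsets)
worst_red = worst_residue(binary + redundant, offsets + [0, 0])
print(worst_plain, worst_red)
```

Without redundancy, the offset pushes the residue outside the range the remaining binary stages can resolve; with the two extra stages the residue returns to within one unit, as long as the added range covers the injected error.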
Figure 3.4: Delay offset calibration algorithm
3.2 Pipelined-SAR TDC Structure with Background Delay
Offset Calibration
With the aforementioned analysis, the overall TDC architecture is shown in Fig. 3.3. The TDC
consists of 9 binary SAR stages with 2 redundant stages and a final comparator for a target resolution of 10 bits. The two redundant stages are inserted at the end of the conversion chain to
compensate for the delay offset and radix error from the preceding nine stages. Nevertheless,
the variabilities from the redundant stages cannot be further compensated and may degrade the
effectiveness of error correction as outlined in Fig. 3.2. Therefore, they need to be minimized
by design or calibrated. For the time radix error of each redundant stage, we apply the SDT
technique for reference time subtraction to minimize the error (≪ LSB) by design. However, as analyzed in Chapter 2, this technique cannot be applied in all the TDC stages. Applying SDT in MSB
stages can cause significant jitter, because the SDT implementation results in unnecessarily slow
rising/falling edges when a long delay (larger than one gate delay) is required. As for the delay
offset of the redundant stages, we propose to calibrate it in the background. During calibration,
the input rising edges of the redundant stage under calibration are forced to be aligned through
a shorting switch while the first 9 SAR TDC stages still operate normally. As shown in Fig. 3.4,
the redundant stage under calibration can be regarded as two signal paths with the equivalent
delay offset inserted to the positive side, with two digital-to-time converters (DTCs) inserted in
the signal path. When the calibration is enabled, the DTCs are initialized with zero input code.
In this case, the following stage comparator output is solely determined by the delay offset of the
stage under calibration. As a result, we apply two integrators to accumulate the comparator’s
positive/negative outputs individually and adjust the corresponding DTC to compensate for the
delay offset. For robustness against jitter, the integrator output LSBs are truncated. For instance,
if the delay offset is equivalent to a longer delay on the positive path, the following comparator outputs a one, which increases the control code of the negative-path DTC and
compensates the delay offset accordingly. After sufficient cycles (usually <64), the delay difference between the positive and negative paths is equalized, i.e., the offset reaches zero. At that
point, the calibration of the current redundant stage is complete, and the calibration
of the next redundant stage can begin. While there is no particular constraint on which redundant stage
to calibrate first, it is advantageous to calibrate the redundant stages in the forward sequence so
Figure 3.5: Redundant stage block diagram
that the redundancy correction can be resumed right after the calibration. Note that the SNDR
of the ADC would temporarily drop by approximately 6dB during the calibration phase (<16ns)
of the redundant stages.
To realize the above calibration algorithm, the block diagram of a redundant stage is shown in
Fig. 3.5. The DTC is realized by an inverter chain with tunable loading capacitors. The reference
time subtraction is realized using SDT and can be disabled when the stage is under calibration.
The integrator is realized by an up/down counter connected to the next stage comparator output.
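The calibration loop described above can be sketched behaviourally as follows. The DTC step (50fs), the number of truncated LSBs (2), and the omission of jitter are assumptions made for illustration; only the structure (shorted inputs, two counters, truncated counter outputs driving the two DTCs from a zero initial code) follows the text.

```python
def calibrate_offset(offset_fs, dtc_step_fs=50.0, trunc_bits=2, cycles=64):
    """Return the residual path delay difference after background calibration."""
    pos_count = 0   # integrator fed by comparator output '0', drives the positive-path DTC
    neg_count = 0   # integrator fed by comparator output '1', drives the negative-path DTC
    for _ in range(cycles):
        # Truncating the counter LSBs means several votes are needed per DTC step
        dtc_pos = (pos_count >> trunc_bits) * dtc_step_fs
        dtc_neg = (neg_count >> trunc_bits) * dtc_step_fs
        # Inputs are shorted, so only the offset and the two DTCs set the difference
        diff = offset_fs + dtc_pos - dtc_neg
        if diff > 0:          # positive path is slower: comparator outputs '1'
            neg_count += 1
        else:                 # negative path is slower: comparator outputs '0'
            pos_count += 1
    dtc_pos = (pos_count >> trunc_bits) * dtc_step_fs
    dtc_neg = (neg_count >> trunc_bits) * dtc_step_fs
    return offset_fs + dtc_pos - dtc_neg

residual_pos = calibrate_offset(500.0)
residual_neg = calibrate_offset(-300.0)
print(residual_pos, residual_neg)
```

In this model the residual settles to within one DTC step of zero well inside 64 cycles and then dithers around it, which matches the qualitative behaviour described for the redundant-stage calibration.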
3.3 ADC Implementation
With the new pipelined-SAR TDC architecture, the overall ADC block diagram is shown in Fig.
3.6. Each ADC channel runs at 4GS/s with a 360fF sampling capacitor and 0.8V peak-to-peak
Figure 3.6: Overall ADC block diagram
input differential voltage swing, and the overall ADC achieves 16GS/s using only 4-way time-interleaving. In each channel, the sampled input differential voltage is linearly converted into
a time difference between the rising edges of two pulses via a VTC, and further quantized by
the aforementioned pipelined-SAR TDC with redundancy and calibration. The digital output
from the TDC is further aligned and decimated by the digital backend for testing purposes. The
maximum output time range of the VTC is ±80ps, and to achieve the designed 10-bit resolution,
the TDC resolution is as fine as 156fs. The ADC sampling clock is generated from a differential
8GHz input sinewave signal and further divided into 4 clock phases at 4GHz. To minimize the
inter-channel interference caused by sampling, the sampling clock duty cycle is adjusted to 25%
to ensure that there would always be one and only one channel turned on for sampling at a given
time.
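The timing numbers quoted in this paragraph follow from simple arithmetic, reproduced in the sketch below. The variable names are illustrative; the sampling window assumes the 25% duty cycle applies to the 4GHz per-channel clock.

```python
n_channels = 4
fs_total_hz = 16e9
fs_channel_hz = fs_total_hz / n_channels        # per-channel rate: 4 GS/s

vtc_range_s = 2 * 80e-12                        # +/-80 ps VTC output range
tdc_bits = 10
tdc_lsb_s = vtc_range_s / 2 ** tdc_bits         # required TDC resolution

duty_cycle = 0.25
sample_window_s = duty_cycle / fs_channel_hz    # time each channel is tracking

print(f"channel rate  = {fs_channel_hz / 1e9:.0f} GS/s")
print(f"TDC LSB       = {tdc_lsb_s * 1e15:.2f} fs")
print(f"sample window = {sample_window_s * 1e12:.1f} ps")
```

The 160ps total range divided into 1024 levels gives the stated resolution of roughly 156fs, and the four 25% windows tile one 250ps channel period without overlap.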
To further improve the ADC dynamic linearity, we propose the bottom-plate sampling with
top-plate ramp injection technique for the VTC, as shown in Fig. 3.7. In the VTC of the previous
Figure 3.7: Bottom-plate sampling with top-plate ramp injection VTC, and the corresponding
timing diagram
ADC prototype discussed in Chapter 2, the input voltages are first sampled on the top plates,
and the sampled voltage is later ramped up by injecting a DC current to the bottom plates. It
is known that the top-plate sampling scheme is vulnerable to clock feedthrough and charge injection. Furthermore, while the voltage ramp on the bottom plates can be reset before the next
sampling phase to avoid common-mode kickback to the input, the sampled voltages remain on
the top plates, still causing differential kickback to the input and hence signal-dependent distortion, i.e., inter-symbol interference (ISI). In the proposed VTC, we first use bottom plate sampling
to mitigate the clock feedthrough and charge injection. At the end of the sampling phase, the
sampled voltage is ramped up by injecting a DC current to the top plates, such that the voltage at
the bottom plates will cross a certain voltage level (Vth of the crossing detector) which effectively
encodes the sampled voltage into the time difference between the two crossing points, as shown
in Fig. 3.7. More importantly, when the ramp voltage is reset before the next sampling phase, it
eliminates both common-mode and differential-mode kickbacks to the input, hence reducing the
ISI. The SPICE simulation indicates a 6.5dB SFDR improvement compared to the conventional
VTC structure under the same power consumption and input voltage swing.
3.4 Measurement results
Figure 3.8: Chip microphotograph and the zoomed-in layout of a single channel
As a proof of concept, an ADC prototype was fabricated in 4nm FinFET CMOS technology.
The chip microphotograph is shown in Fig. 3.8. The 16GS/s 10-bit 4-channel time-interleaved
ADC occupies an active area of 8000 µm². To intuitively demonstrate the effectiveness of the
proposed background delay offset calibration in the redundant stages, the measured ADC output
spectrum is shown in Fig. 3.9. The ADC output is decimated by 1025 times for measurement
purposes. At a low input frequency of 100.1 MHz, the ADC achieves a 41.0dB SNDR and 53.4dB
SFDR without calibration. After the calibration, the SNDR increases to 45.3dB, and the SFDR
increases to 56.7dB. At a close-to-Nyquist input frequency, the ADC achieves a 39.9dB SNDR and
a 50.8dB SFDR before calibration, and a 44.5dB SNDR and a 55.9dB SFDR after the calibration.
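Because the output is decimated before capture, the input tone appears at a folded frequency in the measured spectrum. The sketch below assumes decimation simply keeps every 1025th sample, so the effective sampling rate is 16GS/s divided by 1025; the helper name is illustrative.

```python
def folded_frequency(f_in_hz, fs_hz):
    # Frequency at which a tone appears in the first Nyquist zone after sampling at fs
    f = f_in_hz % fs_hz
    return min(f, fs_hz - f)

fs_decimated_hz = 16e9 / 1025                     # about 15.61 MS/s
low_tone_hz = folded_frequency(100.1e6, fs_decimated_hz)
print(f"decimated rate = {fs_decimated_hz / 1e6:.3f} MS/s")
print(f"100.1 MHz tone appears at {low_tone_hz / 1e6:.2f} MHz")
```

For the 100.1 MHz input this places the tone near 6.4 MHz in the decimated spectrum.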
Fig. 3.10 shows the measured ADC SNDR/SFDR at different input signal frequencies. The
ADC achieves a higher than 44.3dB SNDR and higher than 55dB SFDR over the band from 100MHz
to 7.6GHz. Fig. 3.11 shows the measured ADC SNDR/SFDR at different sampling frequencies. The
ADC achieves a higher than 45dB SNDR below 16GS/s at a fixed 100MHz input frequency. At
20GS/s, the ADC SNDR slightly drops to 42dB mainly because of the reduced voltage-to-time
conversion time and the limited output time range from the VTC. At an even higher sampling
rate beyond 20GS/s, the ADC SNDR/SFDR drops significantly due to insufficient conversion time.
Fig. 3.12 shows the measured ADC INL/DNL with and without the background delay offset
calibration in redundant stages. Before the calibration, the ADC DNL is between +1.25/-0.86 LSBs,
and the INL is between +6.07/-1.70 LSBs. After the calibration, the DNL reduces to +0.59/-0.48
LSBs, and the INL reduces to +2.54/-2.55 LSBs.
The performance summary of the ADC is shown in Table 3.1. Compared to the existing time-domain ADCs, this work achieves much higher SNDR. Compared to existing voltage-domain
ADCs, this work consumes much lower power and area. At 16GS/s, the ADC consumes a power
Figure 3.9: Measured ADC output (decimated by 1025X) spectrum at low (100MHz, fold back to
6MHz) and Nyquist (7.618GHz, fold back to 1MHz because of decimation) input frequency, with
and without the background delay offset calibration in redundant stages.
Figure 3.10: Measured ADC SNDR/SFDR versus input signal frequency with or without the background delay offset calibration in redundant stages.
Figure 3.11: Measured ADC SNDR/SFDR versus sampling frequency with or without the background delay offset calibration in redundant stages.
Figure 3.12: Measured ADC DNL and INL with or without the background delay offset calibration
in redundant stages.
Table 3.1: Performance summary and comparison with the state-of-the-art direct RF sampling
ADCs
of 94.2mW, leading to the highest Schreier power efficiency compared to existing ADCs with
higher than 10GS/s conversion speed.
3.5 Summary
This chapter describes a direct-RF sampling time-domain ADC achieving 16GS/s and 44.5dB
SNDR at Nyquist input frequency. To improve the static linearity of the TDC, we proposed to
add redundancy to the SAR TDC conversion chain. The accuracy of the redundant stages is ensured through the proposed background delay offset calibration scheme. To improve the ADC
dynamic linearity, we proposed the bottom-plate sampling with top-plate ramp injection VTC
to reduce the distortion from clock feedthrough, charge injection, and inter-symbol interference.
Chapter 4
Machine-Learning-Based AMS Circuit Modeling
While the digital-like active-cell-dominated AMS circuit architecture described in Chapters 2
and 3 can effectively reduce the power and area cost of the circuit itself, the design process is
still mostly manual which can cause significant design costs and long time-to-market. Among all
the different kinds of AMS design automation algorithms described in Chapter 1, the surrogate-model-based design automation is one of the most promising approaches because of the fast evaluation of the circuit performance, as shown in Fig. 4.1. For a given design target, the automation
flow iterates the design parameters using an optimization algorithm such as gradient descent [8],
and can efficiently evaluate the circuit performance that corresponds to the design parameters
given within seconds, which is fundamentally faster than real-time circuit simulations. When
multiple design targets are demanded in one certain circuit topology, this design approach shows
extra efficiency advantages since the surrogate model only requires one-time training. Nevertheless, the surrogate model training method requires significant amounts of training samples to
Figure 4.1: Surrogate-model-based circuit design automation and NN-based circuit modeling
ensure sufficient modeling accuracy, which makes the training sample generation the most time-consuming step in the whole design flow, especially when layout parasitic information is considered. To overcome this bottleneck, we improved the circuit modeling efficiency with two proposed modeling algorithms. Transfer Learning [28, 27] is proposed to reuse the schematic-level
circuit model and vastly reduce the required training samples for the post-layout or silicon-level
circuit model. To further utilize the circuit topology information in the circuit, we proposed the
Graph Neural Network-based circuit modeling algorithm, which demonstrates significant circuit
modeling accuracy improvement compared to existing NN-based circuit modeling algorithms.
4.1 Transfer Learning for Efficient Post-Layout and Silicon-Level Modeling
4.1.1 Circuit modeling using MLP and supervised training
The term “circuit modeling” usually refers to modeling the circuit parameter-to-metrics (P2M)
function using a regression algorithm such as neural networks (NN) [40]. The NN training is
performed in a supervised manner, and a fully-connected NN, i.e., a multilayer perceptron (MLP), is usually used as the NN
model because the parameters and metrics have no particular spatial or temporal structure. For
each layer in the MLP, the mathematical computation can be written as:
$$F(W, b, x) = f_{ELU}(Wx + b) \tag{4.1}$$
$$f_{ELU}(y) = \begin{cases} y, & y \ge 0 \\ e^{y} - 1, & y < 0 \end{cases} \tag{4.2}$$
where x is the input vector, W is the weight matrix, and b is the bias vector. fELU is the activation
function of the MLP, that is, the exponential linear unit (ELU) [8], which introduces nonlinearity into
the regression model and provided the best model accuracy in our experiments. The
mean squared error (MSE) loss of the NN is defined as follows:
$$L = \frac{1}{N}\sum_{i=1}^{N} (m_i - \hat{m}_i)^2, \tag{4.3}$$
where $m_i$ is the $i$-th performance metric of the AMS circuit, and $\hat{m}_i$ is the prediction made by
the NN. Then, the gradient of the MSE loss is used to train the MLP network, which can be
represented as follows:
$$W_{j+1} = W_j - \alpha \frac{\partial L}{\partial W} \tag{4.4}$$
$$b_{j+1} = b_j - \alpha \frac{\partial L}{\partial b} \tag{4.5}$$
where $W_j$ and $b_j$ are the weight and the bias at the $j$-th iteration, respectively, and $\alpha$ is the learning
rate. In actual training, to reduce the risk of getting stuck in poor local minima, Adam [16], a modification of
(4.4) and (4.5), was used as the optimizer.
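The layer computation (4.1)-(4.2), the MSE loss (4.3), and the plain gradient updates (4.4)-(4.5) can be sketched for a single ELU layer in a few lines. This is a minimal illustration with made-up numbers and one training sample; it uses plain gradient descent rather than Adam, and all function names are hypothetical.

```python
import math

def elu(y):
    return y if y >= 0 else math.exp(y) - 1.0

def elu_grad(y):
    return 1.0 if y >= 0 else math.exp(y)

def forward(W, b, x):
    # Pre-activation y = Wx + b and layer output f_ELU(y), as in (4.1)
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]
    return y, [elu(v) for v in y]

def mse(m, m_hat):
    # Mean squared error over the metrics, as in (4.3)
    return sum((a - c) ** 2 for a, c in zip(m, m_hat)) / len(m)

def gradient_step(W, b, x, m, alpha):
    y, m_hat = forward(W, b, x)
    n = len(m)
    # Chain rule: dL/dy_i = dL/dm_hat_i * f_ELU'(y_i)
    dy = [2.0 * (mh - mt) / n * elu_grad(v) for mh, mt, v in zip(m_hat, m, y)]
    W_new = [[w_ij - alpha * dy_i * x_j for w_ij, x_j in zip(row, x)]
             for row, dy_i in zip(W, dy)]                      # (4.4)
    b_new = [b_i - alpha * dy_i for b_i, dy_i in zip(b, dy)]   # (4.5)
    return W_new, b_new

W = [[0.2, -0.1], [0.05, 0.3]]   # illustrative initial weights
b = [0.0, 0.0]
x = [0.5, -1.0]                  # one rescaled parameter vector
m = [0.4, -0.2]                  # its reference metrics

loss_before = mse(m, forward(W, b, x)[1])
for _ in range(200):
    W, b = gradient_step(W, b, x, m, alpha=0.1)
loss_after = mse(m, forward(W, b, x)[1])
print(loss_before, loss_after)
```

A practical model stacks several such layers and trains on thousands of samples with an automatic-differentiation framework; the sketch only makes the update rule concrete.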
In the training dataset of the NN model, for each different design parameter set, the circuit under modeling needs to be simulated to generate the golden reference performance metric mi
in equation 4.3. To achieve sufficient modeling accuracy, a large number of (typically >
1000) samples from the design space needs to be simulated. While simulating all the samples
at the schematic level is still feasible, it can be impractically time-consuming to simulate with
layout parasitics extraction (LPE) information included. To further include the effect of chip input/output (I/O) connections and printed circuit board (PCB) traces, one can measure the circuit
performance of several silicon prototypes, while the number of samples can be extremely limited.
4.1.2 Transfer learning scheme for training sample reduction
To reduce the required training samples for the post-layout/silicon-level circuit modeling, we
propose a transfer learning (TL) technique that leverages existing knowledge. More
specifically, TL requires one to have a well-trained NN model representing the P2M function of a
certain circuit in the early design phase (schematic level). Two augmenting layers will be added
Figure 4.2: Proposed transfer learning scheme
to the well-trained NN, which will efficiently transfer the knowledge to a later design phase—i.e.,
post-layout or silicon-level—with very few training samples required.
The detailed flow of the proposed TL technique is shown in Fig. 4.2. The first step is to define the design parameters and performance metrics of the AMS circuit. Next, we define a certain
region inside the parameter space that will be sampled for the training dataset. Two sets of parameters are sampled in design space following the uniform random distribution, with one densely
sampled for the schematic dataset generation and one sparsely sampled for the layout/silicon-level dataset generation. The schematic dataset is then generated using fast schematic-level simulations, while the layout/silicon-level training dataset is generated using time-consuming LPE
simulations or even silicon measurement. The large schematic dataset is used to train a high-precision source NN model from scratch (which means the weights in the NN are randomly
initialized following the normal distribution). With the well-trained source NN, the TL is performed by fixing the source NN and adding two trainable linear layers to the input and output
of the source NN. The layout/silicon dataset is used to update only the weights and biases in the
two new layers, and the whole NN is the target transfer-learned NN, which includes the LPE or
even physical chip I/O information.
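The structure of the transfer-learned model, a frozen source model wrapped by a trainable input layer and a trainable output layer, can be sketched as a function composition. The stand-in source model, the identity initialization of the new layers, and all names below are assumptions for illustration; the text does not specify how the added layers are initialized.

```python
def affine(M, v, x):
    # Linear layer: M x + v
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) + v_i
            for row, v_i in zip(M, v)]

def make_tl_model(f_source, C, d, A, b):
    """Wrap a frozen source model with an input layer (C, d) and an output layer (A, b).

    Only C, d, A, and b would be updated during transfer learning;
    f_source is left untouched.
    """
    def f_target(p):
        return affine(A, b, f_source(affine(C, d, p)))
    return f_target

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical stand-in for the trained schematic-level NN (2 parameters -> 2 metrics)
def f_schematic(p):
    return [p[0] * p[1], p[0] + 2.0 * p[1]]

# With identity layers the wrapped model starts out equal to the source model
f_layout = make_tl_model(f_schematic, identity(2), [0.0, 0.0], identity(2), [0.0, 0.0])
print(f_layout([1.5, 2.0]), f_schematic([1.5, 2.0]))
```

Because the source model is fixed, the number of trainable values is only n² + n + k² + k for n parameters and k metrics, which is why a handful of post-layout or silicon samples can be enough.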
4.1.3 Mathematical analysis for TL
Mathematically, the post-layout model is constructed as follows:
$$\hat{f}_{layout}(p) = A\,\hat{f}_{sch}(Cp + d) + b = \hat{m}_{layout} \tag{4.6}$$
where $\hat{f}_{sch}$ is the schematic-level surrogate model, and $\hat{f}_{layout}$ is the post-layout-level surrogate model. With $n$ parameters and $k$ metrics, $C$ (an $n \times n$ matrix) and $d$ (a $1 \times n$ bias vector) are
the mapping function of the parameters, i.e., the added input linear layer, while $A$ (a $k \times k$ matrix) and $b$ (a $1 \times k$ bias vector) are the mapping function of the metrics, i.e., the added output linear
layer. To justify the use of linear TL layers between the schematic and post-layout models, we perform a propagation delay analysis of an inverter for illustration purposes. If we express the inverter
delay as a function of the width and length of the PMOS and NMOS, assuming NMOS and PMOS
have the same size, the schematic-level delay without layout parasitics can be approximated as
follows:
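$$t_{d,sch}(p) = 0.69\, C_L R_{on}; \quad p = (W, L), \tag{4.7}$$
$$C_L = C_{ox} W L + 2 C_{ov} W + 2 C_{db}, \tag{4.8}$$
$$R_{on} = \frac{3 V_{DD}}{4 k' \left[(V_{DD} - V_T) V_{dsat} - \frac{V_{dsat}^2}{2}\right] \frac{W}{L}} \left(1 - \frac{7}{9} \lambda V_{DD}\right) \tag{4.9}$$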
If we regard the terms that are irrelevant to W and L in (4.9) as constants and lump them into α,
the equation can be simplified as follows:
$$t_{d,sch} = \alpha (C_{ox} W L + 2 C_{ov} W + 2 C_{db}) \frac{L}{W} \tag{4.10}$$
After the inverter is laid out, there will be a parasitic resistance in series with Ron, and a parasitic
capacitance in parallel with CL, which can be reasonably estimated as
$$R_p \propto 1/W = R_{unit}/W \tag{4.11}$$
$$C_p \propto W = C_{unit} W \tag{4.12}$$
With the parasitic effect, the delay will be changed to:
$$t_{d,layout} = \alpha (C_{ox} W L + 2 C_{ov} W + 2 C_{db} + C_{unit} W) \frac{L + R_{unit}/\alpha}{W} \tag{4.13}$$
Corresponding to (4.6), to map the schematic-level delay (4.10) to the layout-level delay (4.13) in the form
$$t_{d,layout} = A\, t_{d,sch}(Cp + d) + b; \quad p = (W, L) \tag{4.14}$$
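we can define two new variables as:
$$W' = \frac{2 C_{ov} + C_{unit} - C_{ox} R_{unit}/\alpha}{2 C_{ov}} W = \theta_w W \tag{4.15}$$
$$L' = (L + R_{unit}/\alpha)/\theta_w$$
and rewrite (4.13) into
$$t_{d,layout} = \theta_w^2 \alpha (C_{ox} W' L' + 2 C_{ov} W' + 2 C_{db}) \frac{L'}{W'} = \theta_w^2\, t_{d,sch}(p'); \quad p' = (W', L') \tag{4.16}$$
Compared to (4.14), we can get
$$A = (\theta_w^2), \quad b = (0), \quad C = \begin{pmatrix} \theta_w & 0 \\ 0 & 1/\theta_w \end{pmatrix}, \quad d = (0,\; R_{unit}/(\alpha \theta_w)) \tag{4.17}$$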
for the input and output linear mapping.
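The linear-mapping argument can be checked numerically. The sketch below evaluates the layout-level delay (4.13) directly and through the schematic-level expression (4.10) with mapped parameters, taking W' = θw·W and L' = (L + Runit/α)/θw, which is the shift that reproduces (4.13) exactly as written. All component values are in arbitrary normalized units and are hypothetical.

```python
import math

def t_sch(W, L, alpha, c_ox, c_ov, c_db):
    # Schematic-level delay, equation (4.10)
    return alpha * (c_ox * W * L + 2 * c_ov * W + 2 * c_db) * L / W

def t_layout(W, L, alpha, c_ox, c_ov, c_db, c_unit, r_unit):
    # Layout-level delay with parasitic R and C, equation (4.13)
    c_total = c_ox * W * L + 2 * c_ov * W + 2 * c_db + c_unit * W
    return alpha * c_total * (L + r_unit / alpha) / W

# Hypothetical normalized device and parasitic constants
alpha, c_ox, c_ov, c_db = 2.0, 1.5, 0.4, 0.3
c_unit, r_unit = 0.2, 0.1
W, L = 3.0, 0.5

theta_w = (2 * c_ov + c_unit - c_ox * r_unit / alpha) / (2 * c_ov)
W_mapped = theta_w * W                       # input layer, first row of C
L_mapped = (L + r_unit / alpha) / theta_w    # input layer, second row of C plus bias d

direct = t_layout(W, L, alpha, c_ox, c_ov, c_db, c_unit, r_unit)
mapped = theta_w ** 2 * t_sch(W_mapped, L_mapped, alpha, c_ox, c_ov, c_db)  # output layer A
print(direct, mapped)
```

The two values agree to floating-point precision, so for this simplified delay model a scalar output gain and an affine input map are sufficient to carry the schematic model to the layout level.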
In conclusion, adding two linear layers to the input/output of the pre-trained schematic-level circuit model, and training only these two layers using post-layout simulation results, is
sufficient for modeling the inverter delay in the post-layout stage. As a result, TL can significantly
improve the post-layout modeling accuracy with a small number of training samples and prevent
over-fitting.
Similarly, the same TL technique can be applied to silicon-level circuit modeling after the
circuit is fabricated and tested, i.e., step 5 of the proposed design flow. Based on the transfer-learned post-layout circuit model, we can cascade additional TL layers and re-train an accurate
silicon-level model with only a few silicon measurement samples.
4.1.4 Automatic circuit sizing with post-layout/silicon-level model
After preparing the surrogate model, we perform the automatic sizing of the circuit block to satisfy the desired design targets. In step 4 of the proposed design flow, we use the search algorithm
illustrated in [8] to find multiple circuit parameter candidates. This algorithm incorporates a
gradient-based parameter search using NN models of the circuit blocks. In [8], the search algorithm only utilized NN models trained from schematic-level simulations. In this work, we further
enhance the model with post-layout simulation and/or silicon measurement results, leading to
much-improved search accuracy in terms of matching with the final silicon performance.
Since we use an optimization-based search methodology, a penalty function is designed to
help find the optimal circuit parameters. Additionally, this penalty function should be differentiable everywhere for the gradient-based optimizer. Here, we define the circuit sizing problem
as:
$$\arg\min_{p}\; g^{o}(\hat{m}), \tag{4.18}$$
$$\text{s.t.:} \quad g^{i}(\hat{m}) \ge 0, \quad g^{e}(\hat{m}) = 0, \quad \hat{m} = \hat{f}(p),$$
where $\hat{f}$ is the circuit surrogate NN model, $g^{o}$ includes the specifications that should be minimized, $g^{i}$ includes the inequality constraints, and $g^{e}$ includes the equality constraints that should
be satisfied. For example, $g^{o}$ can be the power consumption that should be minimized, $g^{i}$ can be
the bandwidth of a system desired to be larger than 10MHz, and $g^{e}$ can be the gain of a feedback
system that should be exactly equal to 2. For simplification, we assume $g^{o}$, $g^{i}$, and $g^{e}$ are lists
of subfunctions (i.e., $g = [g_1, g_2, ...]$) that are only related to the circuit’s design metrics and are
differentiable for the given inputs. Knowing (4.18), we can construct the penalty function for the
automatic sizing problem as:
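$$\begin{aligned} \mathrm{penalty}(p) ={}& \sum_{j} w_j^{o} \times g_j^{o}(\hat{f}(p)) \\ &+ \sum_{k} \mathrm{elu}\left(w_k^{i} \times g_k^{i}(\hat{f}(p))\right) \\ &+ \sum_{l} w_l^{e} \times \left(g_l^{e}(\hat{f}(p))\right)^2, \end{aligned} \tag{4.19}$$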
where the $w$ terms are the optimization weights determined by the importance of each specification and
elu is the exponential linear unit function. The function elu linearly increases the penalty when
the inequality is not satisfied and exponentially reduces it if satisfied, and it is differentiable everywhere. To satisfy the equality constraints, we use the MSE which is differentiable and increases
the penalty if it is not satisfied. To calculate the gradients for the gradient-based optimization,
Figure 4.3: Schematic of the delta-sigma DAC example
we can use the chain rule to first find $\partial\,\mathrm{penalty}/\partial \hat{m}$ and then derive $\partial \hat{m}/\partial p$. Machine learning tools
such as TensorFlow can easily complete both tasks and therefore compute the gradients of the
penalty function with respect to the design parameters.
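A minimal sketch of the penalty in (4.19) is given below for the power, bandwidth, and gain example mentioned earlier. To keep it short, the metrics are passed in directly rather than predicted by a surrogate model. The weight values and metric names are hypothetical, and the inequality weight is taken as negative so that a violated constraint (g < 0) produces a positive ELU argument and hence a linearly growing penalty; the text does not state this sign convention explicitly.

```python
import math

def elu(x):
    return x if x >= 0 else math.exp(x) - 1.0

def penalty(metrics, w_obj=1.0, w_ineq=-1.0, w_eq=10.0):
    g_obj = metrics["power_mw"]                # objective: minimize power
    g_ineq = metrics["bandwidth_mhz"] - 10.0   # inequality: bandwidth >= 10 MHz
    g_eq = metrics["gain"] - 2.0               # equality: gain == 2
    return (w_obj * g_obj
            + elu(w_ineq * g_ineq)             # linear when violated, saturating when met
            + w_eq * g_eq ** 2)                # squared error for the equality constraint

meets_spec = penalty({"power_mw": 1.0, "bandwidth_mhz": 12.0, "gain": 2.0})
low_bandwidth = penalty({"power_mw": 1.0, "bandwidth_mhz": 8.0, "gain": 2.0})
wrong_gain = penalty({"power_mw": 1.0, "bandwidth_mhz": 12.0, "gain": 2.5})
print(meets_spec, low_bandwidth, wrong_gain)
```

Every term is differentiable, so in a full flow the same expression would be evaluated on the surrogate model output and differentiated with respect to the design parameters by an automatic-differentiation tool.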
4.1.5 Transfer Learning experimental results
In order to validate the proposed TL method in terms of the efficiency and accuracy of the regression model, the method was applied to two representative circuits. First, we use a sigma-delta
digital-to-analog converter (DAC) example to demonstrate the circumstance where TL is applied
for post-layout circuit modeling, and the DAC schematic is shown in Fig. 4.3. In the second example, we model a voltage-controlled oscillator (VCO) using TL with the post-layout samples and
also silicon measurement data that includes all physical I/O effects, and further performed circuit
sizing using the post-layout and silicon model. Fig. 4.4 shows the schematics and the design parameters of the two circuits. With the help of layout automation, 10 different VCO designs were
taped out and measured in terms of oscillation frequency (Fosc) and power consumption (PW)
under different control voltages. The die photo is shown in Fig. 4.4(b), and the zoomed-in VCO
layout details are shown in Fig. 4.4(c) and (d).
Figure 4.4: Schematic, layout, and die photo of the VCO example
When performing circuit modeling, we compare the modeling accuracy between the proposed TL approach and the baseline approach where an NN is trained from scratch using the same
number of training samples. To quantify the modeling accuracy, we calculate the customized MSE
between the metrics predicted by the NN model and the golden reference as:
$$\mathrm{MSE} = \frac{1}{NQ} \sum_{n=1}^{N} \sum_{q=1}^{Q} \frac{(\hat{m}_{nq} - m_{nq})^2}{(m_{q,\max} - m_{q,\min})^2} \tag{4.20}$$
where $N$ is the number of samples, $Q$ is the number of metrics, $\hat{m}_{nq}$ stands for the $q$-th metric of
the $n$-th sample predicted by the regression model, and $m_{nq}$ is the true value. In addition, $m_{q,\max}$
Table 4.1: Parameter list of the delta-sigma DAC example
Figure 4.5: Testing MSE comparison of the Sigma-Delta DAC example
and $m_{q,\min}$ stand for the maximum and minimum values of the $q$-th metric in the entire dataset.
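The range-normalized MSE of (4.20) can be written as a short function. In this sketch the per-metric maximum and minimum are taken from the reference values passed in, which stands in for "the entire dataset"; the data are made up for illustration.

```python
def normalized_mse(m_true, m_pred):
    """Range-normalized MSE over N samples (rows) and Q metrics (columns), as in (4.20)."""
    n, q = len(m_true), len(m_true[0])
    # Span of each metric, used to put metrics with different units on a common scale
    spans = [max(row[j] for row in m_true) - min(row[j] for row in m_true)
             for j in range(q)]
    total = sum(((m_pred[i][j] - m_true[i][j]) / spans[j]) ** 2
                for i in range(n) for j in range(q))
    return total / (n * q)

reference = [[0.0, 10.0], [2.0, 20.0]]    # two samples, two metrics
prediction = [[1.0, 10.0], [2.0, 15.0]]
score = normalized_mse(reference, prediction)
print(score)
```

Normalizing by the metric range keeps a metric with large absolute values, such as frequency in Hz, from dominating one with small values, such as power in watts.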
In the sigma-delta DAC example, we try to use a schematic-level P2M model to accelerate the
modeling of the LPE-involved P2M function. As shown in Table 4.1, we defined nine parameters
in this example, which are the discretized sizes of the standard cells used in the DAC, to make
this circuit suitable for commercial automatic layout tools.
The metrics of the DAC are defined as spurious free dynamic range (SFDR), effective number
of bits (ENOB), and power consumption. The source NN is trained by 2000 schematic simulation
results, and another 1000 layouts are randomly generated. The post-layout simulation results are
used to train the LPE-involved P2M NN model with or without the transfer learning method, and
the comparison result is shown in Fig. 4.5.
We use a log scale plot in Fig. 4.5 to show the test MSE loss versus the number of LPE simulation samples used in the training process. As Fig. 4.5 indicates, even for a sophisticated circuit
structure and layout-sensitive metrics, TL is much more efficient than training from scratch. With
only four training samples, TL can achieve an accuracy for which the baseline approach needs
more than 600 samples. As for accuracy, with the same number of training samples, MSE loss
from TL is always less than 1/4 of the MSE loss of the baseline approach.
In order to more precisely illustrate the prediction accuracy of the NN models, the comparison
of the relative prediction error of each metric versus the number of training samples is shown in
Fig. 4.6. When very small numbers of samples are used in training, the baseline approach can
only make predictions with more than 10% prediction error. On the contrary, if TL is used, the
NN model can achieve around a 4% prediction error, which is a > 2X improvement. If we are able
to provide more training samples, the prediction error of the TL can reach below 2%. Therefore,
TL can avoid large numbers of time-consuming post-layout simulations and effectively accelerate
the modeling of the LPE-involved P2M function.
After verifying the TL technique for the post-layout circuit modeling, we use the VCO example to further validate TL for silicon-level circuit modeling and sizing. We first densely sampled
the parameters (the aforementioned eight parameters) to metrics (Fosc and PW) function in the
design space via low-cost schematic-level simulation, and trained a 3-hidden-layer MLP (number
of neurons per layer: [8, 16, 32, 16, 2]) from scratch. All the parameters and metrics have been
linearly re-scaled to [-1, 1] according to the minimum and maximum values in the dataset before training. As shown in the first column of Fig. 4.7, with a large number of training samples
the VCO surrogate model can precisely predict the schematic-level performance metric. In this
particular case, we used 5,250 training samples and 500 testing samples, and the training sample
generation took around 95 minutes with parallel threads. We take the square root of the testing
MSE loss as the approximated prediction error since both metrics have been re-scaled to [-1, 1].
Figure 4.6: Relative prediction error comparison of the Sigma-Delta DAC example
Figure 4.7: Training and testing MSE loss comparison (vertical axis: MSE loss). Columns: schematic, trained from scratch (~0.84% error); layout, trained from scratch (~50% error); layout, transfer learning (~4.3% error); silicon, trained from scratch (~20% error); silicon, transfer learning (~3.9% error).
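The linear re-scaling to [-1, 1] and the square-root-of-MSE error approximation described above can be sketched as follows. The function names and the raw metric values are hypothetical; only the min/max re-scaling and the square root of the MSE come from the text.

```python
import numpy as np

def rescale_to_unit_range(values, v_min=None, v_max=None):
    """Linearly map each column of `values` from [min, max] to [-1, 1]."""
    values = np.asarray(values, dtype=float)
    v_min = values.min(axis=0) if v_min is None else np.asarray(v_min, dtype=float)
    v_max = values.max(axis=0) if v_max is None else np.asarray(v_max, dtype=float)
    return 2.0 * (values - v_min) / (v_max - v_min) - 1.0

def approx_prediction_error(pred_scaled, true_scaled):
    """Square root of the MSE between re-scaled predictions and references."""
    diff = np.asarray(pred_scaled) - np.asarray(true_scaled)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical raw metrics: columns are Fosc and PW in arbitrary units.
raw = np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 5.0]])
scaled = rescale_to_unit_range(raw)
error = approx_prediction_error(scaled + 0.01, scaled)
```

Passing the training-set minimum and maximum as `v_min` and `v_max` lets the same mapping be reused for testing samples.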
When we first performed transfer learning for the post-layout VCO model, we used only a single
layout to generate 20 training samples at 20 different control voltages, and the trained model was
tested with another 180 samples with different VCO parameters. Each of the post-layout simulations took around 24 minutes. As shown in the second and third columns of Fig. 4.7, while
the model trained from scratch has a 50% testing error because of overfitting, TL can effectively
reuse the information in the schematic-level model, and the transfer-learned model can predict the
post-layout metrics with only 4.3% error. To intuitively demonstrate the post-layout modeling accuracy, we compared the performance metrics of two of the VCO designs obtained from (1) post-layout
simulation, (2) post-layout model prediction, and (3) silicon measurement. The results are
shown in Fig. 4.8. For both VCO designs, the surrogate model predictions accurately follow
the simulation results, but they still differ from the silicon test results. This discrepancy
can be caused by modeling and layout extraction inaccuracy, especially in the high-frequency
cases, and by parasitic capacitance and resistance from the peripheral testing circuitry.
Figure 4.8: Layout NN model prediction, post-layout simulation, and silicon testing results comparison: (a) Fosc vs. Vctrl for VCO 1 (b) power vs. Vctrl for VCO 1 (c) Fosc vs. Vctrl for VCO 10
(d) power vs. Vctrl for VCO 10
To fix the aforementioned discrepancy between post-layout model prediction and silicon measurement results, we took the post-layout surrogate model and performed TL with 40 training
samples from the silicon measurements of VCO 1 and 2. For testing, we used 160 measurement samples from VCO 3 to 10, with 20 samples from each VCO obtained by applying 20 different control
voltages. As shown in the last two columns of Fig. 4.7, we can obtain a much more accurate
silicon-level VCO model with TL compared to the model trained from scratch given the same
number of training samples.
Figure 4.9: (a) Prediction errors of silicon result using schematic-, layout- and silicon- level NN
model (b) Fosc vs. Vctrl for VCO 10 (c) power vs. Vctrl for VCO 10 from post-layout simulation,
silicon-level model prediction and silicon testing. Bar labels in panel (a): SCH ~24% error, LAY ~11% error, Silicon ~3.9% error.
To examine the prediction accuracy with respect to the silicon results, we compared (1) the schematic-level model, (2) the post-layout model trained by TL, and (3) the silicon-level model trained by TL. As
shown in Fig. 4.9(a), if we directly use the schematic-level model or the post-layout model to
predict the silicon test results, the MSEs of predictions are approximately 8 and 3 times higher
than that of the transfer-learned silicon model, respectively. The oscillation frequency and power
consumption predictions of the silicon-level model are shown in Fig. 4.9(b) and (c). Compared
with Fig. 4.8, the model prediction precisely follows the silicon results (within 5% throughout
the frequency tuning range of the VCO).
Using the accurate post-layout/silicon-level VCO model, we can further perform the circuit
sizing algorithm described in Section III-D to validate the effectiveness of the proposed design
flow. We used the silicon measurement results of the testing VCOs (3-10) as the design targets (Fosc
and PW at a certain control voltage) and performed the circuit sizing algorithm with the VCO
surrogate model at different design stages (schematic-level, post-layout, silicon-level). On an
NVIDIA 1080 computing platform, the maximum time consumption for one design is 70 seconds.
The search results based on the surrogate models are compared to the actual VCO sizing, as
shown in Table 4.2. Note that the sizing results from the model-based search are continuous
values, but the 12nm FinFET technology requires discrete numbers for device sizes. Therefore,
the continuous sizing results are rounded, and the rounded values are annotated inside parentheses in the table. As shown
in the table, the sizing results from the schematic-level model are significantly different from the
actual parameter values in silicon, with an average sizing difference of 35%. Sizing with a post-layout model finds design points much closer to the actual parameter values in
most cases, and the average sizing difference reduces to 21%. With the silicon-level model, the
sizing results show much-improved precision in all cases, as the average sizing difference further
reduces to only 5%. Accordingly, the proposed design flow can significantly accelerate the design
process and find the desired design points for different design specifications, especially with the
silicon-level circuit model.
Table 4.2: VCO sizing results comparison (VCO 1 and 2 used for model training)
VCO#    Model          nf_c_n     nf_c_p     nf_d_n     nf_d_p     nfin_c_p   nfin_c_n   nfin_d_n   nfin_d_p
VCO 3   schematic      5.21(5)    5.21(5)    10.40(10)  10.40(10)  9.60(10)   7.68(8)    7.68(8)    9.60(10)
VCO 3   post-layout    4.41(4)    4.41(4)    8.82(9)    8.82(9)    9.98(10)   7.99(8)    7.99(8)    9.98(10)
VCO 3   silicon-level  4.12(4)    4.12(4)    8.23(8)    8.23(8)    10.00(10)  8.00(8)    8.00(8)    10.00(10)
VCO 3   actual value   4          4          8          8          10         8          8          10
VCO 4   schematic      7.97(8)    7.97(8)    15.90(16)  15.90(16)  9.00(9)    7.20(7)    7.20(7)    9.00(9)
VCO 4   post-layout    7.02(7)    7.02(7)    14.00(14)  14.00(14)  9.99(10)   7.99(8)    7.99(8)    9.99(10)
VCO 4   silicon-level  6.02(6)    6.02(6)    12.00(12)  12.00(12)  9.99(10)   7.99(8)    7.99(8)    9.99(10)
VCO 4   actual value   6          6          12         12         10         8          8          10
VCO 5   schematic      8.71(9)    8.71(9)    17.40(17)  17.40(17)  9.86(10)   7.89(8)    7.89(8)    9.86(10)
VCO 5   post-layout    9.15(9)    9.15(9)    18.30(18)  18.30(18)  9.98(10)   7.99(8)    7.99(8)    9.98(10)
VCO 5   silicon-level  8.25(8)    8.25(8)    16.49(16)  16.49(16)  10.00(10)  8.00(8)    8.00(8)    10.00(10)
VCO 5   actual value   8          8          16         16         10         8          8          10
VCO 6   schematic      9.44(9)    9.44(9)    18.90(19)  18.90(19)  9.71(10)   7.77(8)    7.77(8)    9.71(10)
VCO 6   post-layout    9.72(10)   9.72(10)   19.40(19)  19.40(19)  9.95(10)   7.96(8)    7.96(8)    9.95(10)
VCO 6   silicon-level  9.91(10)   9.91(10)   19.80(20)  19.80(20)  9.95(10)   7.96(8)    7.96(8)    9.95(10)
VCO 6   actual value   10         10         20         20         10         8          8          10
VCO 7   schematic      5.75(6)    5.75(6)    11.50(12)  11.50(12)  7.75(8)    6.20(6)    6.20(6)    7.75(8)
VCO 7   post-layout    3.15(3)    3.15(3)    6.29(6)    6.29(6)    9.77(10)   7.81(8)    7.81(8)    9.77(10)
VCO 7   silicon-level  4.21(4)    4.21(4)    8.43(8)    8.43(8)    4.85(5)    3.88(4)    3.88(4)    4.85(5)
VCO 7   actual value   4          4          8          8          5          4          4          5
VCO 8   schematic      6.14(6)    6.14(6)    12.30(12)  12.30(12)  7.20(7)    5.76(6)    5.76(6)    7.20(7)
VCO 8   post-layout    2.82(3)    2.82(3)    5.63(6)    5.63(6)    9.93(10)   7.95(8)    7.95(8)    9.93(10)
VCO 8   silicon-level  2.21(2)    2.21(2)    4.42(4)    4.42(4)    5.82(6)    4.66(5)    4.66(5)    5.82(6)
VCO 8   actual value   2          2          4          4          5          4          4          5
VCO 9   schematic      3.73(4)    3.73(4)    7.45(7)    7.45(7)    9.88(10)   7.90(8)    7.90(8)    9.88(10)
VCO 9   post-layout    3.65(4)    3.65(4)    7.29(7)    7.29(7)    8.60(9)    6.88(7)    6.88(7)    8.60(9)
VCO 9   silicon-level  3.09(3)    3.09(3)    6.19(6)    6.19(6)    8.66(9)    6.93(7)    6.93(7)    8.66(9)
VCO 9   actual value   3          3          6          6          10         8          8          10
VCO 10  schematic      6.72(7)    6.72(7)    13.40(13)  13.40(13)  9.29(9)    7.43(7)    7.43(7)    9.29(9)
VCO 10  post-layout    5.72(6)    5.72(6)    11.40(11)  11.40(11)  9.99(10)   7.99(8)    7.99(8)    9.99(10)
VCO 10  silicon-level  5.52(6)    5.52(6)    11.00(11)  11.00(11)  9.60(10)   7.68(8)    7.68(8)    9.60(10)
VCO 10  actual value   5          5          10         10         10         8          8          10
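As a worked illustration of how the table entries can be compared, the sketch below rounds the continuous search results to integer device sizes and computes an average sizing difference for VCO 3. The exact definition of the average sizing difference is not stated in the text, so the mean relative deviation used here is an assumption, and the function names are hypothetical. The 35%, 21%, and 5% figures quoted above are averages over the tested VCOs and are not reproduced by this single-VCO example.

```python
def round_to_device_sizes(sizes):
    """Round continuous sizing results to the nearest integer device size."""
    return [int(round(s)) for s in sizes]

def average_sizing_difference(sizes, actual):
    """Mean relative deviation |size - actual| / actual over all parameters (assumed metric)."""
    return sum(abs(s - a) / a for s, a in zip(sizes, actual)) / len(actual)

# VCO 3 rows of Table 4.2 (continuous search results before rounding).
actual = [4, 4, 8, 8, 10, 8, 8, 10]
vco3 = {
    "schematic":     [5.21, 5.21, 10.40, 10.40, 9.60, 7.68, 7.68, 9.60],
    "post-layout":   [4.41, 4.41, 8.82, 8.82, 9.98, 7.99, 7.99, 9.98],
    "silicon-level": [4.12, 4.12, 8.23, 8.23, 10.00, 8.00, 8.00, 10.00],
}
diffs = {k: average_sizing_difference(v, actual) for k, v in vco3.items()}
rounded = {k: round_to_device_sizes(v) for k, v in vco3.items()}
```

For this VCO the difference shrinks from the schematic-level model to the post-layout model to the silicon-level model, and the rounded silicon-level result matches the actual sizing.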
4.2 Graph Neural Network-based Circuit Modeling
While the TL technique can effectively reduce the number of training samples required for circuit modeling in
later design phases, the source NN training still regards the circuit under modeling as a black-box
function without incorporating the circuit topology information. As a result, the surrogate model
only performs interpolation, with limited prediction accuracy, within the design space covered by the
training dataset. When the model is extrapolated to unseen design space, the modeling accuracy
can be much worse. Accordingly, the surrogate-model-based design approach is not capable
of optimizing the design beyond the training dataset.
Inspired by the recent developments in Graph Neural Networks (GNNs), we propose to
model the circuit P2M function using a Graph Convolutional Neural Network (GCN) [8], to significantly improve the circuit modeling accuracy and successfully extrapolate the model to unseen
design space for performance enhancement. The graph that represents the circuit can be directly
generated from the netlist, with device pins defined as nodes and wires defined as edges. Unlike an MLP, which connects all the circuit parameters together, graph convolution only
updates the node information based on the neighboring nodes, which mimics the signal propagation and loading effect in an AMS circuit. Compared to the existing MLP and Circuit Connectivity inspired Neural Network (CCINN) [9] modeling approaches, the GCN can effectively extract
the circuit operation feature, reduce the unnecessary neuron connections, mitigate over-fitting,
and achieve significant modeling accuracy improvement (2.2x - 4.5x) in various design examples.
When extrapolated to unseen design space, the GCN model can maintain similar prediction accuracy, indicating that the GCN model can be applied for circuit performance
enhancement without extra simulation overhead.
Figure 4.10: Different graph construction options for a two-stage common-source amplifier. (a)
Circuit schematic. (b) Direct mapping with devices as nodes and wire connections as edges.
(c) Graph with transistor pin specified as individual nodes. (d) Graph with redundant constant
voltage connections removed. (e) Graph node input feature definition
4.2.1 Graph construction from a circuit topology
In order to apply GCNs to the circuit modeling, it is crucial to find the best mapping from the
circuit topology to the corresponding graph to perform circuit feature extraction. Taking a two-stage common-source amplifier with capacitive loading as an example, Fig. 4.10 presents different
graph construction options.
4.2.1.1 Direct Mapping
The most straightforward way of generating a circuit graph is to define each device as a graph
node and each wire connection as a graph edge, as shown in Fig. 4.10(b). This approach is
frequently used in prior works such as [37, 3] because of its simplicity and intuitiveness. However,
this graph fails to convey practical circuit topology information. In the example given, all the
transistor nodes are connected together, which highly resembles an FCNN. The graph fails to
represent the different loading or driving effects from the transistor gate or source/drain, which
limits the modeling accuracy improvement over a conventional FCNN.
4.2.1.2 Proposed Transistor-Pin-Specified Graph Construction
To avoid circuit topology information loss, we propose to construct the graph with each transistor
pin specified as one node. The transistor gate and source/drain are treated as different nodes with
different input feature vectors, mimicking their different physical driving and loading effects.
Within each transistor, different graph nodes are connected via edges to represent the transistor
function. Passive devices, such as capacitors and resistors, are still regarded as one node since
both pins have the same physical property and are interchangeable. Accordingly, the transistor-pin-specified graph is shown in Fig. 4.10(c). This graph resembles the actual circuit topology,
thereby enhancing the GCN circuit P2M function modeling capability.
4.2.1.3 Redundant Edge Removal
Nevertheless, directly mapping the circuit wire connections as graph edges may not be the most
effective method for circuit P2M function modeling. In GCN circuit modeling, graph edges are
utilized to extract the loading and driving effect between nodes. However, when an external
source such as a supply or ground forces a voltage on a wire, circuit components cannot be loaded
or driven through that wire. Therefore, the corresponding graph edges should be eliminated to
reduce the complexity of the graph and mitigate over-fitting, as is shown in Fig. 4.10(d).
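The construction rules of Sections 4.2.1.2 and 4.2.1.3 can be sketched as below. The netlist format, the device and net names, the choice of forced nets, and the decision to connect all pins within a transistor pairwise are assumptions made for illustration; the actual netlist parser and edge definitions used in this work may differ.

```python
from itertools import combinations

# Hypothetical netlist of a two-stage common-source amplifier:
# device -> (device type, {pin name: net name}).
NETLIST = {
    "M1": ("nmos", {"g": "vin", "d": "n1", "s": "gnd"}),
    "M2": ("pmos", {"g": "vb1", "d": "n1", "s": "vdd"}),
    "M3": ("pmos", {"g": "vb2", "d": "vout", "s": "vdd"}),
    "M4": ("nmos", {"g": "n1", "d": "vout", "s": "gnd"}),
    "C": ("cap", {"p": "vout", "n": "gnd"}),
}
FORCED_NETS = {"vdd", "gnd", "vb1", "vb2"}  # nets held by external sources (assumed)

def build_graph(netlist, forced_nets):
    """Return (nodes, edges) of the pin-specified graph with forced-net edges removed."""
    nodes, edges, net_members = [], set(), {}
    for name, (dev_type, pins) in netlist.items():
        if dev_type in ("nmos", "pmos"):
            # One node per transistor pin.
            pin_nodes = {pin: f"{name}.{pin}" for pin in pins}
            nodes.extend(pin_nodes.values())
            # Edges inside the transistor represent the device function.
            edges.update(frozenset(p) for p in combinations(pin_nodes.values(), 2))
        else:
            # Passive devices stay a single node shared by both pins.
            pin_nodes = {pin: name for pin in pins}
            nodes.append(name)
        for pin, net in pins.items():
            net_members.setdefault(net, set()).add(pin_nodes[pin])
    for net, members in net_members.items():
        if net in forced_nets:
            continue  # no loading or driving through an externally forced wire
        edges.update(frozenset(p) for p in combinations(sorted(members), 2))
    return nodes, edges

nodes, edges = build_graph(NETLIST, FORCED_NETS)
```

Calling `build_graph(NETLIST, set())` keeps the supply and ground edges and therefore corresponds to the graph before redundant edge removal.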
4.2.1.4 Node Input Feature Definition
Once the graph construction criteria have been established, it is also important to determine
the feature of each node. In this proposed modeling method, the input feature of each node is
a one-dimensional vector. To represent the different functions of different device types in the
circuit, the design parameters of each device type are assigned to specific locations in the vector,
while the remaining elements are fixed at zero, as shown in Fig. 4.10(e). The transistor gate and
source/drain are treated as different types of nodes due to their different loading and driving
characteristics.
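A minimal sketch of such an input feature vector is given below. The slot layout, the vector length, and the parameter names are hypothetical; the text only specifies that each node type writes its design parameters to type-specific locations and that the remaining elements are zero.

```python
import numpy as np

# Hypothetical slot layout: each node type owns fixed positions in the vector.
FEATURE_SLOTS = {
    "gate": {"W": 0, "L": 1},
    "source_drain": {"W": 2, "L": 3},
    "capacitor": {"C": 4},
}
FEATURE_DIM = 5

def node_feature(node_type, params):
    """Place a node's design parameters in its type-specific slots, zeros elsewhere."""
    vec = np.zeros(FEATURE_DIM)
    for name, slot in FEATURE_SLOTS[node_type].items():
        vec[slot] = params[name]
    return vec

gate_vec = node_feature("gate", {"W": 0.5, "L": -1.0})
cap_vec = node_feature("capacitor", {"C": 0.2})
```

Because gate and source/drain nodes use different slots, the first graph convolution can apply different weights to the same transistor dimensions depending on which pin they are seen from.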
4.2.2 Circuit modeling using graph convolution
With the aforementioned graph and input feature, GCN feature extraction can be performed to
model the circuit P2M function. GNNs were invented to process data with graph structures such as
molecules or social networks. Naturally, GNNs can also be applied to circuit modeling to utilize the
information embedded in the circuit graph structure. Similar to the circuit operation principle, the
feature update of each node is mainly affected by the neighboring nodes according to the circuit
connectivity. Devices that are farther apart in the circuit are likewise isolated from each other in the GNN. Compared
to CCINN, the circuit graph can be directly generated from its netlist down to the device level,
with no human in the loop. In this way, NN over-fitting is much reduced, and extrapolating
the model for more optimal performance becomes possible. In this work, we further propose a
pin-specified circuit-to-graph mapping to make the GNN feature extraction process more similar
to actual circuit operation.
Figure 4.11: Circuit modeling using graph convolution followed by fully connected layers
4.2.2.1 GCN Structure
The GCN structure used for circuit modeling is shown in Fig. 4.11. The input graph has multiple
layers, with each layer representing one input feature element of each node. Graph convolution
is performed on the graph as described in [17]
f(A, X) = \sigma\left(D^{-1/2} A D^{-1/2} X W\right), (4.21)
where A is the adjacency matrix deduced from the circuit topology, X is the input node feature generated from circuit parameters, D is the degree matrix of A for normalization, W is the
trainable convolution weight matrix, and σ is the activation function. Multiple layers of graph
convolution can be applied to model direct and high-order loading effects. The output feature
vectors from the graph nodes are then combined and flattened into a one-dimensional vector. Finally, several fully connected layers are used to process the features into final output performance
metrics.
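The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration, not the trained model: the ReLU activation, the added self-loops (which keep every node degree nonzero), the single fully connected output layer, and all layer sizes and random weights are assumptions.

```python
import numpy as np

def normalized_adjacency(A, add_self_loops=True):
    """Return D^{-1/2} A D^{-1/2}; adding self-loops is an assumption of this sketch."""
    A = np.asarray(A, dtype=float)
    if add_self_loops:
        A = A + np.eye(A.shape[0])
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def gcn_forward(A, X, conv_weights, W_fc, b_fc):
    """Stacked graph convolutions, flatten, then one linear output layer."""
    A_hat = normalized_adjacency(A)
    H = X
    for W in conv_weights:
        H = np.maximum(A_hat @ H @ W, 0.0)   # graph convolution with sigma = ReLU
    return H.reshape(-1) @ W_fc + b_fc       # flatten node features, map to metrics

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # toy 3-node path graph
X = rng.uniform(-1, 1, size=(3, 5))               # 5 input feature elements per node
conv_weights = [rng.normal(size=(5, 4)), rng.normal(size=(4, 4))]
W_fc, b_fc = rng.normal(size=(3 * 4, 2)), np.zeros(2)
metrics = gcn_forward(A, X, conv_weights, W_fc, b_fc)
```

Each additional graph convolution layer lets a node see one more hop of the circuit graph, which corresponds to the direct and high-order loading effects mentioned above.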
During GCN model training, all input parameters and output metrics are normalized to the
interval of [-1, 1]. Mean Squared Error (MSE) is used as the loss function for gradient descent
optimization. It calculates the difference between the predicted performance metrics (\hat{M}) and the
corresponding "golden reference" metrics (M) obtained from SPICE simulations as
MSE = \frac{1}{N} \sum_{i=1}^{N} (\hat{M}_i - M_i)^2, (4.22)
where N is the number of performance metrics of the circuit under modeling. The error is backpropagated to update the fully connected layer weight/bias and the graph convolution weight
matrix.
4.2.2.2 Intuitive Explanation of GCN Circuit Modeling Advantages
Compared to an FCNN, a GCN is capable of extracting the circuit topology information to enhance
the modeling accuracy and efficiency. To intuitively explain this advantage, here we take the
two-stage common-source amplifier as an example. The design parameters include the transistor
width and length, and the loading capacitance, as shown in Fig. 4.12. The target is to model the
bandwidth of this amplifier, which is limited by the two poles at the first and second stage output.
The first pole is determined by the first-stage driving capability and the loading from the second
stage, which can be expressed as
\mathrm{Pole}_1 = p_1(W_1, L_1, W_2, L_2, W_4, L_4). (4.23)
Similarly, the second stage output pole can be expressed as
\mathrm{Pole}_2 = p_2(W_3, L_3, W_4, L_4, C). (4.24)
Accordingly, the overall circuit bandwidth can be expressed as
BW = f_{BW}(p_1, p_2). (4.25)
Figure 4.12: Intuitive explanation of circuit topology information embedded in the graph and
GCN circuit modeling
The above equations are derived by using human designers’ knowledge and circuit topology
information. When modeling the amplifier bandwidth as a function of circuit design parameters, a GCN can learn to mimic this two-level function. For demonstration purposes, we assume the GCN model consists of one graph convolution layer and one fully connected layer.
During the graph convolution, each node gathers the features from its neighboring nodes, as
shown in Fig. 4.12. Therefore, the driving and loading effects between the first stage and the
second stage are gathered at the two drain nodes of the first stage and the gate node of the second stage as g_1(W_1, L_1, W_2, L_2, W_4, L_4). Similarly, the driving and loading effects at the output
node are gathered at the two drain nodes of the second stage and the output capacitor node as
g_2(W_3, L_3, W_4, L_4, C). To keep this analysis concise, the information gathering between the nodes of each individual transistor is not shown. After the graph convolution,
the fully connected layer further processes the node features into the predicted bandwidth as
\hat{BW} = f_{GCN}(g_1, g_2), (4.26)
which has the same form as the analyzed bandwidth in equations (4.23) to (4.25).
Therefore, a GCN can effectively decompose the modeling of a high-order function into the
modeling of several lower-order functions according to the circuit topology. This approach can
lead to better generalization, less over-fitting, and higher accuracy in modeling. In contrast,
an FCNN directly models the high-order function as
\hat{BW} = f_{FCNN}(W_1, \dots, W_4, L_1, \dots, L_4, C), (4.27)
which can cause more severe over-fitting and requires extra training samples to achieve comparable modeling accuracy.
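The gathering argument can be checked symbolically with a small sketch that tracks which design parameters reach each node after one graph convolution. The node names and edge list are a hypothetical topology chosen to be consistent with the parameter lists of the two pole expressions; as in the analysis above, edges inside each transistor are omitted.

```python
# Design parameters carried by each graph node (pin nodes inherit their device's W and L).
NODE_PARAMS = {
    "M1.d": {"W1", "L1"}, "M2.d": {"W2", "L2"},
    "M3.d": {"W3", "L3"}, "M4.d": {"W4", "L4"},
    "M4.g": {"W4", "L4"}, "C": {"C"},
}
# Inter-device edges only (assumed topology).
EDGES = [
    ("M1.d", "M2.d"), ("M1.d", "M4.g"), ("M2.d", "M4.g"),   # first-stage output net
    ("M3.d", "M4.d"), ("M3.d", "C"), ("M4.d", "C"),         # second-stage output net
]

def one_hop_gather(node):
    """Parameters visible at `node` after one neighbor-aggregation step."""
    gathered = set(NODE_PARAMS[node])
    for a, b in EDGES:
        if node == a:
            gathered |= NODE_PARAMS[b]
        elif node == b:
            gathered |= NODE_PARAMS[a]
    return gathered

g1_inputs = one_hop_gather("M4.g")   # first-stage output net
g2_inputs = one_hop_gather("C")      # second-stage output net
```

After one step, the nodes on the first-stage output net see only the six parameters of the first pole, and the nodes on the output net see only the five parameters of the second pole, whereas an FCNN input layer mixes all nine parameters at once.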
4.2.3 Experimental results
In this section, we verify the proposed circuit graph construction and GCN modeling method
using two circuit examples. We compare our proposed method with the conventional FCNN
and the recently proposed CCINN in two different modeling scenarios. In the first scenario,
the training and testing samples are homogeneously sampled from the design space. When the
model is tested, it essentially performs interpolation between training samples to predict the
performance of the testing samples. Therefore, we refer to this scenario as an Interpolation Test
hereafter. In the second scenario, we sort the whole dataset according to the most representative
performance metric of the circuit. Then, we use 90% of the samples with the worst performance
as the training dataset and test the model with the remaining 10% of the samples that have the best
performance. In this scenario, the model is extrapolated to unseen design spaces to verify whether
it can be utilized to further improve the circuit performance. Therefore, we refer to this scenario
as the Extrapolation Test hereafter.
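The two dataset splits can be sketched as follows. The function names and the example SNDR values are hypothetical; the 90%/10% split by sorted performance follows the description above.

```python
import numpy as np

def interpolation_split(n_samples, test_fraction, rng):
    """Random split: training and testing samples come from the same design space."""
    idx = rng.permutation(n_samples)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return idx[n_test:], idx[:n_test]

def extrapolation_split(metric, test_fraction=0.1, higher_is_better=True):
    """Train on the worst-performing samples, test on the best-performing ones."""
    metric = np.asarray(metric, dtype=float)
    order = np.argsort(metric if higher_is_better else -metric)  # worst first
    n_test = max(1, int(round(len(metric) * test_fraction)))
    return order[:-n_test], order[-n_test:]

# Hypothetical SNDR values (dB) of ten design samples.
sndr = np.array([40, 55, 42, 61, 47, 50, 44, 58, 39, 52])
train_idx, test_idx = extrapolation_split(sndr)
interp_train, interp_test = interpolation_split(len(sndr), 0.1, np.random.default_rng(0))
```

In the extrapolation split, every testing sample outperforms every training sample on the chosen metric, so a low testing error indicates that the model can predict designs better than any it was trained on.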
4.2.3.1 Voltage to Time Converter
The first example is a voltage-to-time converter (VTC). The VTC is a critical
block in high-speed time-domain analog-to-digital converters, which are one of the prevailing
research topics in the field of data converter circuit design [26, 47, 7]. The schematic of the VTC
is shown in Fig. 4.13. All the transistors use minimum length, and the design parameters of
this circuit are the sampling NMOS width, sampling capacitor size, and the width of the four
transistors in the level-cross detector. As for the performance metrics, we choose Signal to Noise
and Distortion Ratio (SNDR), Spurious Free Dynamic Range (SFDR), and power consumption.
In this example, we only compare the proposed GCN modeling method with the conventional
FCNN, since the VTC does not have an explicit multi-stage structure to apply CCINN. The FCNN
consists of one input layer, four hidden layers, and one output layer, while the GCN consists of
three graph convolution layers and three fully connected layers. The structures of the GCN and
FCNN are summarized in Table 4.3.
Figure 4.13: Voltage to time converter schematic
Figure 4.14: (a) Testing MSE, (b) average SNDR prediction error, (c) average power prediction
error of the GCN and FCNN in the VTC interpolation test
Figure 4.15: (a) Testing MSE, (b) average SNDR extrapolation error, (c) average power extrapolation error of the GCN and FCNN in the VTC extrapolation test
Table 4.3: GCN and FCNN detail for the VTC modeling
In the interpolation test, we randomly sample a 2000-sample VTC dataset from the design
space and divide it into 1750 training samples and 250 testing samples. The GCN and FCNN
models are trained using varied numbers of training samples (from 40 to 1750) and tested using
all the testing samples. Fig. 4.14 shows the testing MSE and metrics prediction error rate using different numbers of training samples, and GCN consistently achieves a higher modeling accuracy.
When trained with the whole training dataset, the GCN model can achieve a 4.5x lower testing
MSE, a 2.3x lower SNDR prediction error, and a 2.4x lower power prediction error. Furthermore,
to achieve the lowest MSE of the FCNN, the GCN only requires one-third of the training samples.
For the extrapolation test, we sort the VTC dataset according to the SNDR of each sample and
use the 250 samples with the highest SNDR as the testing samples. The GCN and FCNN are trained
using samples randomly selected from the rest of the dataset. Fig. 4.15 shows the testing MSE and
metrics extrapolation error rate using different numbers of training samples. The conventional
FCNN treats the circuit P2M function as a black box, and the over-fitting prevents the model from
extrapolating the design to unseen design space and accurately predicting the performance. In
contrast, the proposed GCN modeling method incorporates the circuit topology information and
can effectively extrapolate the model to a more optimized design with reasonably good performance prediction accuracy.
Figure 4.16: 3-stage nested miller compensated amplifier schematic
4.2.3.2 Three-Stage Amplifier
In the second example, we use a fully differential 3-Stage Nested Miller Compensated Amplifier
(3S-NMCA), which has multiple poles, zeros, and feedback loops, to demonstrate the effectiveness
of our proposed GCN modeling method for sophisticated analog design. The schematic of the
amplifier is shown in Fig. 4.16. The circuit has 25 design parameters and six performance metrics.
The design parameters include transistor sizes, bias voltages, feedback capacitance/resistance,
and loading capacitance. The performance metrics include gain, unity gain bandwidth (UGB),
phase margin, common-mode gain, input capacitance, and power consumption.
Figure 4.17: (a) Testing MSE, (b) average gain prediction error, (c) average UGB prediction error
of the GCN, FCNN, and CCINN in the amplifier interpolation test
Figure 4.18: (a) Testing MSE, (b) average gain extrapolation error, (c) average UGB extrapolation
error of the GCN, FCNN, and CCINN in the amplifier extrapolation test
In this example, we compare the proposed GCN modeling method with the conventional
FCNN and CCINN. The CCINN consists of three sub-NNs with direct and sequential connections,
which correspond to the three amplifier stages. Each sub-NN has three hidden layers, and the
sub-NN outputs are connected together through a final layer [9]. The structures of the GCN,
FCNN, and CCINN are summarized in Table 4.4.
In the interpolation test, we randomly sampled a 20000-sample amplifier dataset from the
design space, which was then divided into 17500 training samples and 2500 testing samples. Fig. 4.17
shows the testing MSE and metrics prediction error rate using different numbers of training
samples, and the GCN again achieves higher modeling accuracy. When trained with the whole
training dataset, the GCN model can achieve a 2.2x lower testing MSE, a 2.3x lower gain prediction
error, and a 1.6x lower UGB prediction error compared to the CCINN. To achieve the lowest MSE
of the CCINN, the GCN requires less than one-sixth of the training samples.
Table 4.4: GCN, FCNN, and CCINN detail for the amplifier modeling
For the extrapolation test, we sort the amplifier dataset according to the UGB of each sample and use the 2500 ones with the largest UGB as the testing samples. The models are trained
using samples randomly selected from the rest of the dataset. Fig. 4.18 shows the testing MSE
and metrics extrapolation error rate using different numbers of training samples. Although the
CCINN utilizes the circuit multi-stage structure, each amplifier stage and all the feedback loops
are still treated as black boxes, and performance prediction of designs outside the existing design
space cannot be achieved. With the proposed GCN modeling method, all the circuit topology information of each stage and the feedback loops can be utilized, and the model can be successfully
extrapolated to predict the performance of more optimized designs.
4.3 Summary
AMS circuit modeling is the key bottleneck of the surrogate-model-based circuit design automation
flow. NNs are usually used for circuit modeling because of their large model capacity compared to
classical modeling methods. To reduce the number of training samples required for the NN model, especially
at the LPE/silicon design level, we proposed two modeling algorithms. TL is proposed to reuse
a well-trained schematic-level model to fundamentally reduce the number of training samples needed to
generate a post-layout/silicon-level circuit model, which makes complete design automation
from specification to GDS and silicon possible. To further utilize the circuit topology information of
the circuit under modeling, we proposed a circuit graph construction method and used a GCN to
model the circuit P2M function. The GCN model achieved significantly higher
modeling accuracy compared to the existing NN-based modeling algorithms. Furthermore, the
GCN model can be extrapolated to unseen design space and accurately predict the performance
of more optimized design samples, which for the first time indicates that a surrogate-model-based
circuit design automation algorithm can be used for circuit performance improvement without
extra simulation overhead.
Chapter 5
Conclusion
5.1 Summary of existing works
The ever-increasing demand for high-speed data transmission requires high-performance low-cost AMS circuits such as ADCs and DACs. However, existing AMS circuits do not benefit from
technology as much as digital circuits, and the design process is becoming increasingly rigorous
in advanced FinFET technology nodes. Conventional voltage-domain conversion structures in
ADCs make them the most power-hungry block in a data communication or acquisition system,
and the manual design flow results in high design cost and long time-to-market. To make AMS
circuits more efficient in terms of power consumption and fabrication/design cost, my research
focused on both novel ADC architectures and AMS design automation.
Regarding the ADC architecture innovation, I first proposed to use the differential SAR TDC
architecture in a time-domain ADC for efficient time-to-digital conversion. To further increase
the ADC conversion speed, a delay-tracking pipelining technique was proposed to increase the SAR
TDC throughput. A proof-of-concept ADC prototype was fabricated in 14nm FinFET technology, which achieved 10GS/s conversion speed with merely two TI channels, and state-of-the-art
power/area efficiency.
While the first prototype achieved high efficiency, its ENOB is limited by TDC delay
variations and a limited input dynamic range. To overcome those limitations, I further proposed
a delay variation calibration scheme and a bottom-plate-sampling VTC scheme to increase the
ADC SNR and TDC static linearity without significant extra power consumption. A prototype direct-RF sampling ADC was fabricated in 4nm FinFET technology, achieving 16GS/s and 44.5dB SNDR,
with a state-of-the-art 153.8dB Schreier FoM.
As for the AMS circuit design automation, I focused on AMS circuit modeling, which is the
most time-consuming step in a surrogate-model-based design flow. To reduce the number of
training samples needed for the post-layout/silicon circuit modeling, I proposed a TL technique
to reuse the well-trained schematic-level circuit model. This approach demonstrated significant
modeling efficiency and accuracy improvements compared to training the model from scratch.
To further incorporate the circuit topology information into the circuit NN model, I proposed
to use a GNN for circuit modeling together with a transistor-pin-specified circuit-to-graph mapping method to fully utilize the circuit topology. Experiments showed that the GNN modeling
method can significantly improve modeling accuracy. Furthermore, for the first time, the GNN
circuit model can be extrapolated to unseen design spaces with reasonable circuit performance
prediction accuracy, which indicates that a surrogate-model-based circuit design automation algorithm can be used for circuit performance improvement without extra simulation overhead.
5.2 Future works
My future direction for AMS circuit design will focus on co-optimization between circuit architecture and design automation. On the circuit architecture side, I will try to push for even
higher ADC resolution and ENOB using a voltage/time hybrid conversion scheme. After solving
the delay variation issue in the TDC, the ultimate limiting factors for a time-domain ADC SNR
are the limited input dynamic range and the significant jitter accumulated in the TDC chain. To
reduce the TDC jitter, a time amplifier can be applied to reduce the input-referred noise of the
long TDC conversion chain; the challenge is to design the time amplifier with high linearity
and stability. To extend the voltage dynamic range, a voltage-domain coarse ADC can be applied
to process the large input signal, and the residue voltage can be converted to the time domain for
further quantization. The advantage of this conversion scheme is that the hybrid structure can
take advantage of both voltage-domain and time-domain quantization to achieve a more optimal
SNDR and FoM at high conversion speed. On the other hand, the design challenge is to build a
voltage-to-time residue generation circuit with high operation speed and high linearity. To further push for higher sampling/input frequency, a bootstrapped S/H should be applied, modified
to accommodate the VTC operation.
For the design automation aspect, I will try to utilize the existing modeling methods in a complete design flow to fully automate AMS circuit design with performance beyond manual designs.
The GNN has been shown to accurately predict the performance of more optimized designs beyond the training dataset. Therefore, we can further apply transfer learning to the GNN circuit
model, optimize the circuit at the layout level, and automatically generate a more optimized
GDS compared to a manual design.
Bibliography
[1] A.A. Abidi. “Phase Noise and Jitter in CMOS Ring Oscillators”. In: IEEE Journal of
Solid-State Circuits 41.8 (2006), pp. 1803–1816. doi: 10.1109/JSSC.2006.876206.
[2] Ahmed M. A. Ali, Huseyin Dinc, Paritosh Bhoraskar, Scott Bardsley, Chris Dillon,
Mohit Kumar, Matthew McShea, Ryan Bunch, Joel Prabhakar, and Scott Puckett. “16.1 A
12b 18GS/s RF Sampling ADC with an Integrated Wideband Track-and-Hold Amplifier
and Background Calibration”. In: 2020 IEEE International Solid- State Circuits Conference -
(ISSCC). 2020, pp. 250–252. doi: 10.1109/ISSCC19947.2020.9063011.
[3] Tinghuan Chen, Qi Sun, Canhui Zhan, Changze Liu, Huatao Yu, and Bei Yu. “Deep
H-GCN: Fast Analog IC Aging-Induced Degradation Estimation”. In: IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 41.7 (2022), pp. 1990–2003. doi:
10.1109/TCAD.2021.3107250.
[4] Michael Chu, Philip Jacob, Jin-Woo Kim, Mitchell R. LeRoy, Russell P. Kraft, and
John F. McDonald. “A 40 Gs/s Time Interleaved ADC Using SiGe BiCMOS Technology”.
In: IEEE Journal of Solid-State Circuits 45.2 (2010), pp. 380–390. doi:
10.1109/JSSC.2009.2039375.
[5] Hayun Chung, Hiroki Ishikuro, and Tadahiro Kuroda. “A 10-Bit 80-MS/s Decision-Select
Successive Approximation TDC in 65-nm CMOS”. In: IEEE Journal of Solid-State Circuits
47.5 (2012), pp. 1232–1241. doi: 10.1109/JSSC.2012.2184640.
[6] Mingqiang Guo, Jiaji Mao, Sai-Weng Sin, Hegong Wei, and R. P. Martins. “A 29mW 5GS/s
Time-interleaved SAR ADC achieving 48.5dB SNDR With Fully-Digital Timing-Skew
Calibration Based on Digital-Mixing”. In: 2019 Symposium on VLSI Circuits. 2019,
pp. C76–C77. doi: 10.23919/VLSIC.2019.8778077.
[7] Mohsen Hassanpourghadi and Mike Shuo-Wei Chen. “A 2-way 7.3-bit 10 GS/s
Time-based Folding ADC with Passive Pulse-Shrinking Cells”. In: 2019 IEEE Custom
Integrated Circuits Conference (CICC). 2019, pp. 1–4. doi: 10.1109/CICC.2019.8780180.
95
[8] Mohsen Hassanpourghadi, Rezwan A Rasul, and Mike Shuo-Wei Chen. “A
Module-Linking Graph Assisted Hybrid Optimization Framework for Custom Analog and
Mixed-Signal Circuit Parameter Synthesis”. In: ACM Transactions on Design Automation
of Electronic Systems (Jan. 2021). doi: 10.1145/3456722.
[9] Mohsen Hassanpourghadi, Shiyu Su, Rezwan A Rasul, Juzheng Liu, Qiaochu Zhang, and
Mike Shuo-Wei Chen. “Circuit Connectivity Inspired Neural Network for Analog
Mixed-Signal Functional Modeling”. In: 2021 58th ACM/IEEE Design Automation
Conference (DAC). 2021, pp. 505–510. doi: 10.1109/DAC18074.2021.9586236.
[10] Young-Hyun Jun, Ki Jun, and Song-Bai Park. “An accurate and efficient delay time
modeling for MOS logic circuits using polynomial approximation”. In: IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems 8.9 (1989), pp. 1027–1032.
doi: 10.1109/43.35557.
[11] John P. Keane, Nathaniel J. Guilar, Dusan Stepanovic, Bernd Wuppermann, Charles Wu,
Cheongyuen W. Tsang, Robert Neff, and Ken Nishimura. “An 8GS/s time-interleaved SAR
ADC with unresolved decision detection achieving -58dBFS noise and 4GHz bandwidth
in 28nm CMOS”. In: 2017 IEEE International Solid-State Circuits Conference (ISSCC). 2017,
pp. 284–285. doi: 10.1109/ISSCC.2017.7870372.
[12] T. Kiely and G. Gielen. “Performance modeling of analog integrated circuits using
least-squares support vector machines”. In: Proceedings Design, Automation and Test in
Europe Conference and Exhibition. Vol. 1. 2004, 448–453 Vol.1. doi:
10.1109/DATE.2004.1268887.
[13] Gain Kim, Lukas Kull, Danny Luu, Matthias Braendli, Christian Menolfi,
Pier-Andrea Francese, Hazar Yueksel, Cosimo Aprile, Thomas Morf, Marcel Kossel,
Alessandro Cevrero, Ilter Ozkaya, Hyeon-Min Bae, Andreas Burg, Thomas Toifl, and
Yusuf Leblebici. “A 4.8pJ/b 56Gb/s ADC-Based PAM-4 Wireline Receiver Data-Path with
Cyclic Prefix in 14nm FinFET”. In: 2019 IEEE Asian Solid-State Circuits Conference
(A-SSCC). 2019, pp. 239–240. doi: 10.1109/A-SSCC47793.2019.9056940.
[14] Gain Kim, Lukas Kull, Danny Luu, Matthias Braendli, Christian Menolfi,
Pier-Andrea Francese, Hazar Yueksel, Cosimo Aprile, Thomas Morf, Marcel Kossel,
Alessandro Cevrero, Ilter Ozkaya, Andreas Burg, Thomas Toifl, and Yusuf Leblebici. “A
161-mW 56-Gb/s ADC-Based Discrete Multitone Wireline Receiver Data-Path in 14-nm
FinFET”. In: IEEE Journal of Solid-State Circuits 55.1 (2020), pp. 38–48. doi:
10.1109/JSSC.2019.2938414.
[15] KwangSeok Kim, WonSik Yu, and SeongHwan Cho. “A 9 bit, 1.12 ps Resolution 2.5 b/Stage
Pipelined Time-to-Digital Converter in 65 nm CMOS Using Time-Register”. In: IEEE
Journal of Solid-State Circuits 49.4 (2014), pp. 1007–1016. doi: 10.1109/JSSC.2013.2297412.
96
[16] Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In:
arXiv preprint arXiv:1412.6980 (2014).
[17] Thomas N Kipf and Max Welling. “Semi-supervised classification with graph
convolutional networks”. In: arXiv preprint arXiv:1609.02907 (2016).
[18] Shiva Kiran, Shengchang Cai, Ying Luo, Sebastian Hoyos, and Samuel Palermo. “A
52-Gb/s ADC-Based PAM-4 Receiver With Comparator-Assisted 2-bit/Stage SAR ADC
and Partially Unrolled DFE in 65-nm CMOS”. In: IEEE Journal of Solid-State Circuits 54.3
(2019), pp. 659–671. doi: 10.1109/JSSC.2018.2878850.
[19] Yoel Krupnik, Yevgeny Perelman, Itamar Levin, Yosi Sanhedrai, Roee Eitan,
Ahmad Khairi, Yizhak Shifman, Yoni Landau, Udi Virobnik, Noam Dolev, Alon Meisler,
and Ariel Cohen. “112-Gb/s PAM4 ADC-Based SERDES Receiver With Resonant AFE for
Long-Reach Channels”. In: IEEE Journal of Solid-State Circuits 55.4 (2020), pp. 1077–1085.
doi: 10.1109/JSSC.2019.2959511.
[20] I-Ning Ku, Zhiwei Xu, Yen-Cheng Kuan, Yen-Hsiang Wang, and
Mau-Chung Frank Chang. “A 40-mW 7-bit 2.2-GS/s Time-Interleaved Subranging CMOS
ADC for Low-Power Gigabit Wireless Communications”. In: IEEE Journal of Solid-State
Circuits 47.8 (2012), pp. 1854–1865. doi: 10.1109/JSSC.2012.2196731.
[21] Lukas Kull, Danny Luu, Christian Menolfi, Matthias Brändli, Pier Andrea Francese,
Thomas Morf, Marcel Kossel, Alessandro Cevrero, Ilter Ozkaya, and Thomas Toifl. “A
24–72-GS/s 8-b Time-Interleaved SAR ADC With 2.0–3.3-pJ/Conversion and >30 dB
SNDR at Nyquist in 14-nm CMOS FinFET”. In: IEEE Journal of Solid-State Circuits 53.12
(2018), pp. 3508–3516. doi: 10.1109/JSSC.2018.2859757.
[22] Lukas Kull, Danny Luu, Christian Menolfi, Thomas Morf, Pier Andrea Francese,
Matthias Braendli, Marcel Kossel, Alessandro Cevrero, Ilter Ozkaya, and Thomas Toifl. “A
10-Bit 20–40 GS/S ADC with 37 dB SNDR at 40 GHz Input Using First Order Sampling
Bandwidth Calibration”. In: 2018 IEEE Symposium on VLSI Circuits. 2018, pp. 275–276. doi:
10.1109/VLSIC.2018.8502268.
[23] Sandeep Santhosh Kumar, Masahiro Kudo, Vlad Cretu, Antoine Morineau,
Atsushi Matsuda, Minori Yoshida, Masazumi Marutani, Aadil Hussain Maniyar, and
Jay Kumar. “A 750mW 24GS/s 12b Time-Interleaved ADC for Direct RF Sampling in
Modern Wireless Systems”. In: 2023 IEEE International Solid- State Circuits Conference
(ISSCC). 2023, pp. 1–3. doi: 10.1109/ISSCC42615.2023.10067793.
[24] Yaping Li, Yong Wang, Yusong Li, Ranran Zhou, and Zhaojun Lin. “An Artificial Neural
Network Assisted Optimization System for Analog Design Space Exploration”. In: IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems 39.10 (2020),
pp. 2640–2653. doi: 10.1109/TCAD.2019.2961322.
97
[25] Haidang Lin, Charlie Boecker, Masum Hossain, Shankar Tangirala, Roxanne Vu,
Socrates D. Vamvakos, Eric Groen, Simon Li, Prashant Choudhary, Nanyan Wang,
Masumi Shibata, Hossein Taghavi, Marcus van Ierssel, AdilHussain Maniyar,
Adam Wodkowski, Kulwant Brar, Nhat Nguyen, and Shaishav Desai. “ADC-DSP-Based
10-to-112-Gb/s Multi-Standard Receiver in 7-nm FinFET”. In: IEEE Journal of Solid-State
Circuits 56.4 (2021), pp. 1265–1277. doi: 10.1109/JSSC.2021.3051109.
[26] Juzheng Liu, Mohsen Hassanpourghadi, and Mike Shuo-Wei Chen. “A 10GS/s 8b 25fJ/c-s
2850um2 Two-Step Time-Domain ADC Using Delay-Tracking Pipelined-SAR TDC with
500fs Time Step in 14nm CMOS Technology”. In: 2022 IEEE International Solid- State
Circuits Conference (ISSCC). Vol. 65. 2022, pp. 160–162. doi:
10.1109/ISSCC42614.2022.9731625.
[27] Juzheng Liu, Mohsen Hassanpourghadi, Qiaochu Zhang, Shiyu Su, and
Mike Shuo-Wei Chen. “Transfer Learning with Bayesian Optimization-Aided Sampling
for Efficient AMS Circuit Modeling”. In: 2020 IEEE/ACM International Conference On
Computer Aided Design (ICCAD). 2020, pp. 1–9.
[28] Juzheng Liu, Shiyu Su, Meghna Madhusudan, Mohsen Hassanpourghadi,
Samuel Saunders, Qiaochu Zhang, Rezwan Rasul, Yaguang Li, Jiang Hu,
Arvind Kumar Sharma, Sachin S. Sapatnekar, Ramesh Harjani, Anthony Levi,
Sandeep Gupta, and Mike Shuo-Wei Chen. “From Specification to Silicon: Towards
Analog/Mixed-Signal Design Automation using Surrogate NN Models with Transfer
Learning”. In: 2021 IEEE/ACM International Conference On Computer Aided Design
(ICCAD). 2021, pp. 1–9. doi: 10.1109/ICCAD51958.2021.9643445.
[29] Wenlong Lyu, Pan Xue, Fan Yang, Changhao Yan, Zhiliang Hong, Xuan Zeng, and
Dian Zhou. “An Efficient Bayesian Optimization Approach for Automated Optimization
of Analog Circuits”. In: IEEE Transactions on Circuits and Systems I: Regular Papers 65.6
(2018), pp. 1954–1967. doi: 10.1109/TCSI.2017.2768826.
[30] Ewout Martens, Davide Dermit, Mithlesh Shrivas, Shun Nagata, and Jan Craninckx. “A
Compact 8-bit, 8 GS/s 8×TI SAR ADC in 16nm with 45dB SNDR and 5 GHz ERBW”. In:
2021 Symposium on VLSI Circuits. 2021, pp. 1–2. doi:
10.23919/VLSICircuits52068.2021.9492512.
[31] Boris Murmann. “ADC Performance Survey 1997-2021”. In: Online. Available:
http://web.stanford.edu/ murmann/adcsurvey.html (2021). doi:
http://web.stanford.edu/~murmann/adcsurvey.html..
[32] Jae-Won Nam and Mike Shuo-Wei Chen. “A 12.8-Gbaud ADC-Based Wireline Receiver
With Embedded IIR Equalizer”. In: IEEE Journal of Solid-State Circuits 55.3 (2020),
pp. 557–566. doi: 10.1109/JSSC.2019.2956395.
98
[33] Jae-Won Nam, Mohsen Hassanpourghadi, Aoyang Zhang, and Mike Shuo-Wei Chen. “A
12-Bit 1.6, 3.2, and 6.4 GS/s 4-b/Cycle Time-Interleaved SAR ADC With Dual Reference
Shifting and Interpolation”. In: IEEE Journal of Solid-State Circuits 53.6 (2018),
pp. 1765–1779. doi: 10.1109/JSSC.2018.2808244.
[34] Dong-Ryeol Oh, Jong-In Kim, Dong-Shin Jo, Woo-Chul Kim, Dong-Jin Chang, and
Seung-Tak Ryu. “A 65-nm CMOS 6-bit 2.5-GS/s 7.5-mW 8× Time-Domain Interpolating
Flash ADC With Sequential Slope-Matching Offset Calibration”. In: IEEE Journal of
Solid-State Circuits 54.1 (2019), pp. 288–297. doi: 10.1109/JSSC.2018.2870554.
[35] Shiyu Su, Qiaochu Zhang, Mohsen Hassanpourghadi, Juzheng Liu, Rezwan A Rasul, and
Mike Shuo-Wei Chen. “Analog/Mixed-Signal Circuit Synthesis Enabled by the
Advancements of Circuit Architectures and Machine Learning Algorithms”. In: 2022 27th
Asia and South Pacific Design Automation Conference (ASP-DAC). 2022, pp. 100–107. doi:
10.1109/ASP-DAC52403.2022.9712577.
[36] Kexu Sun, Guanhua Wang, Qing Zhang, Salam Elahmadi, and Ping Gui. “A 56-GS/s 8-bit
Time-Interleaved ADC With ENOB and BW Enhancement Techniques in 28-nm CMOS”.
In: IEEE Journal of Solid-State Circuits 54.3 (2019), pp. 821–833. doi:
10.1109/JSSC.2018.2884352.
[37] Hanrui Wang, Kuan Wang, Jiacheng Yang, Linxiao Shen, Nan Sun, Hae-Seung Lee, and
Song Han. “GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural
Networks and Reinforcement Learning”. In: 2020 57th ACM/IEEE Design Automation
Conference (DAC). 2020, pp. 1–6. doi: 10.1109/DAC18072.2020.9218757.
[38] Jiangfeng Wu, Acer Chou, Tianwei Li, Rong Wu, Tao Wang, Giuseppe Cusmai,
Sha-Ting Lin, Cheng-Hsun Yang, Gregory Unruh, Sunny Raj Dommaraju, Mo M. Zhang,
Po Tang Yang, Wei-Ting Lin, Xi Chen, Dongsoo Koh, Qingqi Dou, H. Mohan Geddada,
Juo-Jung Hung, Massimo Brandolini, Young Shin, Hung-Sen Huang, Chun-Ying Chen,
and Ardie Venes. “27.6 A 4GS/s 13b pipelined ADC with capacitor and amplifier sharing
in 16nm CMOS”. In: 2016 IEEE International Solid-State Circuits Conference (ISSCC). 2016,
pp. 466–467. doi: 10.1109/ISSCC.2016.7418109.
[39] Jiangfeng Wu, Acer Chou, Cheng-Hsun Yang, Yen Ding, Yen-Jen Ko, Sha-Ting Lin,
Wenbo Liu, Chi-Ming Hsiao, Ming-Hung Hsieh, Chun-Cheng Huang, Juo-Jung Hung,
Kwang Young Kim, Michael Le, Tianwei Li, Wei-Ta Shih, Ayaskant Shrivastava,
Yau-Cheng Yang, Chun-Ying Chen, and Hung-Sen Huang. “A 5.4GS/s 12b 500mW
pipeline ADC in 28nm CMOS”. In: 2013 Symposium on VLSI Circuits. 2013, pp. C92–C93.
[40] Jianjun Xu, M.C.E. Yagoub, Runtao Ding, and Qi-Jun Zhang. “Neural-based dynamic
modeling of nonlinear microwave circuits”. In: IEEE Transactions on Microwave Theory
and Techniques 50.12 (2002), pp. 2769–2780. doi: 10.1109/TMTT.2002.805192.
99
[41] Il-Min Yi, Naoki Miura, Hiroyuki Fukuyama, and Hideyuki Nosaka. “A 15.1-mW 6-GS/s
6-bit Single-Channel Flash ADC With Selectively Activated 8× Time-Domain Latch
Interpolation”. In: IEEE Journal of Solid-State Circuits 56.2 (2021), pp. 455–464. doi:
10.1109/JSSC.2020.3017229.
[42] A. Serdar Yonar, Pier Andrea Francese, Matthias Brändli, Marcel Kossel, Thomas Morf,
Jonathan E. Proesel, Sergey Rylov, Herschel Ainspan, Martin Cochet, Zeynep Deniz,
Timothy Dickson, Troy Beukema, Christian Baks, Michael Beakes, John F. Bulzacchelli,
Young-Ho Choi, Byoung-Joo Yoo, Hyoungbae Ahn, Dong-Hyuk Lim, Gunil Kang,
Sang-Hune Park, Mounir Meghelli, Hyo-Gyuem Rhew, Daniel Friedman, Michael Choi,
Mehmet Soyuer, and Jongshin Shin. “An 8-bit 56GS/s 64x Time-Interleaved ADC with
Bootstrapped Sampler and Class-AB Buffer in 4nm CMOS”. In: 2022 IEEE Symposium on
VLSI Technology and Circuits (VLSI Technology and Circuits). 2022, pp. 168–169. doi:
10.1109/VLSITechnologyandCir46769.2022.9830308.
[43] Alireza Zandieh, Peter Schvan, and Sorin P. Voinigescu. “Design of a 55-nm SiGe BiCMOS
5-bit Time-Interleaved Flash ADC for 64-Gbd 16-QAM Fiberoptics Applications”. In: IEEE
Journal of Solid-State Circuits 54.9 (2019), pp. 2375–2387. doi: 10.1109/JSSC.2019.2917155.
[44] Qi-Jun Zhang, K.C. Gupta, and V.K. Devabhaktuni. “Artificial neural networks for RF and
microwave design - from theory to practice”. In: IEEE Transactions on Microwave Theory
and Techniques 51.4 (2003), pp. 1339–1350. doi: 10.1109/TMTT.2003.809179.
[45] Minglei Zhang, Chi-Hang Chan, Yan Zhu, and Rui P. Martins. “A 0.6-V 13-bit 20-MS/s
Two-Step TDC-Assisted SAR ADC With PVT Tracking and Speed-Enhanced
Techniques”. In: IEEE Journal of Solid-State Circuits 54.12 (2019), pp. 3396–3409. doi:
10.1109/JSSC.2019.2938450.
[46] Minglei Zhang, Yan Zhu, Chi-Hang Chan, and Rui P. Martins. “A 20GS/s 8b
Time-Interleaved Time-Domain ADC with Input-Independent Background Timing Skew
Calibration”. In: 2021 Symposium on VLSI Circuits. 2021, pp. 1–2. doi:
10.23919/VLSICircuits52068.2021.9492436.
[47] Minglei Zhang, Yan Zhu, Chi-Hang Chan, and Rui P. Martins. “An 8-Bit 10-GS/s 16×
Interpolation-Based Time-Domain ADC With <1.5-ps Uncalibrated Quantization Steps”.
In: IEEE Journal of Solid-State Circuits 55.12 (2020), pp. 3225–3235. doi:
10.1109/JSSC.2020.3012776.
[48] Zihao Zheng, Lai Wei, Jorge Lagos, Ewout Martens, Yan Zhu, Chi-Hang Chan,
Jan Craninckx, and Rui P. Martins. “A Single-Channel 5.5mW 3.3GS/s 6b Fully Dynamic
Pipelined ADC with Post-Amplification Residue Generation”. In: 2020 IEEE International
Solid- State Circuits Conference - (ISSCC). 2020, pp. 254–256. doi:
10.1109/ISSCC19947.2020.9062895.
100
[49] Shuang Zhu, Bo Wu, Yongda Cai, and Yun Chiu. “A 2-GS/s 8-bit Non-Interleaved
Time-Domain Flash ADC Based on Remainder Number System in 65-nm CMOS”. In: IEEE
Journal of Solid-State Circuits 53.4 (2018), pp. 1172–1183. doi: 10.1109/JSSC.2017.2774280.
101
Abstract
This thesis presents a time-domain ADC architecture for high-speed, high-efficiency conversion between analog and digital signals, together with an AMS design automation algorithm that reduces AMS circuit design time and cost. For the time-domain ADC, I propose a selective delay tuning technique in the SAR TDC to achieve high-efficiency time-to-digital conversion. To improve throughput, I further propose a delay-tracking pipelining technique that increases the SAR TDC conversion speed without significant power or noise overhead. The first ADC prototype was fabricated in 14nm FinFET CMOS technology and achieves 10GS/s 8-bit conversion with only 2 time-interleaved channels and 14.8mW power consumption. To apply the time-domain ADC to direct-RF sampling wireless communication systems, the ADC resolution needs to be further improved. Therefore, I propose adding several redundant stages towards the end of the SAR TDC conversion chain, together with a background delay-offset calibration scheme that ensures the accuracy of the redundant stages. The second ADC prototype was fabricated in 4nm FinFET CMOS technology and achieves 16GS/s conversion speed with 44.5dB SNDR and a state-of-the-art 153.8dB Schreier power efficiency. Besides the circuit structure innovation, this thesis also describes a transfer learning algorithm for efficient and accurate circuit modeling. I propose to leverage a well-trained schematic-level circuit NN model and transfer its information to a post-layout/silicon-level circuit model using few training samples. Modeling and circuit optimization examples demonstrate the efficiency and effectiveness of the proposed algorithm.
Asset Metadata
Creator
Liu, Juzheng (author)
Core Title
Towards high-performance low-cost AMS designs: time-domain conversion and ML-based design automation
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering (VLSI Design)
Degree Conferral Date
2023-12
Publication Date
10/20/2023
Defense Date
10/13/2023
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
analog-to-digital converter,Bayesian optimization,circuit modeling,circuit optimization,OAI-PMH Harvest,pipelined-SAR,SAR TDC,time-domain,time-to-digital converter,transfer learning,voltage-to-time converter
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Chen, Mike Shuo-Wei (committee chair), Chen, Yong (committee member), Hashemi, Hossein (committee member)
Creator Email
juzhengl@usc.edu,liujuzheng15@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113759454
Unique identifier
UC113759454
Identifier
etd-LiuJuzheng-12429.pdf (filename)
Legacy Identifier
etd-LiuJuzheng-12429
Document Type
Dissertation
Rights
Liu, Juzheng
Internet Media Type
application/pdf
Type
texts
Source
20231020-usctheses-batch-1102 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu