Analog and mixed-signal parameter synthesis using machine learning and time-based circuit architectures
ANALOG AND MIXED-SIGNAL PARAMETER SYNTHESIS USING MACHINE LEARNING AND TIME-BASED CIRCUIT ARCHITECTURES
by
Mohsen Hassanpourghadi

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2022

Copyright 2022 Mohsen Hassanpourghadi

Dedication

Dedicated to my dear mother, father, wife and little Sara

Acknowledgments

This work could not have been accomplished without the following supports. I would like to express my gratitude to my advisor Prof. Mike Shuo-Wei Chen, who welcomes any innovative ideas and helps to expand them with love and dedication. During our discussions, his tolerance and patience have helped me organize the initial chaotic thoughts into something more constructed and legitimate research work. This work would not have been possible if it weren’t for his nonstop eagerness to extend the limits, as it was hit by many different obstacles and failures. I will never forget his guidance and support and use them in my next professional endeavors.

I would like to extend my gratitude to the members of my qualifying and defense committee, including Prof. Hossein Hashemi, Prof. Pierluigi Nuzzo, Prof. Sandeep Gupta, and Prof. Aiichiro Nakano. They generously shared their viewpoints that gave me insights into the other aspects of my research. Their mindful insights improved the quality of this work and helped this work extend for more general users.

I would like to extend special thanks to Prof. Pierluigi Nuzzo, Prof. Sandeep Gupta for their selfless assistance and advice. They dedicated their time to have several discussions with me, who had little knowledge of CAD tools. Their guidance and feedback taught me the significance of CAD tools and assisted me in constructing the foundation of this project.

I would like to thank Prof. Tony Levi, Prof. Sandeep Gupta, Prof. Mike Shuo-Wei Chen, who established USC’s POSH team.
The fundamental concepts of the AMPSE software were grown with their sincere help and care. They always helped patiently and shared their opinions selflessly to improve the project.

I like to thank my dear colleagues in the POSH team, including Shiyu Su, Juzheng Liu, Qiaochu Zhang, Mutian Zhu, Baishakhi Rani Biswas, and my brother Rezwan A. Rasul. Being together in the POSH projects has made us more than just colleagues, more than friends, a family! From the early days of POSH, we were through integration exercises, working together, traveling together, going through the research difficulties together. Their well-coordinated and passion for the CAD project established the backbone of this work. It kept furnishing it with new outstanding ideas.

I also like to thank my seniors, including Cheng-Ru Ho, Jaewon Nam, and Tzu-Fan Wu, as they countlessly dedicated their time to teach me when I had technical difficulties. I also like to thank my lab mates, Pedram Teimouri, Sourya Dey, Naveen Katem, Zisong Wang, Haolin Cong, Ce Yang, Mostafa Ayesh, Soumya Mahapatra, Aria Samiei, and Sushil Subramanian. Special thanks to Aoyang Zhang and Masashi Yamagata, who helped and supported me when I had severe problems even beyond research. I believe Aoyang is the superman disguised as an electrical engineer. His superpower is that he helps everyone near him without even asking him to. Masashi is a true gem of a friend, kind, humble, resourceful, and supportive. He has every quality of a good leader, and I will vote for him if he runs for the presidency!

I would like to acknowledge DARPA that this project would not have been possible without their grants and envisioned future.

Finally, I would like to thank my family, as their support has allowed me to pursue my dreams. My parents truly sacrificed their lives for their children to achieve higher education. They dedicated everything they had to provide and support me when I wanted to continue my education abroad.
I will always owe them their selfless devotion and pure love. I also love to thank my wife, who at the time was pursuing a Ph.D. at UCI. Her strong personality and patience always helped me overcome my demons and powerfully fight for what I deemed right. Also, we had a kid together, Sara, who was born after my qualifying exam. I would like to thank her for delivering more delight and joy to our house.

Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Thesis Organization
  1.2 Background
  1.3 Research Contributions
2 Passive Pulse Shrinking Time-based ADC
  2.1 Introduction
  2.2 High Speed T-ADC
    2.2.1 Speed-Resolution Trade-off
    2.2.2 Related Works
  2.3 Proposed Passive Pulse Shrinking TDC
    2.3.1 PPS TDC with R-C network
    2.3.2 R-C network Linearization
    2.3.3 PPS TDC Design Process
  2.4 Circuit Implementation
  2.5 Measurement Results
3 Analog Mixed-Signal Parameter Search Engine
  3.1 Introduction
  3.2 Proposed MLG Assisted Hybrid Search Engine With NN
    3.2.1 Problem Statement
    3.2.2 MOHSENN Flow Overview
  3.3 MOHSENN Preparation and Hybrid Search
    3.3.1 MLG Construction
    3.3.2 Hybrid Search
  3.4 Case Study of a SAR ADC
    3.4.1 Preparation of the MOHSENN for SAR ADC Design
    3.4.2 Search Engine Results
4 Circuit Connectivity Inspired Neural Network CCI-NN
  4.1 AMS Circuit Modeling
  4.2 Proposed CCI-NN
  4.3 Simulation and Results
    4.3.1 Two-Stage Amplifier
    4.3.2 Three-Stage Amplifier
    4.3.3 Current-Steering DAC
5 Two-step SAR ADC with Passive Residue Transfer
  5.1 Circuit Architecture
  5.2 MLG Construction
  5.3 Design Consideration
  5.4 Measurement Results
6 Conclusion
  6.1 Summary
  6.2 Future Works
Reference List

List of Tables

2.1 COMPARISON TABLE WITH STATE-OF-THE-ART ADCS
3.1 Design parameters with ranges
3.2 Modules’ metrics of the SAR ADC
3.3 NN topology and hyper-parameters used in SAR ADC Design
3.4 Metrics introduced in Table 3.2, the statistical parameters and the regression error by kNN, RF, SVR and NN modeling with different size of training dataset $|D_{train}|$
3.5 MOHSENN Result of The SAR ADC with Given u
3.6 Design Parameters and the Search Space of the SAR ADC
4.1 Statistic of performance metrics for AMS circuits
4.2 MAE loss of the test dataset for different FC-NN and CCI-NN sizes modeling 3S-NMCA
4.3 MAE loss of the test dataset for different FC-NN and CCI-NN sizes modeling CS-DAC
5.1 NN topology and hyper-parameters used in SAR ADC Design
5.2 COMPARISON TABLE WITH STATE-OF-THE-ART ADCS

List of Figures

1.1 Design of custom AMS parameter synthesis with optimization
2.1 Comparison between proposed PPS T-ADC to other reported T-ADC configuration [1–6] in 65nm CMOS for (a) area and sample-rate (b) single ADC sample-rate and effective time step
2.2 Concept of the PPS TDC with R-C network.
2.3 PPS TDC quantization transfer function curve (a) without and (b) with termination circuit
2.4 $t_{lsb}$ relation to R-C unit time constant and the number of units in the network
2.5 (a) PPS TDC quantization transfer function curve (b) time step, and (c) time offset vs $v_t$
2.6 Input resistance vs (a) PPS TDC quantization curve (b) $t_{lsb}$, and (c) $t_{offset}$
2.7 PPS TDC’s quantization curve vs (a) termination resistance, and (b) termination capacitance
2.8 (a) Maximum INL, and (b) $t_{lsb}$ vs different termination circuit element values
2.9 Simplified proposed time-based folding ADC architecture with PPS TDC
2.10 Proposed PPS TDC Implementation
2.11 PPS section with metal layer as resistance
2.12 (a) Time folding/subtraction circuit and block diagram with (b) operation waveform, and (c) the corresponding transfer function
2.13 Chip Micrograph
2.14 Output spectrum with near Nyquist input (decimation factor of 257) for (a) TI-ADC with calibration, (b) TI-ADC without calibration, and (c) Single-ADC with calibration
2.15 Measured dynamic performance of ADC
3.1 Proposed MOHSENN flowchart, the preparation and the hybrid search
3.2 Graphical depiction of MLG
3.3 Two types of interface modeling in module test benches
3.4 Calculating the gradient of metrics when using (a) approximation, and (b) duplication
3.5 The amount of error when the partial gradient is (a) near zero, or (b) larger than zero
3.6 SAR ADC Top level
3.7 THCDAC architecture and the test bench
3.8 COMP’s architecture and the test bench
3.9 SEQ2 architecture and the test bench
3.10 SEQ1 architecture and the test bench
3.11 SAR ADC MLG
3.12 cost vs optimization iteration in global search phase, and local search phase
3.13 Global search phase with Par-MC/Adam optimization ($n_{mc}$ = 20) vs. SA optimizer (a) in number of iterations (b) required time
3.14 Comparison between Par-MC and conventional sequential MC (a) The optimization result for 20 MC sample with identical initialization (b) The required time for different $n_{mc}$
3.15 The local optimization results of 6-bit, 500 MS/s ADC vs. number of variables $n_v$ (a) cost function (b) number of iterations (c) optimization time (d) ADC’s FOM
3.16 The output spectrum of the MOHSENN designed SAR ADC when input is a single tone sine wave for (a) [10-bit, 200MS/s], (b) [8-bit, 340MS/s], and (c) [6-bit, 500MS/s]
4.1 Proposed CCI-NN regression model structure, and the sequential and direct path connections
4.2 Two-stage amplifier with Miller compensation (2S-MCA)
4.3 The regression error of CCI-NN vs FC-NN for 2S-MCA’s (a) Gain, (b) UGB, (c) PM, (d) Power, (e) negative output swing, and (f) positive output swing
4.4 Three-stage amplifier with nested Miller compensation (3S-NMCA), (a) schematic, (b) top-level sub-circuit break-down and corresponding CCI-NN
4.5 The regression error of CCI-NN vs FC-NN for 3S-NMCA’s (a) Gain, (b) UGB, (c) PM, (d) Power, (e) negative output swing, and (f) positive output swing
4.6 8-bit current steering DAC (a) the schematic, (b) the top-level sub-circuit breakdown, and the CCI-NN structure
4.7 The regression error of CCI-NN vs FC-NN for CS-DAC’s (a) SFDR, (b) full-scale output current, (c) power consumption
5.1 Two-step SAR ADC with passive residue transfer
5.2 Global search phase with Par-MC/Adam optimization ($n_{mc}$ = 20) vs. SA optimizer (a) in number of iterations (b) required time
5.3 Chip micrograph
5.4 Measured output spectrum at 246 MHz input frequency
5.5 SNDR and SFDR vs Frequency

Abstract

Analog and mixed-signal (AMS) computer-aided design (CAD) tools are of increasing interest owing to the demand for a wide range of AMS circuit specifications in modern systems on a chip and to faster time-to-market requirements. This thesis presents AMS automatic parameter synthesis methodologies enabled by neural networks and time-based circuit architectures. It explores two primary techniques to enhance the possibility of design through automation. The first is to use digital cells to perform AMS operations, as demonstrated in a passive pulse shrinking time-based analog-to-digital converter (ADC). Using only digital cells and passive elements makes it possible to use digital automation tools while achieving performance comparable to the state of the art. The second is to replace slow SPICE evaluations with fast and accurate neural network (NN) models to accelerate the design process. Moreover, a module linking graph (MLG) is introduced, which allows circuits to be characterized in the presence of their interfaces. To enhance the accuracy of the models, a circuit connectivity inspired NN is proposed, which is a deep learning model built on the signal flow graph. Finally, using these techniques, a two-step SAR ADC in GF 14nm FinFET has been designed. The measurement results show a competitive FOM with a fast design procedure.

Chapter 1
Introduction

1.1 Thesis Organization

This thesis is organized as follows:

In Chapter 1, we first discuss the background of computer-aided design tools for circuit synthesis.

Chapter 2 discusses a mostly digital ADC that uses a novel passive pulse shrinking (PPS) time-to-digital converter (TDC) to resolve 4-6 bits while achieving a sample rate in the GS/s range.
We first discuss the differences between the PPS TDC and other TDC techniques. Then, we discuss the PPS TDC’s circuit parameters, which enable design automation with predominantly digital circuits. Finally, the measurement results indicate 10 GS/s with 5-bit ENOB while achieving a competitive FOM of 86 fJ/c-step.

Chapter 3 discusses a new methodology in CAD tools for AMS circuit generation that can model the interfaces among modules. We first discuss why interface effects must be included for proper circuit evaluation, and we introduce our method to capture these effects. Then we introduce a hybrid search technique that mixes SPICE and neural network (NN) evaluation to accelerate the search. Finally, we show that the new algorithm can successfully design a known SAR ADC for different resolutions and speeds, quickly and with high accuracy.

Chapter 4 shows a new type of NN regression that can be used to model large circuits. This type of NN is built based on how the sub-circuits are connected; therefore, it achieves higher accuracy compared to other modeling techniques. To show its accuracy, we present three case studies where the new NN model achieves the same accuracy as the conventional type while using a seven times smaller dataset.

Chapter 5 uses the methods discussed in Chapters 3 and 4 to design a two-step SAR ADC in 12nm GF. We first discuss the ADC architecture and the new techniques to achieve a high speed with minor modification. Next, we show the required graph to model the interfaces. We conclude this chapter with the measurement results obtained from the chip implementation.

Chapter 6 concludes this thesis by summarizing its contributions and discussing future directions.

1.2 Background

The future trends of electronic systems, such as the Internet of Things (IoT), 5G communications, autonomous vehicles, and artificial intelligence (AI) devices, demand high-performance analog and mixed-signal (AMS) intellectual properties (IP) [7].
Concurrently, AMS IP design is becoming more expensive and time-consuming due to more sophisticated technology design rules and the increasing complexity of systems on a chip (SoCs). While sharing an IP among different applications can significantly reduce the design cost, this is an inefficient solution since an AMS circuit is application-specific and only performs efficiently for a particular task. Accordingly, the ideal solution is to optimize the same IP for different target specifications but with a significantly reduced design cost. Since AMS circuit design of a particular topology usually follows a similar procedure, the fixed-topology IP can be synthesized for different design specifications by a computer-aided design (CAD) tool with reduced design cost and development time [8].

Figure 1.1: Design of custom AMS parameter synthesis with optimization (flowchart: user desired specifications, design parameters, objective and constraints, optimization step, evaluation step [SPICE-based, equation-based, regression-based], stopping-criteria check, final netlist)

One crucial step in the automated synthesis of a complex AMS system is parameter synthesis, the appropriate sizing of circuit elements. As discussed in [8–10], enhancing parameter synthesis performance can significantly improve layout generation and final verification in the AMS design process. Usually, the netlist generation of a complex AMS system is a two-step iterative process of evaluation and optimization, as presented in Figure 1.1. First, random initial values are assigned to the design parameters. Then, the evaluation step takes place, in which the system-level performance is estimated. Afterward, the optimization step decides the new design parameters for the next evaluation step based on the difference between the estimated and the desired performance metrics.
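The evaluate/optimize loop described above can be sketched in a few lines. This is only an illustrative toy, not the tool developed in this thesis: the `evaluate` function is a hypothetical equation-based stand-in that uses the textbook dynamic-power relation P = f·C·V², the parameter names and bounds are made up for the example, and the optimizer is a plain random-perturbation search chosen for brevity.

```python
import random


def evaluate(params):
    """Toy evaluation step (hypothetical model, not SPICE): P = f * C * V^2."""
    return params["f_hz"] * params["c_farad"] * params["v_volt"] ** 2


def optimize(target_power, bounds, max_iters=2000, tol=0.01, seed=0):
    """Iterate evaluation and optimization until the stopping criterion is met."""
    rng = random.Random(seed)
    # Random initial values are assigned to the design parameters.
    best = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
    best_err = abs(evaluate(best) - target_power) / target_power
    for _ in range(max_iters):
        if best_err <= tol:  # stopping criterion: within tolerance of the spec
            break
        # Optimization step: propose new parameters near the current best.
        cand = {}
        for k, (lo, hi) in bounds.items():
            step = 0.1 * (hi - lo) * rng.uniform(-1.0, 1.0)
            cand[k] = min(hi, max(lo, best[k] + step))
        # Evaluation step: estimate the performance of the candidate.
        err = abs(evaluate(cand) - target_power) / target_power
        if err < best_err:  # keep the candidate only if it is closer to the spec
            best, best_err = cand, err
    return best, best_err


# Illustrative bounds and target only; none of these values come from the thesis.
bounds = {"f_hz": (1e8, 1e9), "c_farad": (1e-13, 1e-12), "v_volt": (0.6, 1.0)}
params, rel_err = optimize(target_power=1e-4, bounds=bounds)
print(params, rel_err)
```

In a real flow the `evaluate` call is the expensive part, which is why the choice between SPICE-based, equation-based, and regression-based evaluation matters.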
There are three main ways to perform the evaluation step: SPICE-based, equation-based, and regression-based [8], as illustrated in Figure 1.1. While it is accurate, performing the evaluation step of a complex AMS system in SPICE is time-consuming [11]. As a result, it is infeasible to synthesize the entire AMS circuit with SPICE in the loop and find the appropriate design parameters that meet the specification. To speed up the evaluation step, the specifications and the constraints can be expressed as analytical equations of the design parameters [12–14]. Consequently, the evaluation can occur much faster than with a SPICE simulation. However, it is complicated to derive accurate equations between the design parameters and the specifications. Further, higher-order effects are often ignored to devise analytical relations, and simplifications are necessary, limiting the functions’ accuracy. To bypass manual equation derivation yet maintain evaluation speed, regression models are used instead of equations. In this case, the accuracy depends on the number of system-level simulation data used for training the regression model [15–25]. One promising example is the NN model, which has shown excellent performance when approximating any non-linear function (given sufficient training data points) [9,21,22]. However, an AMS system often has a large number of parameters. As a result, it requires a huge training data set for reasonable accuracy, incurring many time-consuming SPICE simulations [23].

Since evaluating an entire complex netlist is problematic, an improved approach is to break the complex AMS system down into multiple simpler sub-systems (modules). This improves the accuracy and maintains fast evaluation [11]. Each module is then evaluated individually, and the top-level metrics can be estimated from the module-level metrics. Any complex AMS system is inherently designed in a modular fashion by a human designer; therefore, module breakdown is readily achievable.
Since a module can be evaluated faster than a system, the overall evaluation step is much quicker. Moreover, all modules can be evaluated in parallel, which can further enhance the speed.

However, module breakdown poses some unique challenges. The complete system-level performance needs to be estimated correctly from the module-level metrics, which themselves should be sufficiently accurate and efficiently computed [11]. Moreover, the input-output conditions of the modules change when they are connected to construct the system. This is known as the interface problem: if neighboring modules are not designed with equal interfaces, their performance estimation may become inaccurate [26]. Previous works have addressed this challenge by introducing a set of connectivity rules that help to preserve the module input-output conditions when the modules are designed separately [17]. However, these methods do not guarantee equality of interfaces and can suffer from inaccuracies.

1.3 Research Contributions

To alleviate the above challenges, we propose two different methodologies. In the first method, we propose a novel time-to-digital converter (TDC), the passive pulse shrinking (PPS) TDC. The PPS TDC has fewer design parameters than conventional medium-resolution TDCs. Moreover, the design parameters are mostly passive element values and are detachable from the digital circuits. Therefore, the CAD tool can design the digital and passive sides independently without considering the interface effects, leading to a simpler design process. In the second method, we propose a module linking graph (MLG) assisted hybrid parameter search engine with a neural network (MOHSENN). We propose the MLG to address the interface issue; it forces equality on the shared circuit elements at the interface. We propose the hybrid search to accelerate the design process and cover a wide design parameter range.
In the first phase, the hybrid search uses NN regression models on the MLG in the global search, where it performs a fast and parallel gradient-based optimization on the design parameters. In the second phase, we perform a local search on the MLG using SPICE simulation to attenuate the modeling inaccuracy. This step is further accelerated with the proposed gradient-based variable reduction technique, which limits the number of design parameters selected for optimization. To increase the performance of the MOHSENN algorithm, we need to increase the accuracy of the NN models. Therefore, we introduce a new type of NN model called the circuit connectivity inspired NN (CCI-NN), capable of modeling modules with considerably higher accuracy. To validate the effectiveness of the proposed approach, we apply MOHSENN to design a two-step SAR ADC in 14nm FinFET technology. The measurement results of this ADC show a competitive figure of merit compared to state-of-the-art ADCs, while the schematic was designed by the CAD tool.

Chapter 2
Passive Pulse Shrinking Time-based ADC

2.1 Introduction

Digital synthesis has always been more straightforward and less complicated than analog synthesis since analog design involves more non-linearity and more intercorrelated parameters than digital design. Therefore, commercially successful digital synthesis CAD tools are more common than analog ones. In recent years, to employ digital synthesis CAD tools for analog domain design, designers have replaced most traditional, sophisticated analog circuits with simpler digital ones. Examples such as the asynchronous SAR ADC [27–33] instead of the power-hungry conventional pipelined ADC, or digital phase-locked loops (DPLLs) [34–36] instead of conventional area-hungry analog ones, illustrate these substitutions. These mainly digital architectures can use digital standard cells to significantly improve the implementation time, including circuit sizing and layout generation.
In some cases, such as the DPLL in [37] or the ADC in [28], the entire system has been built by a digital design flow and achieved competitive performance compared to its analog counterparts.

One of the ongoing trends in using digital cells instead of analog ones is the time-based ADC (T-ADC). A T-ADC uses a time-to-digital converter (TDC) to encode the information stored in phase, frequency, delay, or pulse-width. The TDC is often built from digital circuits; therefore, it can be designed automatically by digital synthesis CAD tools. However, one must translate the required analog performance metrics, such as linearity, for the digital CAD tools. Unfortunately, digital CAD tools only understand uncomplicated metrics such as setup time, hold time, or power consumption; therefore, the translation usually fails, and most works require an additional on-chip calibration scheme to correct the wrong predictions made by the CAD tools.

This chapter introduces a new type of TDC called the passive pulse shrinking (PPS) TDC, which uses passive elements to reduce the pulse-width linearly. The passive element values are design parameters and determine the non-linearity and other performance metrics. In other words, the sophisticated analog metrics are derived from the passive elements rather than the digital circuits. Therefore, digital CAD tools can design such a TDC while satisfying only simple metrics (timing and power), while the passive elements set the sophisticated analog metrics. Finally, we show a prototype and the measurement results.

2.2 High Speed T-ADC

2.2.1 Speed-Resolution Trade-off

It is well known that increasing the sample rate leaves less tracking time in a track and hold (T/H), less decision time in comparators, and less amplification time in pipelined stages, thus limiting the performance of all ADC topologies [38]. Worse, the T-ADC’s performance fundamentally declines as the sample rate increases.
In voltage controlled oscillator (VCO)-based T-ADCs, this problem is well studied: the best achievable number of bits is $\log_2(n_{pi} f_{tune} t_{clk})$, where $t_{clk}$ is the clock period, $n_{pi}$ is the number of phase interpolators, and $f_{tune}$ is the VCO’s tuning range [1,39–41]. Increasing the sample rate reduces the period, and so the achievable number of bits decreases. To compensate, we can augment either $f_{tune}$ or $n_{pi}$; the first leads to more power consumption and is usually limited by technology speed, and the second slows down $f_{tune}$ and increases mismatch-related non-linearities, which are difficult to calibrate [40]. Similarly, in the delay-line-based T-ADC, the maximum achievable number of quantization bits is $\log_2\frac{t_{clk}-t_{tr}}{t_{lsb}}$, where $t_{tr}$ is the tracking time of the T/H, and $t_{lsb}$ is the TDC’s quantization time step [3]. The problem is that the maximum input time frame that the voltage-to-time converter (VTC) can generate is less than $t_{clk}-t_{tr}$, which decreases when the sampling frequency increases. Therefore, to maintain the same dynamic range, $t_{lsb}$ should be reduced.

When the required $t_{lsb}$ is less than the delay of a digital buffer $t_{buffer}$, the TDC requires particular modification. The vernier delay line (VDL) [42], time amplification [43–46], delta-sigma modulation [47–50], stochastic design [51,52], and pulse shrinking [53–56] have been presented to achieve fine time steps in TDCs. Among these approaches, delta-sigma modulation and the stochastic approach are only operational with small input bandwidth and cannot be used in a Nyquist-rate GS/s ADC. The time amplification approach helps to increase the time swing in the latter stages of the TDC so that the fine latter stages can be implemented simply by usual delay lines. Although theoretically effective, time amplification adds excessive noise, power consumption, and non-linearity, degrading the fast TDC’s performance. The VDL and pulse shrinking approaches produce fine time resolution in the TDC. In the VDL, the time step is the delay difference between two buffers, and in pulse shrinking, the time step is the amount by which the pulse is reduced in an imbalanced buffer. In both cases, assuming no differential implementation, $2^B$ cascaded digital buffers are required to achieve $B$ bits. These buffers generate a total delay of $2^B t_{buffer}$, which is orders of magnitude higher than $t_{clk}$. Therefore, for correct synchronization, pipelining is required, adding more complexity and power consumption [42–44].

Figure 2.1: Comparison between proposed PPS T-ADC to other reported T-ADC configuration [1–6] in 65nm CMOS for (a) area and sample-rate (b) single ADC sample-rate and effective time step

2.2.2 Related Works

In this section, we study some of the successful medium-resolution T-ADCs that can achieve high sample rates. Reference [1] proposed an AC-coupled VCO ADC with sampling after the VCO’s quantization. Although achieving significant performance, the ADC only operated at a sub-GS/s sample rate since it had inherently limited bandwidth caused by the VCO. By putting a T/H in front of the VCO, [57] increases the ADC’s bandwidth; however, the single-channel ADCs were below a 1 GS/s sample rate despite using an advanced CMOS node. Reference [2] increased the number of parallel VCOs to generate more phase combinations simultaneously, thereby pushing the speed-resolution trade-off of the T-ADC to obtain 2 GS/s. Unfortunately, the method requires extra calibration and bit-error correction circuits that may take more area and power. Reference [3] showed a two-way interleaved T-ADC realized by a VTC and VDL configuration. This topology avoided pipelining, so despite the low resolution, it achieved only a sub-GS/s sample rate. Reference [4] improved the sample rate by pipelining, and to avoid a small $t_{lsb}$, a time residue amplifier was used between the two stages. Therefore, despite the small input time frame, $t_{lsb}$ could be realized with conventional gate delays so that a single ADC could obtain up to a 2.5 GS/s sample rate. However, time residue amplifiers are either noisy or power/area-consuming, which leads to ADC performance deterioration. Reference [5] proposed a two-step topology. However, instead of a residue time amplifier, it used two different VTCs for the first and the second stage and used voltage-domain residue generation. This method improved the T-ADC’s FOM at the 1 GS/s sample rate. Reference [6] proposed a two-step ADC in which the first-stage TDC is a traditional delay line TDC and the second stage uses cascaded PIs for finer time quantization. This method helped to generate the residue in the time domain and to avoid time amplification. As a result, the single ADC achieves 2.5 GS/s with no significant power/area overhead.

With the concept of the proposed PPS TDC, in this work, we can achieve a single-channel sample rate of 5 GS/s while achieving an ENOB above 5 bits. In Fig. 2.1 we compare the proposed architecture with its sister T-ADCs [1–6] implemented in the same 65nm CMOS technology. Generally, a fast sample rate in a T-ADC comes with a large area, while the PPS T-ADC occupies an area as small as that of the low-speed ones. Moreover, the PPS T-ADC achieves a competitive effective time step (≈6 ps) while being the fastest single-channel ADC among the mentioned T-ADC configurations. Both of these attributes help the PPS T-ADC to achieve a high sample rate with small area in the low-resolution regime.

2.3 Proposed Passive Pulse Shrinking TDC

The concept of the pulse shrinking TDC is to store time information within the pulse-width of one digital signal, rather than in a delay between two signals as in the delay line method. Then the pulse is sent through unbalanced buffers, which shrink the pulse width until it vanishes; the number of edge detectors that have seen the pulse determines the final digital code.
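As a rough behavioral illustration of the pulse shrinking idea just described, the sketch below counts how many stages still see the pulse. It is an idealized model under stated assumptions: every stage shrinks the pulse by the same fixed amount, an edge detector sits after each stage, and times are integer picoseconds. The function name and the numbers are hypothetical and are not taken from the thesis.

```python
def pulse_shrinking_code(pw_in_ps, shrink_per_stage_ps, n_stages):
    """Return the number of edge detectors that still see the pulse.

    Idealized model (assumption): each stage removes a fixed amount of
    pulse-width, and the detector after a stage fires only if some width is left.
    """
    code = 0
    width = pw_in_ps
    for _ in range(n_stages):
        width -= shrink_per_stage_ps  # the stage shrinks the pulse
        if width <= 0:                # the pulse has vanished
            break
        code += 1                     # this stage's edge detector saw the pulse
    return code


# Example: a 25 ps pulse, 6 ps shrink per stage, 31 stages.
print(pulse_shrinking_code(25, 6, 31))
```

In this idealized picture the shrink per stage plays the role of the quantization time step, so the output code grows in unit steps as the input pulse-width increases.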
Instead of buffers, in this paper we use a passive R-C network to shrink the pulse linearly, leading to less area and much faster quantization. In this section, we further investigate the key PPS parameters by analyzing the pulse response of the R-C network.

2.3.1 PPS TDC with R-C network

When we send a pulse from an ideal voltage source into the R-C network of Fig. 2.2, the pulse moves through the network and shrinks until its amplitude is less than a threshold value $v_t$, making the pulse unrecognizable by the edge detectors (EDs). To analyze the behavior of this network, we can use the Elmore delay to approximate the step response to first order only. At the $i$-th node, the step response is approximated by $e^{-t/(rc(iN - i^2/2))}$, where $r$ and $c$ are the unit resistance and capacitance and $N$ is the number of R-C units in the network.

Figure 2.2: Concept of the PPS TDC with R-C network.

Figure 2.3: PPS TDC quantization transfer function curve (a) without and (b) with termination circuit.

Let us assume the entering pulse has a pulse width of $pw_{in}$ and starts at $t_0 = 0$ s; then, at the $i$-th node, the signal is

$$\begin{cases} v_{amp}\, e^{-t/\tau_i}, & t < pw_{in} \\ v_{amp}\, e^{-pw_{in}/\tau_i} \times \left(1 - e^{-(t - pw_{in})/\tau_i}\right), & t > pw_{in} \end{cases} \tag{2.1}$$

where $\tau_i = rc(iN - i^2/2)$ and $v_{amp}$ is the pulse amplitude, which is usually equal to the supply voltage. Therefore, we can have 1-bit quantization with an edge detector: if $v_{amp}\, e^{-pw_{in}/\tau_i} > v_t$, the output is one; otherwise, the output is zero. So the threshold pulse width to be detected at the $i$-th node is

$$pw_t(i) = \tau_i \ln\left(\frac{v_{amp}}{v_t}\right). \tag{2.2}$$

We can find the quantization time step as the difference between the pulse-width thresholds of the $(i+1)$-th and $i$-th nodes, which is expressed as

$$t_{lsb}(i) = pw_t(i+1) - pw_t(i) = rc(N - i)\ln\left(\frac{v_{amp}}{v_t}\right). \tag{2.3}$$

Eq. (2.3) indicates that $t_{lsb}$ is a function of the node number $i$, leading to the non-linear quantization transfer curve shown in Fig. 2.3(a). For a linear curve, $t_{lsb}$ should be independent of $i$; hence, the R-C network requires modifications.
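The shrinking time step predicted by (2.2) and (2.3) can be illustrated numerically. The following is a minimal sketch, not the thesis's own model: it assumes the time constant $\tau_i = rc(iN - i^2/2)$, and the function names and the values of `R`, `C` and `N` are illustrative choices only. With this discrete form of $\tau_i$, the step comes out as $rc(N - i - 1/2)\ln(v_{amp}/v_t)$, which agrees with (2.3) to within half a unit step.

```python
import math

def elmore_tau(i, n_units, r, c):
    """First-order time constant at node i, assumed as tau_i = r*c*(i*N - i^2/2)."""
    return r * c * (i * n_units - i * i / 2.0)

def threshold_pulse_width(i, n_units, r, c, v_amp, v_t):
    """Eq. (2.2): pw_t(i) = tau_i * ln(v_amp / v_t)."""
    return elmore_tau(i, n_units, r, c) * math.log(v_amp / v_t)

def time_steps(n_units, r, c, v_amp, v_t):
    """t_lsb(i) = pw_t(i+1) - pw_t(i) for i = 1 .. N-1 (left side of Eq. (2.3))."""
    return [
        threshold_pulse_width(i + 1, n_units, r, c, v_amp, v_t)
        - threshold_pulse_width(i, n_units, r, c, v_amp, v_t)
        for i in range(1, n_units)
    ]

# Illustrative (assumed) values: 32 units, 4 ohm and 2 fF per unit, v_t = v_amp / 2.
N, R, C = 32, 4.0, 2e-15
steps = time_steps(N, R, C, v_amp=1.0, v_t=0.5)

# The step shrinks monotonically along the ladder, i.e. the curve is non-linear.
for i in (1, N // 2, N - 1):
    print(f"node {i:2d}: t_lsb = {steps[i - 1] * 1e15:.1f} fs")
```

The monotonic decrease of `steps` is the non-linearity that the termination circuit of the next subsection is meant to remove.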
2.3.2 R-C network Linearization

From (2.3), we observe that the time step declines along the network until it reaches zero. The likely root cause is that the pulse sees a different load impedance at the entering nodes than at the ending nodes. For instance, assume the size of the network is $2N$ and only the first $N$ levels are used for quantization. In this case, according to (2.3), the relative time step difference $\frac{t_{lsb}(1) - t_{lsb}(N)}{t_{lsb}(1)}$ drops to about 50%. Similarly, if we assume the size of the network is infinite, the time step difference tends toward zero. Instead of having a large R-C network, we can append a lumped R-C model of the network after the $N$-th node. If we include the termination capacitance $c_T$ in (2.3), the new time step will be

$$t_{lsb}(i) = r\left(c(N - i) + c_T\right)\ln\left(\frac{v_{amp}}{v_t}\right), \tag{2.4}$$

which suggests that linearization is possible only when $c_T \gg N \times c$. However, (2.3) is obtained with Elmore delay modeling, which abstracts the R-C network to first order only. If we model the network completely as an $(N+1)$-th order filter, we observe that the quantization curve can be linearized with much smaller $c_T$ values. The example of Fig. 2.3(b) shows the case of $N = 31$ and $c_T = 31c$, which gives a linearized curve.

The complete PPS TDC contains the R-C network and the termination circuit, which includes the capacitance $c_T$ and the resistance $r_T$. One buffer drives the whole network and is modeled by a voltage source with an input resistance of $r_I$. Every node is connected to an edge detector with a threshold value of $v_t$. The R-C network's unit resistance $r$ is obtained through metal routing, while the unit capacitance $c$ consists of the parasitics and the input capacitance of the edge detectors. Every one of these variables can change the behavior of the PPS TDC. Since deriving the equations of an $(N+1)$-th order filter is cumbersome, we use numerical LTI system modeling to understand the PPS mechanism in the R-C network.
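The effect of the termination capacitance in (2.4) can be checked with a few lines of arithmetic. This is a sketch under the first-order Elmore model only, so it is pessimistic compared with the full $(N+1)$-th order model that the text says linearizes with a much smaller $c_T$. The function name and the sample values are assumptions for illustration.

```python
def relative_step_spread(n_units, c, c_t):
    """(t_lsb(1) - t_lsb(N)) / t_lsb(1) using Eq. (2.4).

    The common factor r * ln(v_amp / v_t) cancels in the ratio.
    """
    first = c * (n_units - 1) + c_t   # proportional to t_lsb(1)
    last = c_t                        # proportional to t_lsb(N), since c*(N - N) = 0
    return (first - last) / first

# Illustrative sweep for N = 31 with the unit capacitance normalised to 1.
for ratio in (0, 1, 10, 100):
    spread = relative_step_spread(31, 1.0, ratio * 31 * 1.0)
    print(f"c_T = {ratio:3d} * N * c -> spread = {spread:.1%}")
```

With no termination the spread is 100%; with $c_T = Nc$ it is close to 50%, and it only becomes small once $c_T$ is much larger than $Nc$, which is the condition the first-order model suggests.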
In the following section, we sweep each variable and simulate the LTI model to understand its effect on the PPS TDC's quantization curve.

Effects of r and c

From the Elmore delay, we can prove that $t_{lsb} \propto r \times c \times N$ in the R-C network. Fig. 2.4 shows how $t_{lsb}$ depends on the R-C network's unit time constant $r \times c$ and on $N$. The values of the elements in the termination circuit are $r_T = Nr$ and $c_T = Nc$. The input resistance is $r_I = Nr$, and $v_t/v_{amp} = 0.5$. With curve fitting, we find that the quantization time step is $t_{lsb} \approx 1.6\,rcN$. Fig. 2.4 also shows that a $t_{lsb}$ below 1ps is possible with the PPS TDC, requiring $r < 4\Omega$ and $c < 2$fF. As mentioned before, these values can be implemented with metal resistance and the edge detectors' input capacitance.

Figure 2.4: $t_{lsb}$ relation to the R-C unit time constant and the number of units in the network.

Effects of $v_t$

It is imperative to study the effect of the edge detector's threshold voltage on the PPS TDC's quantization curve. The threshold voltage sets the boundary between pulse existence and nonexistence; therefore, the TDC's curve is expected to depend on it significantly. To examine this effect, we varied $v_t/v_{amp}$ and gathered the resulting PPS TDC transfer curves in Fig. 2.5. In this examination, we used $N = 32$, $v_{amp} = 1$V, and $rc = 6$fs, with the other parameters at their nominal values. We can observe in Fig. 2.5(a) that changing $v_t/v_{amp}$ alters the gain and the offset of the quantization curve; the results are shown in Fig. 2.5(b) and 2.5(c), respectively. Fig. 2.5(b) indicates an approximately linear relationship between $v_t/v_{amp}$ and $t_{lsb}$. That is, a ±5% variation of $v_t/v_{amp}$ around 0.5 causes a 15% variation of $t_{lsb}$, meaning the gain of the PPS TDC depends linearly on the edge detector's threshold voltage.

Figure 2.5: (a) PPS TDC quantization transfer function curve, (b) time step, and (c) time offset vs. $v_t$.
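Returning to the fitted relation $t_{lsb} \approx 1.6\,rcN$ from the sweep of $r$ and $c$, the quoted operating points can be checked with simple arithmetic. This is a sketch only: the coefficient 1.6 is the curve-fit value quoted in the text for $r_T = Nr$, $c_T = Nc$, $r_I = Nr$ and $v_t/v_{amp} = 0.5$, and the function name is a hypothetical helper.

```python
def fitted_t_lsb(r, c, n_units, k=1.6):
    """Curve-fit estimate t_lsb ~= k * r * c * N (k = 1.6 is quoted for the nominal setup)."""
    return k * r * c * n_units

# r = 4 ohm, c = 2 fF, N = 32: the upper bounds quoted for a sub-picosecond step.
t_bound = fitted_t_lsb(4.0, 2e-15, 32)
print(f"t_lsb at r = 4 ohm, c = 2 fF, N = 32: {t_bound * 1e12:.2f} ps")

# rc = 6 fs, N = 32: the setting used for the threshold-voltage sweep.
t_sweep = fitted_t_lsb(1.0, 6e-15, 32)
print(f"t_lsb at rc = 6 fs, N = 32: {t_sweep * 1e12:.2f} ps")
```

Both points land well below 1 ps under the fit, consistent with the statement that sub-picosecond steps are reachable with metal resistance and edge-detector input capacitance.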
Additionally, Fig. 2.5(c) shows that the offset of the quantization transfer curve, $t_{offset}$, is inversely proportional to $v_t/v_{amp}$.

Effects of $r_I$

In a CMOS implementation of the PPS TDC, a buffer drives the whole R-C network, and $r_I$ represents the driver's strength. The smaller $r_I$ is, the more power the buffer consumes. Fig. 2.6(a) shows the effect of varying $r_I$ on the PPS TDC's quantization transfer curve, while Fig. 2.6(b) and 2.6(c) show the effects on the time step and the time offset, respectively. In Fig. 2.6(b) we can see that for $r_I > N \times r$, $t_{lsb}$ becomes independent of $r_I$. However, the value of $r_I$ significantly alters the quantization time offset $t_{offset}$, as shown in Fig. 2.6(c). This property of $r_I$ helps to tune the offset of the PPS TDC and is exploited for calibration.

Figure 2.6: Input resistance vs. (a) PPS TDC quantization curve, (b) $t_{lsb}$, and (c) $t_{offset}$.

Tuning $r_T$ and $c_T$

As mentioned in Section 2.3.2, the termination circuit's role is to linearize the PPS TDC. Therefore, the linearity is expected to change when sweeping the termination circuit's parameters. To measure the non-linearity, we used the conventional maximum INL term. If it is positive, the quantization curve is convex; if it is negative, the curve is concave; and if it is zero, the curve is linear. Fig. 2.7(a) and 2.7(b) show how the TDC's transfer curve changes with the termination resistance and capacitance, respectively. These figures indicate that the termination circuit has a notable impact on the linearity and the TDC's gain while not changing the time offset. The maximum INL and $t_{lsb}$ associated with the curves of Fig. 2.7 are shown in Fig. 2.8(a) and 2.8(b), respectively.

Figure 2.7: PPS TDC's quantization curve vs. (a) termination resistance, and (b) termination capacitance.

Figure 2.8: (a) Maximum INL, and (b) $t_{lsb}$ vs. different termination circuit element values.
A linear curve is possible if the values of $c_T$ are larger than $Nc$; that is, for every such value of $c_T$, one value of $r_T$ exists that can linearize the curve. In this example, $r_T = Nr$ and $c_T = Nc$ give a linear quantization transfer curve. Another observation is related to the sensitivity of the maximum INL and $t_{lsb}$ to the values of $r_T$ and $c_T$. If we pick a larger $r_T$, and thus a larger $c_T$ for linearization, the curve is expected to be less sensitive to these elements. This aspect can help to design a robust linear PPS TDC. However, if the gain of the PPS TDC should be controllable in a design, we should pick a smaller $r_T$ and $c_T$. With the information acquired for the PPS TDC, one can design such an architecture.

2.3.3 PPS TDC Design Process

We can interpret the effects as follows: $v_t$ varies the offset and gain of the TDC while not changing the linearity. $r_I$ changes only the offset of the TDC. Finally, $r_T$ and $c_T$ change the TDC's linearity and gain while not affecting the offset. Proper design of the PPS TDC requires decent control over all these parameters. Proper foreground calibration techniques to tune these parameters, such as using R-DACs for $r_I$ and $r_T$, CDACs for $c_T$, and differential edge detectors with a tunable reference, can help to calibrate the PPS TDC. One beneficial aspect of the parameters' effects is that they are separable, making the calibration process easier. For example, $r_I$ can be used to change only the offset, while $r_T$ changes $t_{lsb}$, i.e., the gain of the TDC.

2.4 Circuit Implementation

Fig. 2.9 shows the overall block diagram of the proposed time-based ADC architecture, composed of two-way time-interleaved ADC channels. An external 10 GHz clock, sent to an on-chip clock divider, generates two out-of-phase sampling clocks ($CLK_{i1}$ and $CLK_{i2}$). In each ADC, the differential input is first sampled and converted into differential pulse signals ($V_{TP}$ and $V_{TN}$) via a VTC generator.
Their delay difference is proportional to the sampled voltage value. The signal is then split into two paths: 1) a coarse TDC, where a 2-bit VDL is used to quantize the time and determine the first two MSBs; 2) the time-folding and subtraction circuit, where the maximum pulse duration is reduced and converted into a single-ended residue signal ($V_R$). The transition boundaries of the four residue time regions are determined by the coarse TDC output, i.e., folded by the MSB bit and subtracted by the (MSB-1) bit. Finally, the $V_R$ pulse width is quantized via PPS cells, where a 40-bit thermometer code is generated. Along with the coarse TDC output, the thermometer code is retimed by the rising edge of an asynchronous clock generated from the folding circuitry ($CLK_O$) for synchronization purposes and sent to the multiplexer and decoder.

Figure 2.9: Simplified proposed time-based folding ADC architecture with PPS TDC.

Fig. 2.9 also includes the S/H and VTC architecture. The S/H is a top-plate sampling scheme with a sampling capacitance of 14fF. After input voltage sampling, the voltage increases via a current source and soon crosses the VTC threshold, generating a pulse signal. Thus, the higher the sampled input signal, the shorter the pulse duration. The circuit generates a pulse whose duration is the interval between the clock's rising edge and the crossing instant.

Figure 2.10: Proposed PPS TDC implementation.

Fig. 2.10 shows the proposed PPS TDC implementation, in which the edge detectors (EDs) are realized with dynamic resettable D-flip-flops. If the clock port of a resettable D-flip-flop is attached to an internal node of the R-C network, the port can detect the pulse's presence whenever its voltage passes the detection threshold. The EDs are reset for every ADC sample for proper bit cache operation. Fig. 2.10 shows the ED circuit, where the NMOS transistor detects the rising edge of pulses in the R-C network.
To mitigate the memory effect, a clock signal ($CLK_O$) is used to remove the remaining charges in the R-C network after the PPS cells finish the time quantization. The same clock signal stores the EDs' thermometer code in the retimer circuit, and then the EDs are cleared with a reset signal ($RST_O$). The corresponding timing diagram is shown in Fig. 2.10. Finally, to reduce offsets, a resistive ladder network is used to help average out the offset threshold voltages from adjacent EDs and parasitic mismatches in routing. In this prototype, we make the termination resistor ($r_T$) tunable and vary the input driver strength ($r_I$) to study the effects on PPS linearity, resolution steps, and the TDC's offset. The R-C network is simply implemented with a 0.1μm wide metal trace, as depicted in Fig. 2.11; thus $r$ is the metal resistance, while $c$ is mostly dominated by the ED input capacitance.

Figure 2.11: PPS section with metal layer as resistance.

Fig. 2.12(a) shows the time-domain folding and subtraction block diagram and circuit. The input differential signal is first folded into a single-ended pulse at node $V_F$. Next, we subtract $T_{REF}$ from the signal to generate a residue pulse at node $V_R$. The pulse width at node $V_F$ is equal to $|PW(V_{TN}) - PW(V_{TP})| + T_{OS}$, where $PW(\cdot)$ is the signal's pulse width at a given node and $T_{OS}$ is a positive offset value. The folding circuit first generates two pulses with durations of $PW(V_{TN}) - PW(V_{TP}) + T_{OS}$ and $PW(V_{TP}) - PW(V_{TN}) + T_{OS}$; it then selects the longer-duration pulse. The pulses are generated by phase detectors (PDs) with an offset time of $T_{OS}$. Next, the time-folded pulse passes through 3-stage buffers for subtraction. Fig. 2.12(a) shows both the PD and the subtraction buffer circuit. If the (MSB-1) bit is 1, the pull-down path of the 2nd-stage inverter is strengthened, leading to a higher overall threshold and thus a reduced pulse width.

Figure 2.12: (a) Time folding/subtraction circuit and block diagram with (b) operation waveform, and (c) the corresponding transfer function.
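The folding and subtraction steps described above can be written as a small behavioral model. This is an idealized sketch, not the circuit: it assumes the subtraction removes exactly $T_{REF}$ when the (MSB-1) bit is 1 and nothing otherwise, and the function names and the picosecond values are hypothetical.

```python
def folded_pulse_width(pw_tn, pw_tp, t_os):
    """Pulse width at node V_F: the longer of the two offset phase-detector pulses."""
    candidate_a = pw_tn - pw_tp + t_os
    candidate_b = pw_tp - pw_tn + t_os
    # Selecting the longer pulse is equivalent to |pw_tn - pw_tp| + t_os.
    return max(candidate_a, candidate_b)

def residue_pulse_width(pw_fold, t_ref, msb_minus_1):
    """Pulse width at node V_R, assuming an ideal subtraction of t_ref when the bit is 1."""
    return pw_fold - t_ref if msb_minus_1 else pw_fold

# Illustrative values in picoseconds (assumed, not measured).
pw_f = folded_pulse_width(pw_tn=120.0, pw_tp=80.0, t_os=10.0)
pw_r = residue_pulse_width(pw_f, t_ref=30.0, msb_minus_1=1)
print(pw_f, pw_r)
```

Swapping `pw_tn` and `pw_tp` gives the same folded width, which is the folding property the MSB records.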
The coding scheme of the first two MSB bits, together with the corresponding waveform and transfer curve of the folding/subtraction circuit, is shown in Fig. 2.12(b) and 2.12(c), respectively. In this circuit, the analog path and the corresponding digital path should be synchronized, which the delay stages prior to the folding circuit ensure.

2.5 Measurement Results

The prototype of the PPS T-ADC is fabricated in 65nm CMOS technology with an active area of 0.015mm² (Fig. 2.13). An off-chip sine-wave LUT-based linearity calibration of the ADC [58] (at 43MHz) and an on-chip inter-channel time-skew calibration of the ADCs are performed to reduce distortion. Fig. 2.14 shows the ADC output spectrum for a 5.001GHz input for the TI ADC, the TI ADC without calibration, and the single ADC. With calibration, the SNDR improves significantly from 24.4dB to 32.5dB for the TI ADC, while the single ADC achieves 34.0dB. The Walden FOM is 86fJ/c-step at Nyquist input while sampling at 10GS/s. Fig. 2.15 shows the SNR, SNDR, and SFDR over different input frequencies for the 1X and 2X time-interleaved ADCs. At a low input frequency (43MHz), the TI ADC achieves 33.3dB SNDR and 47.6dB SFDR, which degrade to 30.4dB and 40.7dB, respectively, at a 7.0GHz input frequency. The ADC core consumes 29.7mW from a 1V supply, of which 26.9mW is consumed by the ADCs; the rest is consumed by the clock, decimator, and decoder. Table 2.1 shows a comparison with state-of-the-art T-ADCs. The ADC prototype shows comparable FOM and area but a higher sample rate for a single ADC, allowing a potential for achieving even higher speed with a modest number of time-interleaved channels.

Figure 2.13: Chip micrograph.
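The reported figure of merit can be cross-checked from the power, sample rate, and SNDR quoted above. The sketch below uses the standard Walden definition, FOM = P / (2^ENOB × f_s) with ENOB = (SNDR − 1.76) / 6.02; the thesis does not spell the formula out, so this definition and the function names are assumptions.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard conversion)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w, fs_hz, sndr_db):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2.0 ** enob(sndr_db) * fs_hz)

# Values quoted in the measurement section: 29.7 mW, 10 GS/s, 32.5 dB SNDR at Nyquist.
fom = walden_fom(power_w=29.7e-3, fs_hz=10e9, sndr_db=32.5)
print(f"ENOB = {enob(32.5):.2f} bit, FOM = {fom * 1e15:.1f} fJ/c-step")
```

Under these assumptions the result is about 86 fJ per conversion step, in line with the value stated in the text.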
Figure 2.14: Output spectrum with near-Nyquist input (decimation factor of 257) for (a) TI-ADC with calibration, (b) TI-ADC without calibration, and (c) single ADC with calibration.

Figure 2.15: Measured dynamic performance of the ADC.

Table 2.1: Comparison table with state-of-the-art ADCs

Specification | This Work | [6] JSSC20 | [4] JSSC16 | [2] JSSC18 | [57] JSSC19
Architecture | 2X-TI PPS | 4X-TI PI-based | 4X-TI | VCO RNS | 8X-TI VCO
Technology | 65nm | 65nm | 65nm | 65nm | 28nm
Fs (GS/s) | 10 | 10 | 10 | 2 | 5
Resolution | 7.3 | 8 | 6 | 7.93 | 8
SNDR @Nyquist [dB] | 32.5 | 40.1 | 27.2 | 40.7 | 45.2
SFDR @Nyquist [dB] | 40.7 | 52.8 | 42.1 | 48.4 | 57.1
Power [mW] | 29.7 | 50.8 | 98 | 21 | 22.7
FOM_W [fJ/c.step] | 86 | 61.5 | 504 | 119 | 30.5
Active Area [mm²] | 0.015 | 0.095 | 0.073 | 0.08 | 0.023

Chapter 3

Analog Mixed-Signal Parameter Search Engine

3.1 Introduction

In this chapter, we propose a module-linking-graph assisted hybrid parameter search engine with a neural network (MOHSENN) to alleviate the above challenges. We propose a module-linking-graph (MLG) to address the interface issue, which forces equality on the shared circuit elements at the interface. To accelerate the design process and cover a wide design parameter range, we propose the hybrid search. In the first phase, the hybrid search exploits NN regression models on the MLG in the global search, where it performs a fast and parallel gradient-based optimization of the design parameters. In the second phase, to attenuate the modeling inaccuracy, we perform a local search on the MLG using SPICE simulation. This step is further accelerated with the proposed gradient-based variable reduction technique, which limits the number of design parameters selected for optimization.
To prove the concept, we use MOHSENN for a successive approximation register analog-to-digital converter (SAR ADC) design in 65nm CMOS technology, which achieves 5X and 700X faster search speed compared to the conventional hierarchical and flat approaches, respectively, with relatively better ADC performance. Moreover, to demonstrate that MOHSENN is capable of realizing custom AMS IP, it is used to design three custom SAR ADCs of a fixed topology with diverse performance specifications.

3.2 Proposed MLG Assisted Hybrid Search Engine With NN

In the following subsections, we formulate the hierarchical design problem and then introduce the flow of MOHSENN and the enabling techniques.

3.2.1 Problem Statement

The objective of the AMS circuit sizing problem is to find the design parameter vector $p$ of size $n_p$ within a finite parameter space $P$ that minimizes the system-level objective while satisfying the system-level constraints set by the user desired specification vector $u$. Circuit design parameters are transistor geometries, biases, or any other variables that can affect the circuit performance. User desired specifications $u$ are simple, high-level measures of the AMS IP's performance that are understandable to non-expert users. For example, the number of bits or the bandwidth of an ADC can be part of $u$.

In hierarchical design, an AMS circuit is divided into multiple smaller circuits, which are referred to as modules in this chapter. For each module, a parameter-to-modules' metric function (P2M) is defined, which maps the module's design parameters to its metrics. For example, a metric can be the gain of an amplifier or the delay of a D-flip-flop. For an AMS circuit consisting of $N$ modules, we can express the P2M function of the $i$-th module as

$$m_i = f_i(p_i), \tag{3.1}$$

where $p_i$ is the parameter vector, $m_i$ is the metric vector, and $f_i(\cdot)$ is the P2M function. We assume that the P2M function can be accurately characterized using SPICE simulation and foundry-provided device models.
In a model-based method, a regression model approximates the P2M function; it is denoted by $\hat{f}_i$ and outputs the estimated module metrics $\hat{m}_i$. To design the entire AMS circuit, equations or behavioral models that map the modules' metrics to the system-level design objectives or constraints [11,59] are necessary. With the system-level objective function and constraint functions denoted by $obj()$ and $h()$, respectively, the search problem can be expressed as [22]

$$\begin{aligned} &\text{Given } u, \\ &\arg\min_{p \in P}\; obj(m, p_B), \\ &\text{s.t.: } h_j(m, u, p_B) \geq 0, \quad j \in \{1, 2, \cdots, n_h\}, \\ &m = [m_1, m_2, \cdots, m_N], \\ &p = [p_1, p_2, \cdots, p_N, p_B], \\ &m_i = f_i(p_i), \quad i \in \{1, 2, \cdots, N\}, \end{aligned} \tag{3.2}$$

where $n_h$ is the number of constraints. In (3.2), $m$ is derived by concatenating every module's metrics, and $p_B$ is the system-level parameter vector, which is absent from any module but affects the final performance. For example, $p_B$ can be the number of modules required to drive a clock routing.

In (3.2), $f$ generates the modules' metrics $m$ through circuit-level simulations, and both $obj$ and $h$ are system-level functions of $m$. However, as opposed to the objective function $obj$, the constraint functions $h$ also require information from the user desired specification to set the bounds in the search problem. As an example, $obj$ can be the power or area of an AMS circuit, which should be minimized via optimization. On the other hand, an example of $h$ is the signal-to-noise ratio (SNR) of an ADC, which should be larger than $6 \times n_{bit} + 1.76$. In other words, $SNR - 6 \times n_{bit} - 1.76 \geq 0$, where $u = [n_{bit}]$ is the user desired number of bits. A concrete example of the constraint and objective function setup for optimizing a SAR ADC design will be provided in Section 3.4.1.

Conventionally, two different levels of optimization are utilized to solve (3.2). The first step is the system level, where the constraints $h_j$ are satisfied by finding the modules' metrics.
The second step is the module level, where each module's parameters are chosen individually to satisfy the metrics found at the system level. Designing modules separately exacerbates the interface problem since it ignores the interrelations between modules. In MOHSENN, we use the proposed MLG to design the dependent modules jointly. This could significantly increase the conventional optimization time; therefore, we propose a hybrid search to accelerate the design process.

3.2.2 MOHSENN Flow Overview

Figure 3.1 illustrates the MOHSENN design flow, which requires a one-time offline preparation of the MLG and the hybrid search.

In the preparation phase, after the module breakdown, we identify the necessary interfaces among modules that should be modeled in their P2M characterization functions. Subsequently, we propose to construct the MLG ($g(\cdot)$) and incorporate the interfaces as the graph's vertices with a method that will be discussed in Section 3.3.1. The MLG provides a platform for obtaining the module metrics in the presence of interfaces from adjacent modules. Note that the MLG's output (the modules' metrics) can be obtained either from fast NN models or from more accurate SPICE simulations. This is crucial for the fast yet accurate exploration of the parameter space using the hybrid search.

Figure 3.1: Proposed MOHSENN flowchart, the preparation and the hybrid search. [Flowchart: the inputs are the user desired specifications u and the AMS topology with unknown design parameters p. Step 1, preparation: break down into modules, identify the interfaces, and construct the proposed MLG g(). Step 2, hybrid search: a global search with the proposed Par-MC and the Adam optimizer on the MLG with NN models runs until the stopping criteria are met and yields the candidate set C; one candidate is chosen; the proposed gradient-based variable reduction reduces the n_p design parameters to n_v; a local search with the Powell optimizer on the MLG with SPICE models runs until the stopping criteria are met. The output is the final netlist with p = p_opt.]

The proposed hybrid search algorithm consists of a global and a local search phase. The global search phase utilizes NN models for evaluation, with the proposed parallel Monte-Carlo (Par-MC) scheme selecting multiple random initial points to search over a broad region. This phase outputs a set of potential parameter candidates, one of which is selected for the local search. The proposed gradient-based variable reduction technique reduces the number of design parameters for the local search phase, resulting in a faster search. The local search phase utilizes SPICE for evaluation to reduce the errors introduced by the regression models. The final output of the hybrid search is the set of design parameters for the given AMS system topology that satisfies the user desired specification.

Figure 3.2: Graphical depiction of MLG (inputs: desired specifications and parameters p, u; outputs: modules' metrics, m or their estimates).

3.3 MOHSENN Preparation and Hybrid Search

In the following subsections, we describe several enabling techniques in the proposed MOHSENN flow.
For simplification, we denote the constraints by $h = [h_1, h_2, \ldots, h_{n_h}]$, the SPICE model P2M functions by $f = [f_1, f_2, \ldots, f_N]$, and the NN models by $\hat{f} = [\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_N]$.

3.3.1 MLG Construction

If the modules are designed individually without considering the interfaces present in the AMS system, the modules' metrics estimated during the design may deviate from the actual values, causing interface problems. In this thesis, we assume the interface is an inter-module variable that can simultaneously affect all the physically connected modules. For example, this assumption includes the loading effect, where the load impedance of the driving module and the input impedance of the loading modules are the same variable and should be equated during the design stage.

Figure 3.3: Two types of interface modeling in module test benches (interface approximation through $z_{in}$ and interface replication through $w_{in}$, between the $l$-th driving module and the $k$-th loading module).

Among the prior hierarchical design methodologies, contract-based design [17, 60] is one of the most successful methods that specifically address the interface problem. This method includes a set of rules named contracts, which determine the connectivity and the relations among modules in the system-level design. By using these contracts, the CAD tool shrinks the feasible range of the interfaces. Hence, the interfaces derived during module design are matched more accurately than with conventional methods. The contract method illustrated in these two works requires SVM classifiers and two-step hierarchical optimization. However, the contract-based method can also use NN regression models, thereby gaining both speed and accuracy while satisfying the requirement of interface equality.
The MLG is a graphical representation of the contract-based method for the specific case of AMS circuit design with NN models. We use an MLG in this work that enforces full equality of the interfaces among adjacent modules. The MLG, as shown in Figure 3.2, can be represented as a directed graph whose vertices are the design parameters $p$, the user desired specifications $u$, the P2M functions ($f$ or $\hat{f}$), and the modules' metrics ($m$ or $\hat{m}$). The functional representation of the MLG is given by $[m \vee \hat{m}] = g(p, u, [f \vee \hat{f}])$. In other words, the MLG outputs the modules' metrics when the design parameters and desired specifications are provided. An interface between modules is modeled as a vertex in the graph, which either sources edges to or sinks edges from multiple $f$ or $\hat{f}$ vertices, as shown in Figure 3.2.

To construct the MLG, we should first depict each module's sub-graph representation and then connect those sub-graphs through the interface vertices. The sub-graph of the $i$-th module sources only from its corresponding parameters ($p_i$) into the P2M function ($f_i$), which then sinks into its metric vertices ($m_i$). After constructing the module sub-graphs, we add the interface vertices and connect them according to the interface characterization type.

Figure 3.4: Calculating the gradient of metrics when using (a) approximation, and (b) duplication.

The MLG can be used with both conventional types of interface characterization (approximation and replication), as illustrated in the example of Figure 3.3. When interface approximation is used, the input impedance of the loading module should be characterized as its metric. Simultaneously, the load impedance of the driving module should be characterized as its design parameter. Since they are the same variable in the AMS system, they are represented as a single vertex (i.e., $z_{in}$ in Figure 3.3), which sources an edge to the driving module and sinks an edge from
Ontheotherhand, wheninterfacereplicationisused, boththe driving and the loading modules should be characterized with the input transistor of the loading module, making it a design parameter for both. Then, a single vertex in the MLG would represent this interface variable, sourcing two edges to the modules (i.e., w in in Figure 3.3). Interface replication is preferable over approximation for MLG. When approx- imation is employed, the evaluation of an adjacent module’s P2M functions inside the MLG must be serialized. In the previous example, the loading module’s input impedance (z in ) should be derived first before evaluating the driving module. In the case of replication, the evaluation can be processed in parallel as there is no such dependency, resulting in accelerated metric estimation by parallel process- ing. Interface approximation also causes multiple layers in the MLG, which makes gradient estimation more difficult. Illustrated in Fig. 3.4, in the same example, estimating the gradient of the driving module’s metrics with respect to the loading module’s parameters ( ∂ˆ m l ∂p k ) requires back-propagation through two P2M functions 38 ( ∂ˆ m l ∂ ˆ z in and ∂ˆ z in ∂p k ) according to the chain rule. On the other hand, the same gra- dient ( ∂ˆ m l ∂w in ) requires back-propagation through a single P2M function in case of replication. Another drawback of interface approximation arises in systems where the modules cyclically affect each other, such as in feedback-based circuits (i.e., delta-sigma modulators or PLL). In such circuits, a loop may be created inside MLG, which slows down the inference and gradient estimation. Besides ensuring equality of the interface, MLG is also recyclable. This means, the same MLG can be used to design an AMS system across different corners and CMOS technologies, provided its topology is fixed. The only parts of MLG that are a function of technology and corner cases are the P2M functions. 
This implies that incorporating the new technology or corner (such as typical, slow, or fast) in the test bench to generate the P2M model is sufficient, requiring no MLG alteration. When the topology of a specific module changes, only a slight MLG alteration is required. For example, when replacing a cascode amplifier module with a folded cascode amplifier [61], the interface models do not vary, the MLG remains almost unchanged, and only the design parameter vertices of the new module need to be replaced. This feature of the MLG helps accelerate the design flow using MOHSENN across different technologies, corners, and module topologies.

3.3.2 Hybrid Search

Unlike hierarchical design, MOHSENN uses monolithic optimization over both the system level and the module level to solve (3.2). This is obtained by composing the MLG $g()$ with the system-level objective function $obj$ and the constraints $h$. Then we can formulate MOHSENN's search problem as

$$\begin{aligned} &\text{Given } u, \\ &\arg\min_{p \in P}\; obj\left(g(p, u, [f \vee \hat{f}]), p_B\right), \\ &\text{s.t.: } h_j\left(g(p, u, [f \vee \hat{f}]), u, p_B\right) \geq 0, \quad j \in \{1, 2, \cdots, n_h\}. \end{aligned} \tag{3.3}$$

To solve (3.3), we propose to use the NN models for global optimization (global search), followed by a local optimization with the SPICE models (local search). The NN model evaluation time is usually on the order of milliseconds or less, while a SPICE model evaluation takes on the order of seconds to hours. Further, global optimization spends an order of magnitude more function evaluations than local optimization for proper exploration. The example of Section 3.4 complies with this statement: it shows almost 100,000 AMS circuit evaluations for the global optimization, compared to fewer than 1,000 for the local one. Therefore, by sacrificing some accuracy and using NN models instead of SPICE simulations in the global optimization, we can avoid several days of convergence time, reducing it to minutes. This speed also makes it practical to vary the optimizer's hyper-parameters to achieve better results.
The global optimization submits a set of design parameter candidates (denoted by $\mathcal{C}$) for the user to choose one as the initial point of the local search. Precision is crucial in the local search; hence, we use the precise SPICE model. Because the global search relies on NN models, its results may not even be sub-optimal or satisfactory. The local optimization improves the results by moving the candidate toward the nearest optimal point. This combination of the two optimizations helps to search over a wide range and find appropriate results within a short time.

By transforming the constraint satisfaction problem of (3.3) into an optimization problem, we can use the hybrid search. In hybrid search, instead of the design parameters $p$, we use a mapped version of the design parameters, denoted by $x$, as variables to remove the variable bounds. This step is necessary for the type of optimizers used in MOHSENN. $p$ has bounds in its domain $P$, i.e., $p_{min} \le p \le p_{max}$. So, instead of $p$ we use the mapped parameter $x$, transformed by the saturation function defined by

$$sat : \mathbb{R}^{n_p} \to P, \qquad sat(x) = \frac{1}{1 + e^{-x}} \times (p_{max} - p_{min}) + p_{min}. \qquad (3.4)$$

Clearly, we can derive the design parameters at any iteration from the mapped variables by $p = sat(x)$. By constructing MOHSENN's cost function $cost(\cdot)$, we can formulate (3.3) as the optimization problem

$$opt : \arg\min_{x \in \mathbb{R}^{n_p}} cost\big(x, [f \vee \hat{f}]\big) = w_{obj} \times obj\big(g(sat(x), u, [f \vee \hat{f}]),\, p_B\big) + \sum_{j=1}^{n_h} elu\Big(-w_{h_j} \times h_j\big(g(sat(x), u, [f \vee \hat{f}]),\, u,\, p_B\big)\Big), \qquad (3.5)$$

where $w_h \ge 0$ and $w_{obj} \ge 0$ are the non-negative weights used for normalization, and $elu()$ is the non-linear exponential linear unit (ELU) defined by

$$elu(x) = \begin{cases} e^{x} - 1, & x \le 0 \\ x, & 0 \le x. \end{cases} \qquad (3.6)$$

The ELU function imposes a high gradient when the specification is not met and exponentially reduces it otherwise. Owing to its exponential part, ELU also helps optimize the specifications further, even after they are satisfied.
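The mapping in (3.4) and the ELU penalty in (3.5) and (3.6) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the objective, the single constraint, the bounds, and the weights below are hypothetical toy values, and `obj` and the constraint functions take the bounded parameters directly rather than going through an MLG.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sat(x, p_min, p_max):
    """Map unbounded variables x to bounded parameters p, following (3.4)."""
    return [lo + _sigmoid(xi) * (hi - lo) for xi, lo, hi in zip(x, p_min, p_max)]

def elu(z):
    """ELU from (3.6): exp(z) - 1 for z <= 0, z otherwise."""
    return math.exp(z) - 1.0 if z <= 0.0 else z

def cost(x, p_min, p_max, obj, constraints, w_obj, w_h):
    """Weighted objective plus ELU penalties, in the spirit of (3.5)."""
    p = sat(x, p_min, p_max)
    total = w_obj * obj(p)
    for w, h in zip(w_h, constraints):
        # A violated constraint (h < 0) gives a positive, linearly growing penalty;
        # a satisfied one gives a small negative term that saturates at -1.
        total += elu(-w * h(p))
    return total

# Hypothetical toy problem: minimize p0 + p1 subject to p0 * p1 - 4 >= 0.
p_min, p_max = [1.0, 2.0], [3.0, 6.0]
toy_obj = lambda p: p[0] + p[1]
toy_h = [lambda p: p[0] * p[1] - 4.0]
example_cost = cost([0.0, 0.0], p_min, p_max, toy_obj, toy_h, w_obj=1.0, w_h=[1.0])
```

At `x = [0, 0]` the mapped parameters sit at the midpoint of each range, which makes the sketch easy to check by hand.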
Global Search Phase

In the global search phase, MOHSENN uses a Gradient-Based Optimizer (GBO) on the NN regression models in combination with the proposed Par-MC scheme to find multiple design parameter candidates. Compared to conventional global optimizers such as simulated annealing or genetic algorithms, GBOs require fewer optimization iterations to converge. However, they are more prone to getting stuck in local minima if the problem is non-convex. In this work, we utilize the Adam optimizer in the global search phase, which has a higher probability of hopping over local minima than conventional GBOs such as the Gradient Descent optimizer [62], leading to better overall performance. We employ two hyper-parameters to terminate the search process: the maximum number of iterations ($k_{max}$) and the tolerance ($ftol$). The optimizer stops when the number of iterations exceeds $k_{max}$ or the cost function reaches a value smaller than $ftol$.

Par-MC performs considerably faster than conventional sequential MC schemes if combined with machine learning tools such as TensorFlow [63]. The conventional sequential MC technique picks an uncorrelated random initial point for the optimizer in each MC experiment, $n_{mc}$ times. Unfortunately, this approach requires full convergence of each optimization for $n_{mc}$ experiments (each time with a different initialization), which slows down the search engine. Par-MC, however, optimizes only once by parallelizing the experiments, which significantly increases the search speed. In this technique, the variables from different MC experiments are concatenated, and the optimization is performed on the aggregation of all the resulting cost functions. We construct the MC's concatenated variable matrix, denoted by $X$, with the size of $[n_p, n_{mc}]$, and we formulate the Par-MC optimization as

$$opt : \arg\min_{X \in \mathbb{R}^{n_{mc} \times n_p}} cost_{mc}(X, \hat{f}) = \sum_{i=1}^{n_{mc}} cost(X_{i,:}, \hat{f}), \qquad (3.7)$$

where $cost()$ is the cost function from (3.5), and $X_{i,:}$ is the $i$th column of $X$.
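To make the aggregation in (3.7) concrete, the following sketch runs a hand-written Adam update once per experiment (sequential MC) and once on the concatenated variables with the summed cost (Par-MC). It is not the TensorFlow implementation used in this work: the per-experiment cost `toy_cost_and_grad` is a hypothetical stand-in with an analytic gradient, the run uses a fixed number of iterations, and the $ftol$ stopping rule is deliberately left out.

```python
import math
import random

def toy_cost_and_grad(x):
    """Hypothetical non-convex cost of one MC experiment and its gradient."""
    c = sum(xi * xi + math.sin(3.0 * xi) for xi in x)
    g = [2.0 * xi + 3.0 * math.cos(3.0 * xi) for xi in x]
    return c, g

def adam(x0, cost_and_grad, lr=0.02, k_max=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam with a fixed iteration budget (no tolerance-based stop)."""
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for k in range(1, k_max + 1):
        _, g = cost_and_grad(x)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** k)
            v_hat = v[i] / (1.0 - beta2 ** k)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def par_mc_cost_and_grad(flat, n_p):
    """cost_mc: sum of the per-experiment costs over the concatenated variables.
    Each gradient block depends only on its own experiment's variables."""
    total, grad = 0.0, []
    for start in range(0, len(flat), n_p):
        c, g = toy_cost_and_grad(flat[start:start + n_p])
        total += c
        grad.extend(g)
    return total, grad

random.seed(0)
n_p, n_mc = 3, 5
starts = [[random.uniform(-2.0, 2.0) for _ in range(n_p)] for _ in range(n_mc)]

# Sequential MC: one full optimization per starting point.
sequential = [adam(s, toy_cost_and_grad) for s in starts]

# Par-MC: a single optimization over all starting points at once.
flat0 = [xi for s in starts for xi in s]
flat = adam(flat0, lambda z: par_mc_cost_and_grad(z, n_p))
parallel = [flat[i * n_p:(i + 1) * n_p] for i in range(n_mc)]
```

Because Adam updates each variable from its own gradient history, the two runs end at the same points for identical initializations. In this pure-Python form there is no speed benefit; the acceleration described in the text comes from evaluating all experiments as one vectorized operation.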
The optimization begins with a randomly picked starting point $X^{(0)}$ and results in $X_{og}$. We stack the columns of $X_{og}$ and their corresponding $cost()$ values in $\mathcal{C}$ for the user to choose from. The final results of the Par-MC are almost equivalent to those of the sequential MC if a GBO is used for optimization. In a GBO, the optimization direction depends only on the gradients and the starting points. Since the derivatives of the Par-MC cost function $cost_{mc}$ with respect to the columns of $X$ are independent of one another, the optimization direction is only related to the columns' initialization. Therefore, if the initialization of each $X_{i,:}$ is equivalent to the sequential MC initialization for the corresponding experiment, the two methods' final results should be the same. Note that, since $n_{mc}$ cost functions are summed in the Par-MC scheme, the tolerance stopping criterion should also be multiplied by this factor. Therefore, the search result of the Par-MC scheme may vary slightly from its sequential counterpart for the same initial point.

The proposed scheme reduces the total number of optimization iterations compared to the conventional scheme of optimizing multiple initial points sequentially. However, it comes at the cost of optimizing more variables per iteration. Thanks to the TensorFlow [63] tool used in this work, this cost becomes negligible. TensorFlow can handle millions of variables while training deep learning models, implying that it can easily operate with $n_{mc}$ times more variables in an iteration, even for a sufficiently large value of $n_{mc}$. As a result, the search space exploration is greatly accelerated.

Algorithm 1 Proposed gradient-based variable reduction
Input: $\mathcal{C}$, $n_v$
Output: $n_v$ variables
1: Choose one candidate from $\mathcal{C}$ named $x_{og}$
2: Find $\|\nabla_x cost(x)\| = [\|\partial cost / \partial x_1\|, \|\partial cost / \partial x_2\|, \ldots, \|\partial cost / \partial x_{n_p}\|]$ at $x = x_{og}$
3: Sort $\|\nabla_x cost(x)\|$ and choose the $n_v$ top variables
4: return the $n_v$ top variables of $x$
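The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the gradient is taken by central finite differences on a hypothetical toy cost, whereas the text obtains it from the NN model, and the helper names `finite_diff_grad` and `select_variables` are made up for this sketch.

```python
def finite_diff_grad(cost, x, step=1e-6):
    """Central-difference estimate of d cost / d x_i (stand-in for NN back-propagation)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((cost(up) - cost(down)) / (2.0 * step))
    return grad

def select_variables(cost, x_og, n_v):
    """Return the indices of the n_v variables with the largest |d cost / d x_i| at x_og."""
    grad = finite_diff_grad(cost, x_og)
    ranked = sorted(range(len(x_og)), key=lambda i: abs(grad[i]), reverse=True)
    return ranked[:n_v]

# Hypothetical cost with very different sensitivities per variable.
toy_cost = lambda x: 5.0 * x[0] ** 2 + 0.01 * x[1] + 2.0 * x[2] + 0.1 * x[3] ** 2
kept = select_variables(toy_cost, [1.0, 1.0, 1.0, 1.0], n_v=2)
```

Only the variables in `kept` would then be handed to the local optimizer, with the remaining ones frozen at their values in $x_{og}$.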
The global search result $cost(X_{i,:\,og}, f)$ can be far from the actual global optimal point if the NN models' inaccuracy is too large. The modules' NN models have regression errors ($\epsilon(\hat{m})$) that cause a deviation in the cost function estimate, leading to global search performance degradation. The deviation in the cost function is proportional to the regression errors and to the derivatives with respect to the modules' metrics. Hence, to reduce the cost function estimation error (i.e., $\|cost(X_{i,:\,og}, \hat{f}) - cost(X_{i,:\,og}, f)\|$) by half, the regression errors should be reduced by half. Therefore, considering this difference, one can determine the tolerable regression errors and tune the corresponding hyper-parameters (such as dataset size) accordingly [64].

Local Optimization Phase

In the local search phase, we use a gradient-free optimizer (GFO) [65] and the SPICE model, which is more precise than the NN models. The initial point of the GFO is one of the parameter candidates in $\mathcal{C}$, picked by the user. Since each optimization step with SPICE takes a significant amount of time (depending on the module test bench), we want as few optimization steps as possible. Therefore, we enlarge $ftol$ for the local search. In addition, we use the proposed gradient-based variable reduction technique illustrated in Algorithm 1 to decrease the number of optimizer variables. In this technique, we sort the design parameters based on their significance in affecting the cost function, and then keep the $n_v$ most sensitive parameters as the variables for the local optimization. We set the sorting criterion to be $\|\partial cost(\hat{f}) / \partial x_i\|$ obtained from the NN model, in decreasing order, in which $x_i$ is the $i$th element of the variable vector $x$.

[Figure 3.5: The amount of error when the partial gradient is (a) near zero, or (b) larger than zero]

We know the initial point for the local search phase is one of the columns of $X$ derived from Adam optimization. Let us call it $x_{og}$.
We also know that Adam obtains this value from the gradients and evaluations performed on the NN model, which suffers from a regression error. The regression error deviates the partial derivative for the $i$th variable by $\epsilon_d^i$; therefore, the last partial derivative with respect to the $i$th design parameter is expressed as

$$\frac{\partial cost(\hat{f})}{\partial x_i} \times \left(1 + \frac{\epsilon_d^i}{\partial cost(\hat{f}) / \partial x_i}\right)\Bigg|_{x = x_{og}}. \qquad (3.8)$$

This shows that the cost function is deviated more by the $i$th variable if either $\epsilon_d^i$ is larger or the absolute partial derivative along this variable is smaller. As shown in Fig. 3.5, along the $i$th variable, if $\partial cost(\hat{f}) / \partial x_i$ is near zero, the possibility of a huge error increases. Therefore, we propose to sort the parameters in $x$ according to the absolute partial derivatives at $x_{og}$ in decreasing order, and choose the top $n_v$ variables. There is a trade-off between the choice of $n_v$ and the optimization time. Because the regression errors and their effects on the update function are stochastic, we suggest considering $n_v$ as a design hyper-parameter. For this thesis, we use $n_v = 0.4\,n_p$ as a rule of thumb to enhance the local search speed by a factor of more than two. Another method is to set a maximum bound $d_{max}$ for the partial derivatives and choose the parameters for which $\|\partial cost(\hat{f}) / \partial x_i\| < d_{max}$.

After choosing the $n_v$ most sensitive variables, we can run a local optimization with the few remaining parameters. We suggest using the Powell optimization algorithm, which in practice converges faster than other typical GFOs for circuit design. We name the optimization result $x_{ol}$, and we find the design parameters by $p_{opt} = sat(x_{ol})$. Finally, we send $p_{opt}$ to the output, which concludes the algorithm.

3.4 Case Study of a SAR ADC

This section illustrates an example of our proposed MOHSENN algorithm to design a SAR ADC. The algorithm runs on the TensorFlow platform on a machine running Red Hat 6.10 OS with an Intel Xeon E5-2630 CPU and 32 GB of memory.
The block-level verification tests are performed with Spectre on a machine running Red Hat 6.10 OS with an Intel Xeon E7-4870 CPU and 256 GB of memory.

The SAR ADC is a popular data conversion topology choice due to its power efficiency and wide range of applications [29–31]. This section illustrates the design of the SAR ADC proposed in [27] using the proposed MOHSENN. Figure 3.6 shows the ADC's top-level block diagram, including four main modules: a comparator (COMP), a track-and-hold with a capacitive DAC (THCDAC), a SAR sequential logic for driving the THCDAC (SEQ1), and a SAR sequential logic for clocking the comparator (SEQ2). We set the user desired specification as $u = [n_{bit}, freq_S]$, which holds the number of bits and the sampling frequency, to cover a wide range of operations for the SAR ADC. In this case study, we use 65 nm CMOS technology and the design is at the schematic level.

[Figure 3.6: SAR ADC top level]

[Figure 3.7: THCDAC architecture and the test bench]

[Figure 3.8: COMP's architecture and the test bench]
[Figure 3.9: SEQ2 architecture and the test bench]

3.4.1 Preparation of the MOHSENN for SAR ADC Design

SPICE P2M functions

Modules in the SAR ADC load each other cyclically. Therefore, if we use interface approximation, the MLG will contain a loop. To avoid this problem, we use interface replication to model the SAR logic interface with the CDAC and the comparator. This choice does not add significantly to the number of module parameters. On the other hand, the comparator's loading effect on the CDAC is modeled with interface approximation for the purpose of this study.

Table 3.1 lists the module parameters with their ranges and brief descriptions, and Table 3.2 contains the modules' metrics, their descriptions, and the simulation type required to derive them. Figure 3.7, Figure 3.8, Figure 3.9, and Figure 3.10 show the architecture and the test benches of THCDAC, COMP, SEQ2, and SEQ1, respectively. Moreover, in each figure, the design parameters are shown as part of the architecture, while the interface modeling is presented in the corresponding test bench.

[Figure 3.10: SEQ1 architecture and the test bench]
The SAR ADC contains 26 design parameters, 24 of which belong to the modules and are gathered in Table 3.1. It has two system-level parameters, which are $p_B = [n_{buf}, dc_{tr}]$. The first is the number of buffers used in SEQ2 and the second is the duty cycle of the sampling clock.

MLG Construction

After module breakdown, we construct the MLG. As mentioned previously, $g(p, u, [f \vee \hat{f}])$ is a function of the user desired specifications, module parameters, and module test benches (P2M functions). Here, the only specification causing an effect is $n_{bit}$, which is a parameter used in the THCDAC, COMP, and SEQ1. The SAR ADC also has two block-level parameters $p_B = [n_{buf}, dc_{tr}]$, which are the number of buffers used in SEQ2 and the THCDAC's tracking time duty cycle. The details related to both of these parameters are shown in Table 3.1. By knowing all the parameters, metrics, and interfaces, we can construct the MLG shown in Figure 3.11.

Table 3.1: Design parameters with ranges

Module    p           [p_min, p_max]    Description
THCDAC    c_u         [0.5fF, 5.0fF]    Unit capacitance
          w_tn        [2, 40]           Sw. NMOS size
          w_tp        [2, 60]           Sw. PMOS size
          d           [2, 16]           Division factor
          n_bit       [3, 11]           ADC resolution
          c_in        [2fF, 30fF]       Interface approx.
COMP      w_ck1       [1, 10]           Inv. size
          w_ck2       [1, 20]           Inv. size
          w_cin       [1, 20]           Input PMOS size
          w_cn        [1, 40]           NMOS size
          w_cp        [1, 40]           PMOS size
          w_t         [2, 80]           Tail PMOS size
          w_r1        [1, 8]            Reset NMOS size
          w_r2        [1, 8]            Reset NMOS size
          w_cinv      [1, 40]           Inv. load size
          w_cnand     [1, 8]            NAND load size
          n_bit       [3, 11]           ADC resolution
          w_dffinv2   [1, 16]           Interface replicat.
          w_rdy       [1, 48]           Interface replicat.
          w_or        [1, 10]           Interface replicat.
SEQ1      w_inv1      [1, 12]           Inv. size
          w_inv2      [2, 24]           Inv. size
          w_inv3      [2, 96]           Inv. size
          w_nand      [1, 16]           NAND size
          w_dffinv1   [2, 16]           Seg.#1 DFF Inv. size
          w_dffck1    [1, 10]           Seg.#1 DFF Inv. size
          w_dffinv2   [1, 16]           Seg.#2 DFF Inv. size
          w_dffck2    [1, 16]           Seg.#2 DFF Inv. size
          w_dffnand   [2, 32]           Seg.#2 DFF NAND size
          n_bit       [3, 11]           ADC resolution
          d           [2, 16]           Division factor
SEQ2      w_or        [1, 10]           OR size
          w_ck1       [1, 12]           Interface replicat.
The shared vertices of $d$, $w_{ck2}$, $w_{dffinv1}$, $w_{dffinv2}$, $w_{dffck2}$, and $w_{or}$ show the interface replication graph construction, as discussed earlier in Section 3.3.1. For the specific case of $w_{rdy} = 2w_{dffinv1} + w_{dffck2}$, we placed the relationship in the graph. The only interface approximation modeling is $c_{in}$, connecting the COMP's metric to the THCDAC's parameter.

Table 3.2: Modules' metrics of the SAR ADC

Module    m          Description              Sim. Type
THCDAC    bw_n       NMOS Sw. bandwidth       AC
          bw_p       PMOS Sw. bandwidth       AC
          v_msb      CDAC output swing        Tran
          dly_DAC    DAC settling time        Tran
COMP      pw_c       Power                    Tran
          dly_RDY    CLK-RDY delay            Tran
          dly_RST    Reset delay              Tran
          v_omin     Output common mode       Tran
          c_in       Input capacitance        AC
          noise_c    Input referred noise     Noise
SEQ1      pw_s1      Seg.#1 power             Tran
          pw_s2      Seg.#2 power             Tran
          dly_s1     RDY-DP delay             Tran
          dly_s2     RDY-CLKQ delay           Tran
SEQ2      pw_or      OR gate power            Tran
          pw_buf     Buffer power             Tran
          pw_drv     Driver power             Tran
          dly_or     OR gate delay            Tran
          dly_buf    Buffer delay             Tran
          dly_drv    Driver delay             Tran

[Figure 3.11: SAR ADC MLG]
System-level design objective and constraints

For the SAR ADC design, we construct the objective function $obj$ and the constraints $h$ by

$$\begin{aligned}
obj &= power, \\
h_1 &= digital_{dly} - analog_{dly} \ge 0 \\
h_2 &= \frac{1}{freq_S} - t_{tr} - t_q \ge 0 \\
h_3 &= SNR - 6\,n_{bit} - 11.76 \ge 0 \\
h_4 &= 2\pi\, t_{tr} \min(bw_n, bw_p) - n_{bit} \ln 2 \ge 0 \\
h_5 &= 4 - 2\pi\, t_{tr} \min(bw_n, bw_p) + n_{bit} \ln 2 \ge 0 \\
h_6 &= v_{omin} - 0.6 \ge 0,
\end{aligned} \qquad (3.9)$$

where

$$\begin{aligned}
digital_{dly} &= dly_{or} + n_{buf} \times dly_{buf} + dly_{drv} \\
analog_{dly} &= dly_{s1} + dly_{s2} + dly_{DAC} \\
t_q &= n_{bit}\,(dly_{RDY} + dly_{RST} + 2\,digital_{dly}) \\
t_{tr} &= dc_{tr} / freq_S \\
SNR &= 10 \log\left(\frac{(4 v_{msb})^2}{4KT / (2^{n_{bit}} c_u + c_{in}) + (noise_c)^2}\right) \\
power &= n_{bit}\, pw_{s1} + pw_{s2} + pw_{or} + n_{buf} \times pw_{buf} + pw_{drv} + pw_c.
\end{aligned} \qquad (3.10)$$

In (3.9), $h_1$ and $h_2$ hold the timing constraints imposed by the two different loops of the SAR ADC. $h_3$ holds the signal-to-noise ratio requirement, while $h_4$ and $h_5$ indicate the THCDAC's bandwidth requirement. $h_6$ sets a constraint on COMP's output common-mode voltage, and $obj$ is the power consumption.

Table 3.3: NN topology and hyper-parameters used in SAR ADC Design

Module    Neurons per layer          φ(.)      |D_train|   |D_test|   Training MAE loss   Test MAE loss
THCDAC    [6, 64, 256, 64, 4]        Sigmoid   7500        2500       0.013               0.0140
COMP      [14, 256, 512, 256, 6]     Sigmoid   7500        2500       0.013               0.014
SEQ1      [11, 256, 512, 256, 4]     Sigmoid   7500        2500       0.011               0.014
SEQ2      [2, 50, 50, 6]             Sigmoid   120         120        0.0003              0.0003

NN Regression Models

We gathered information regarding the NN models for the four different modules, as presented in Table 3.3. For COMP, THCDAC, and SEQ1 we used an NN regression model with five layers, while for SEQ2 we used four layers. The number of neurons for each layer is shown in Table 3.3. To determine the number of layers and neurons per layer, we used the method illustrated in [21]. The activation function of all layers except the last one is the Sigmoid function (i.e., $\phi(x) = \frac{1}{1 + e^{-x}}$). The last layer uses a linear activation function.
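As a rough illustration of the model shape described above, the sketch below builds a fully connected network with sigmoid hidden activations and a linear output layer, using the THCDAC layer sizes from Table 3.3, and evaluates one forward pass together with an MAE loss. It is a plain-Python stand-in rather than the TensorFlow model used in this work: the weights are random and untrained, and `init_mlp`, `forward`, and `mae` are hypothetical helper names.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic activation.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights for a dense network with the given layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    """Sigmoid on every layer except the last, which stays linear."""
    a = list(inputs)
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * ai for w, ai in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        a = z if idx == len(layers) - 1 else [sigmoid(v) for v in z]
    return a

def mae(pred, target):
    """Mean absolute error, the loss named in the text."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# THCDAC topology from Table 3.3: 6 parameters in, 4 metrics out.
thcdac_model = init_mlp([6, 64, 256, 64, 4])
predicted_metrics = forward(thcdac_model, [0.5] * 6)
```

Note that the five entries of the size list (counting the input and output layers) correspond to four weight matrices.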
Other than SEQ2's model, we trained the NN models with a dataset of $|D_{train}|$ = 7,500 and tested them with $|D_{test}|$ = 2,500 sample points generated by the previously described test benches. Generating the whole dataset took almost 5 h 50 min. For training, we used the Adam algorithm, and the loss function is the mean-absolute-error (MAE). The training and testing losses for the given datasets are presented in Table 3.3.

Table 3.4: Metrics introduced in Table 3.2, the statistical parameters and the regression error $\epsilon(\hat{m})$ by kNN, RF, SVR and NN modeling with different size of training dataset $|D_{train}|$

m                μ(m)      σ(m)      kNN       RF        SVR       NN        NN        NN        NN
|D_train|                            7,500     7,500     7,500     1,000     2,500     5,000     7,500
THCDAC metrics
bw_n [GHz]       17.33     32.19     6.80      3.20      3.03      1.12      0.58      0.33      0.17
bw_p [GHz]       8.17      15.62     3.46      1.46      1.53      0.50      0.26      0.15      0.13
v_msb [mV]       475.55    39.93     8.24      3.77      1.85      1.04      0.35      0.26      0.21
dly_DAC [ps]     241.86    172.52    30.05     15.63     7.70      9.87      7.78      7.45      6.92
COMP metrics
pw_c [μW]        118.93    46.63     13.61     10.77     5.79      9.25      6.24      3.10      1.36
dly_RDY [ps]     320.49    226.78    68.24     46.75     40.68     38.0      24.6      15.5      7.77
dly_RST [ps]     170.90    116.6     38.10     27.69     21.93     26.8      17.7      9.85      4.58
v_omin [mV]      745.8     229.0     91.4      47.43     39.48     34.7      21.0      15.8      8.32
c_in [fF]        17.1      8.0       2.7       0.87      0.17      0.15      0.12      0.10      0.06
noise_c [μV²]    367.8     255.5     84.51     29.41     52.12     54.7      35.7      30.9      13.8
SEQ1 metrics
pw_s1 [μW]       189.5     606.8     181.47    31.30     119.13    43.7      20.0      13.0      5.68
pw_s2 [μW]       12.5      7.1       1.59      0.45      0.37      0.21      0.12      0.08      0.05
dly_s1 [ps]      159.5     258.7     72.12     29.18     44.68     30.0      14.5      9.65      6.04
dly_s2 [ps]      59.1      25.3      9.80      2.60      5.52      1.70      0.90      0.66      0.50
SEQ2 metrics, |D_train| = 120 for all cases
pw_or [μW]       10.2      6.9       0.60      0.11      0.08      -         -         -         0.01
pw_buf [μW]      3.06      1.88      0.164     0.031     0.028     -         -         -         0.004
pw_drv [μW]      3.43      1.26      0.121     0.025     0.024     -         -         -         0.005
dly_or [ps]      31.60     1.85      0.355     0.008     0.148     -         -         -         0.007
dly_buf [ps]     12.71     1.62      0.286     0.007     0.094     -         -         -         0.006
dly_drv [ps]     14.76     4.33      0.580     0.018     0.185     -         -         -         0.014

With the available dataset, we approximated all modules' metrics with four different nonlinear regression models that have been used in circuit design tools (SVR, kNN, RF, and NN), and their corresponding regression errors are presented in Table 3.4. For SVR, kNN, and RF we used the Python package scikit-learn [66], and for NN we used TensorFlow. For SVR we used the non-linear rbf kernel, a tol of $10^{-3}$, and an epsilon of $10^{-12}$; the remaining hyper-parameters are scikit-learn's defaults. For kNN, we set the number of neighbors to 100, and for RF we set the number of estimators to 100. We chose these hyper-parameters by manual tuning to achieve the minimum loss. We included the average $\mu()$ and standard deviation $\sigma()$ of the metrics from the training dataset in Table 3.4, to be compared with the regression errors. Among all regression types, the NN yields the smallest regression errors for all metrics when the number of training sample points ($|D_{train}|$) is equal to 7,500. We varied $|D_{train}|$ to examine its effect on the NN model accuracy and presented the results in Table 3.4. As mentioned before, the NN model regression error can be reduced significantly by augmenting the dataset.

[Figure 3.12: cost vs. optimization iteration in the global search phase and the local search phase. Curves: NN model at global search, SPICE at global search, SPICE at local search. Annotated times: global search 15 s, local search 1,628 s.]

3.4.2 Search Engine Results

After preparation, the CAD tool with the MOHSENN algorithm can receive $u = [n_{bit}, freq_S]$ from the user to generate the corresponding circuit. In this thesis, we start from a design of $u$ = [6, 500MS/s]. We show the global and local search phases when $n_{mc}$ is 1 (single-run Adam optimization) and when the stopping criterion for local optimization is restrictive ($tol$ = 0.001).
In the next experiment, we change $n_v$ to test whether reducing the dimensions with the proposed gradient-based variable reduction method can still produce reliable results. We also examine the efficiency of the Par-MC and Adam algorithm, and we compare it to Simulated Annealing (SA) optimization and the conventional sequential MC method. Next, we gather the results for three user desired specifications, $u$ = [10, 200MS/s], $u$ = [8, 340MS/s], and $u$ = [6, 500MS/s], designed by MOHSENN. Finally, we compare the MOHSENN methodology to other circuit sizing algorithms.

We tested MOHSENN's hybrid search performance on the SAR ADC design. The SAR ADC's cost function curves vs. optimization iterations for both the global and local searches are presented in Figure 3.12. For this test, the hyper-parameters in the global search were the learning rate $\gamma = 0.02$, $ftol = 10^{-5}$, and $n_{mc} = 1$, and it took 342 iterations to converge. In the local search, the hyper-parameters were $ftol = 0.001$ and $n_v = 10$, and it took 337 SPICE evaluations within 6 optimization iterations to converge successfully. It can be seen that the global search improves the initial starting point significantly, from $cost(x^{(0)}, \hat{f}) = 218.05$ to $cost(x_{og}, \hat{f}) = -2.85$, based on NN modeling. Because of the fast NN evaluation, this step converges within 15 s. We evaluated every iteration of the global search with the SPICE model to examine the cost function trend without the regression errors. During the global search, the cost function evaluated with the SPICE models is initially $cost(x^{(0)}, f) = 221.66$ and is minimized to $cost(x_{og}, f) = 4.22$. In the local search, the Powell algorithm conducts many SPICE evaluations per iteration. Therefore, even for only six iterations, convergence takes more than 1,600 s. The final point found is $cost(x_{ol}, f) = -2.7985$, as evaluated by the SPICE models.
We compared the proposed Par-MC/Adam optimization ($n_{mc} = 50$) with the SA optimizer to examine the efficiency of the proposed scheme for a global search. The results are presented in Figure 3.13(a) and Figure 3.13(b). SA optimization is stochastic and global, and is conventionally used for circuit design problems [11,17,25]. During one SA optimization iteration, several hundred function evaluations are performed. We used the open-source SciPy package [67] and its function "scipy.optimize.dual_annealing." Figure 3.13(a) and Figure 3.13(b) show $cost(x, \hat{f})$ vs. the optimization iteration and the optimization time, respectively. These two figures show that SA achieves a cost function of $cost(x_{og}, \hat{f}) = -3.00$ after 960 iterations. The Adam optimizer with Par-MC requires 515 iterations to achieve $cost(x_{og}, \hat{f}) = -3.06$. Par-MC with Adam converges to the optimized value within only 56 s, while SA takes 1,185 s to converge. This time difference is a result of SA performing more function evaluations per iteration than the Par-MC method. This experiment shows that Par-MC with Adam optimization can reach the same optimized point as the conventional SA algorithm at least 20 times quicker.

[Figure 3.13: Global search phase with Par-MC/Adam optimization ($n_{mc}$ = 20) vs. SA optimizer (a) in number of iterations (b) required time. Annotations: 55.8 s, 20X faster.]

Figure 3.14 shows a comparison between the conventional sequential MC and the Par-MC. In Figure 3.14(a), we show the optimization results for 20 different initial points.
It can be seen that the final results of the sequential MC and the Par-MC are similar, as claimed in Section 3.3.2. Interestingly, all results derived by the Adam optimizer for this problem are close to the competitive point of $cost(x_{og}, \hat{f}) = -3.06$. This implies that even without MC, the Adam optimizer can achieve respectable results for the SAR ADC design problem. Figure 3.14(b) presents a comparison of the Par-MC with the sequential MC in terms of speed. In sequential MC, the required time is linearly proportional to $n_{mc}$. Therefore, we tested sequential MC for $n_{mc}$ between 10 and 50, and then linearly projected the required time for $n_{mc}$ = 1,000, which is almost 16,000 s (≈ 4 h). However, the same number of samples with Par-MC requires only 121.5 s. In Section 3.3.2, we claimed that Par-MC's completion time does not increase linearly with $n_{mc}$ on the TensorFlow platform. In this example, we achieve a speed-up of more than 130 times when using the Par-MC (when $n_{mc}$ = 1,000).

[Figure 3.14: Comparison between Par-MC and conventional sequential MC (a) The optimization result $cost(x_{og}, f)$ for 20 MC samples with identical initialization (b) The required time for different $n_{mc}$ (annotation: 133X faster)]

We gathered the local search results with the SPICE models and the Powell algorithm while altering $n_v$ to examine its effect on the final optimization results. For this test, the initial cost function is $cost(x_{og}, f) = 2.30$. The Powell algorithm from the SciPy package [67], in "scipy.optimize.minimize", was used with a stopping criterion of $tol = 0.1$. Figure 3.15 summarizes the optimization results vs. $n_v$.

[Figure 3.15: The local optimization results of the 6-bit, 500 MS/s ADC vs. number of variables $n_v$ (a) cost function (b) number of iterations (c) optimization time (d) ADC's FOM. Annotations: 0.6% variation in (a), ~2.4X fewer evaluations in (b), ~2.3X faster in (c), 1.2% variation in (d).]

Figure 3.15(a) shows the final optimized value $cost(x_{ol}, f)$ vs. $n_v$. This clearly shows that beyond $n_v = 10$ there is no significant improvement in the cost function. In other words, the other 16 variables with higher gradients do not contribute to the local optimization. We can observe the benefit of reducing the variables in Figure 3.15(b) and Figure 3.15(c), which show the required number of iterations and the optimization time, respectively. Both increase linearly with $n_v$, which indicates that reducing the variables can significantly accelerate the search. For each local optimization with a different $n_v$, we derived $p_{opt}$ and tested it in the ADC verification test bench. Figure 3.15(d) shows the measured FOM of the SAR ADC derived from the SPICE simulation; there is no significant improvement of the FOM if $n_v \ge 10$. For this example, by introducing the variable reduction technique and considering only $n_v = 10$ variables, we achieve the same FOM as with $n_v = 26$ with almost 2.3 times faster convergence.

[Figure 3.16: The output spectrum of the MOHSENN designed SAR ADC when the input is a single tone sine wave for (a) [10-bit, 200MS/s] (SFDR = 62.5 dB, SNDR = 57.8 dB, SNR = 60.5), (b) [8-bit, 340MS/s] (SFDR = 55.7 dB, SNDR = 48.4 dB, SNR = 49.4), and (c) [6-bit, 500MS/s] (SFDR = 43.6 dB, SNDR = 37.3 dB, SNR = 37.3)]
Different User Desired Specifications

We examined the diverse design capabilities of MOHSENN with three different user desired specifications: $u$ = [10, 200MS/s], $u$ = [8, 340MS/s], and $u$ = [6, 500MS/s]. For this test, we use MOHSENN's default hyper-parameters. The global search uses the Adam optimizer with $\gamma = 0.02$, $n_{mc} = 500$, $ftol = 10^{-5}$, and $k_{max}$ = 10,000. The local search uses the Powell optimizer with $ftol = 0.1$ and $k_{max}$ = 1,000. For all three cases, we simulated the SAR ADC with the $p_{opt}$ offered by MOHSENN. We tested the ADCs with a single-tone sine wave input; their output spectra are shown in Figure 3.16, and the results are presented in Table 3.5. For the 10- and 8-bit ADC design cases, there is no significant improvement with the local search. That means the global search result is already close to the nearest optimal point, and there is no need for further optimization. So the local optimization time could be avoided for these two cases. However, for the 6-bit ADC, the FOM changes from 67.0 to 43.7 fJ/c.step, which is a significant improvement.

Table 3.5: MOHSENN Result of The SAR ADC with Given u

Desired Spec. (u)                  [10, 200MS/s]   [8, 340MS/s]   [6, 500MS/s]
Global Search # of iter.           1,149,500       360,500        282,000
Global Search Time                 394s            124s           91s
Local Search # of iter.            49              171            89
Local Search Time                  311s            1224s          739s
Verification Time                  5h 35m 23s      1h 39m 47s     45m 18s
SNDR (Global/Local) [dB]           57.8/57.8       48.9/48.4      37.0/37.3
SNR (Global/Local) [dB]            60.6/60.5       49.4/49.4      37.0/37.3
SFDR (Global/Local) [dB]           62.0/62.5       59.8/55.7      50.1/43.6
ENOB (Global/Local) [bit]          9.34/9.18       7.86/7.75      5.87/5.92
Power (Global/Local) [mW]          2.70/2.40       2.91/2.58      2.22/1.30
FOM (Global/Local) [fJ/c.step]     20.8/18.5       36.7/35.5      75.9/42.9

Comparison to conventional methods

To examine the efficiency of MOHSENN, we compared it to two other algorithms.
The first is a conventional top-down hierarchical design [17,24,68,69], illustrated in Section 3.2.1. In this method, the system-level optimization is carried out with MOHSENN's global search, yielding p_og and thus the feasible metrics \hat{m}_{og}. At the module level, we optimize each module separately to render f_i(p_i) = \hat{m}_{og}^{i}, or to make f_i(p_i) better than \hat{m}_{og}^{i}. For the module-level optimization, we use SA optimization and SPICE modeling for accurate results. However, this approach designs modules separately (causing interface problems), as illustrated in Section 3.2.1. The other algorithm uses SPICE directly for the global search and SA for the optimization. In other words, the optimization problem is the same as (3.5) with f. We call this approach the Flat-SPICE method. Since it is a global optimization directly on the overall AMS circuit functional model with SPICE, obj(g(.,f)) and h(g(.,f)), we expect to find the global optimal point with no regression errors. However, the optimization time is prolonged. We designed a 6-bit 500 MS/s ADC with these two methods and then inserted the parameter candidates into the SAR ADC verification test bench. The result is shown in Table 3.6. MOHSENN uses its default hyper-parameters.

Table 3.6: Design Parameters and the Search Space of the SAR ADC

Design Method             Proposed MOHSENN   Hierarchical Design   Flat-SPICE (fast SA)   Flat-SPICE (slow SA)
Dataset Generation Time   5h 50m             5h 50m                0s                     0s
# of NN Evaluations       282,000            282,000               0                      0
NN Search Time            91s                91s                   0s                     0s
# of SPICE Evaluations    89                 4178                  27,000                 103,302
SPICE Search Time         12m 19s            1h 9m 38s             1d 22h 12m 4s          7d 12h 2m 47s
Total Search Time         13m 50s            1h 11m 17s            1d 22h 12m 4s          7d 12h 2m 47s
Total Required Time       6h 3m              7h 1m                 1d 22h                 7d 12h 2m
SNDR                      37.30 dB           37.50 dB              24.07 dB               37.76 dB
Power                     1.30 mW            2.00 mW               1.77 mW                1.67 mW
FOM                       43 fJ/c.step       67 fJ/c.step          269 fJ/c.step          52 fJ/c.step
In the hierarchical design, the first step is similar to MOHSENN's global search, which uses NN modeling and MLG. However, the second step involves designing each module separately (without MLG). For the second step of the hierarchical design, we used SA from the SciPy [67] library with its default settings and changed only the maximum number of iterations: 50 for the comparator design and 10 for the rest. Further, in the second-level design, we do not take p_B as variables; they are set by the system-level optimization. In Table 3.6, the number of SPICE simulations is the sum over all module test benches. As indicated in Table 3.6, the hierarchical approach converges after 70 min, while MOHSENN converges within approximately 14 min. Accordingly, the proposed algorithm's search engine is five times faster than the conventional hierarchical method while achieving a better FOM for the SAR ADC.

The other approach (Flat-SPICE) is a single-run SA optimization. Since one optimization run takes a considerable amount of time to converge, we demonstrated this method for two cases. One is SA with a maximum of 100 iterations, which we call fast Flat-SPICE. For the other case, this number is 500, and we call it slow Flat-SPICE. The fast technique took almost two days to converge, and the results were not satisfactory. The slow Flat-SPICE converged to an excellent point for the SAR ADC, i.e., a power consumption of 1.67 mW and an SNDR of 37.76 dB, but it required more than 7 days to reach this point. MOHSENN delivers a comparable, slightly better result within a total required time of 6 hours and 3 minutes, including the initial training and testing dataset generation. This experiment shows that MOHSENN is capable of designing a SAR ADC as effectively as the Flat-SPICE method while converging 30 times faster.
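As a quick cross-check of the comparison, the sketch below recomputes a figure of merit from the SNDR and power columns of Table 3.6 using the standard Walden definition, FOM = P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76) / 6.02, and the ratio of the total required times. It assumes that the FOM in this chapter is the Walden FOM at f_s = 500 MS/s; since the tabulated values come from the thesis's own simulations, the textbook formula need not reproduce them digit for digit. The function and variable names are illustrative only.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden figure of merit in fJ per conversion step."""
    enob = enob_from_sndr(sndr_db)
    return power_w / (2.0 ** enob * fs_hz) * 1e15


FS = 500e6  # 6-bit, 500 MS/s design case

# (power in W, SNDR in dB) taken from Table 3.6
designs = {
    "MOHSENN": (1.30e-3, 37.30),
    "Hierarchical": (2.00e-3, 37.50),
    "Flat-SPICE (fast SA)": (1.77e-3, 24.07),
    "Flat-SPICE (slow SA)": (1.67e-3, 37.76),
}

fom = {name: walden_fom_fj(p, sndr, FS) for name, (p, sndr) in designs.items()}

# Total required time in minutes: 7d 12h 2m for slow Flat-SPICE vs. 6h 3m for MOHSENN
slow_flat_minutes = 7 * 24 * 60 + 12 * 60 + 2
mohsenn_minutes = 6 * 60 + 3
speedup = slow_flat_minutes / mohsenn_minutes

for name, value in fom.items():
    print(f"{name}: {value:.1f} fJ/c.step")
print(f"speed-up over slow Flat-SPICE: {speedup:.1f}x")
```

Under these assumptions the MOHSENN design comes out at roughly 43 fJ/c.step and the time ratio at roughly 30, in line with the numbers quoted in the text.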
Chapter 4
Circuit Connectivity Inspired Neural Network (CCI-NN)

4.1 AMS Circuit Modeling

In this section, we discuss prior works that have addressed the modeling of complex AMS circuits. In the context of AMS circuit synthesis, the purpose is to approximate the top-level circuit design P2M function:

    m = f(p),                  (4.1)
    m = \hat{f}(p) + e,

where e is the regression error. To reduce the regression error, prior works have adopted two main directions. One approach is to enhance the quality of the training dataset by performing adaptive sampling (design of experiment, Latin Hypercube Sampling, boosting, transfer learning, etc. [64]), and the other is to explore different regression models to better approximate the information provided by the dataset. This chapter focuses on improving the regression models because better modeling combined with well-known sampling methods can lead to higher accuracy [22].

Different regression models have been reported for AMS circuits. Methods such as posynomial [14], Random Forest [19], k-Nearest-Neighbor [18], and Support Vector Regression (SVR) [70] have been investigated in advanced CMOS nodes. Although they have been effective in diverse AMS circuit modeling, FC-NN has been shown to deliver the highest accuracy owing to its multiple layers and a higher degree of freedom in tuning the hyper-parameters [12,22]. FC-NN has been integrated into AMS circuit modeling and sizing for more than two decades.

Figure 4.1: Proposed CCI-NN regression model structure, and the sequential and direct path connections. The AMS circuit with parameters p = [p_1, p_2, ..., p_N, p_R] is split into sub-circuits 1 to N; each sub-circuit i is mapped to sub-ANN i, which receives p_i, and p_R together with all sub-ANN outputs feeds the final layer that produces m.
[71] proposed an ANN-based modeling algorithm for microwave circuits in 1997. [21] achieved high accuracy when designing an amplifier with five design parameters. Recently, [22] showed in practice that FC-NN is more suitable for circuit modeling due to its large modeling capacity and reduced search time when designing a nine-parameter amplifier. However, when both the number of parameters and their ranges are large, FC-NN requires a relatively large training dataset to achieve outstanding accuracy. Therefore, even for moderate-sized AMS circuits (20 to 30 parameters), a sub-circuit breakdown is used.

Sub-circuit breakdown, which divides a large top-level circuit into smaller sub-circuits, is a well-known technique for simplification and easier analysis in a hierarchical design. In hierarchical design, we can denote the i-th sub-circuit's associated sub-function (f_i) by m_i = f_i(p_i), where p_i and m_i are its design parameters and defined metrics. By constructing a behavioral top-level model m = h([m_1, m_2, ..., m_N], t), the final specifications are derived from the sub-circuits' metrics. In hierarchical design, the sub-functions f_i are simpler than the top-level P2M f; thus, ANN modeling is more precise. However, finding the relation between the specifications and the sub-circuits' metrics is knowledge-based and can cause inaccuracy. A human designer needs to manually find the behavioral model h, the required sub-circuits' design metrics m_i, and the loading effects among the sub-circuits for a correct mapping of the m_i to m. Unfortunately, solving these three tasks accurately can be challenging, as shown in [72], specifically in the sub-micron regime where the second-order non-linearity becomes dominant.

4.2 Proposed CCI-NN

To overcome the challenge of large training dataset requirements, we propose to use CCI-NN, which combines the features of FC-NN and sub-circuit breakdown, thereby addressing the drawbacks mentioned above:
1. CCI-NN uses sub-ANNs to mimic the sub-circuits.
These are simpler to learn than a single FC-NN modeling the whole circuit; therefore, fewer dataset samples are required to maintain the accuracy.
2. CCI-NN learns to construct h and choose the most sensitive m_i automatically during training while considering the loading effects; hence, no excessive analog knowledge in the sub-circuit breakdown is required.

As illustrated in Fig. 4.1, the proposed CCI-NN approximates the target top-level P2M function by the composition of several sub-ANNs. First, we identify the sub-circuits of which the top-level AMS circuit consists and assign a sub-ANN in the regression model to each sub-circuit. Next, the design parameters of each sub-circuit (p_i) are fed as an input to the corresponding sub-ANN. Then, the sub-ANNs are sequentially linked to mimic the connections in the top-level circuit schematic. That is, the i-th sub-ANN's output is concatenated with the next sub-ANN's design parameters (p_{i+1}) and fed as its input, as shown in Fig. 4.1. For top-level circuits with feedback, we use the open-loop connection to build the sequential path. The direction of this path is the same as the signal direction in the top-level circuit. Finally, all the sub-ANNs' outputs are concatenated and connected to an output layer, which is fully connected. The parameters that belong to none of the sub-circuits (p_R) are also connected to the output layer. The dimensions of each sub-ANN and of the output layer can be found by iterative tuning techniques such as Bayesian optimization methods or the manual tuning illustrated in [21].

CCI-NN is trained with the same dataset as a conventional FC-NN, which is a list of [p, m]. Unlike conventional hierarchical design, it does not require the sub-circuits' specifications dataset (i.e., [p_i, m_i]) for training. In fact, the training procedure ensures that the sub-ANN outputs form features containing the information about these specifications (i.e., the m_i).
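The wiring described above can be made concrete with a small forward-pass sketch in plain Python. It is a minimal illustration, not the thesis implementation: each sub-ANN is reduced to a single sigmoid layer, the weights are random rather than trained, the final layer is assumed linear, and the class name `CCINN`, the helper names and the 6/9/3 parameter split are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(n_in, n_out, rng):
    """Random weights and zero biases for one fully-connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activation=None):
    weights, biases = layer
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in z] if activation else z


class CCINN:
    """Sub-ANNs chained along the signal path, all feeding one final layer."""

    def __init__(self, sub_input_sizes, n_rest, n_features, n_metrics, seed=0):
        rng = random.Random(seed)
        self.subs = []
        prev = 0  # the first sub-ANN has no predecessor
        for n_p in sub_input_sizes:
            # Sequential path: input = own parameters p_i + previous sub-ANN output
            self.subs.append(dense(n_p + prev, n_features, rng))
            prev = n_features
        # Direct path: final layer sees every sub-ANN output plus p_R
        self.final = dense(n_features * len(sub_input_sizes) + n_rest, n_metrics, rng)

    def predict(self, p_subs, p_rest):
        features, prev = [], []
        for layer, p_i in zip(self.subs, p_subs):
            prev = forward(layer, list(p_i) + prev, sigmoid)
            features.extend(prev)
        return forward(self.final, features + list(p_rest))


# Hypothetical split of 18 parameters: 6 and 9 for two sub-circuits, 3 in p_R
model = CCINN(sub_input_sizes=[6, 9], n_rest=3, n_features=4, n_metrics=6)
m_hat = model.predict([[0.1] * 6, [0.2] * 9], [0.3] * 3)
```

Removing the `prev` concatenation would give the variant without the sequential path, and feeding only the last sub-ANN's output to the final layer would give the variant without the direct path.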
Using the analogy of image classification, the sub-ANN outputs are equivalent to the high-level features of the convolutional layers. These features can contain sub-circuit metrics or interface coupling that may influence other sub-ANNs' specs. The final fully-connected layer in CCI-NN is analogous to h in the hierarchical design. However, instead of receiving the sub-circuits' specifications, it takes the sub-ANNs' trainable features as input to generate m. As mentioned before, h, a function that would otherwise require extensive analog knowledge to formulate, is found automatically by the final layer during training.

The two types of connection, i.e., the sequential path between the sub-ANNs and the direct path to the output layer, help improve the approximation of the top-level specs. The direct path can enhance the modeling of the loading effect among sub-circuits since it sends all the sub-ANNs' outputs (i.e., sub-circuit features) to the final layer simultaneously. Therefore, it helps the final layer approximate specifications in the presence of all sub-circuit features. For example, the features can be the driving capability of the first sub-circuit and the capacitive loading of the second, and the last layer combines the two to derive the bandwidth of the circuit. The sequential path improves specifications that depend on the order of the sub-circuits. For example, if the top level has feedback, specifications such as phase margin and bandwidth depend mainly on the sequence of sub-circuits. Thus, sequential paths can improve the approximation. Owing to this topological feature, CCI-NN can achieve better modeling performance than a conventional ANN architecture.

When the top-level AMS circuit contains feedback, we use open-loop connections in the sequential path to avoid cycles in the CCI-NN graph. Cyclic NN graphs can cause longer training time or instability during training. The open-loop-based sequential path helps with the open-loop sub-circuit modeling.
Then, CCI-NN's final layer transforms the specifications into the closed-loop results, similar to conventional hierarchical design. In Section 4.3 we study a three-stage amplifier with two nested feedback loops, both of which are opened for the CCI-NN sequential path implementation.

Table 4.1: Statistics of performance metrics for AMS circuits

AMS Circuit   Performance Metric   Average Value   Standard Deviation
2S-MCA        Gain [dB]            37.0            28.4
              UGB [GHz]            1.83            3.4
              PM [°]               101.0           47.4
              Power [mW]           11.7            15.2
              p.o. swing [mV]      194             28
              n.o. swing [mV]      471             6.6
3S-NMCA       Gain [dB]            67.9            29.4
              UGB [GHz]            2.25            3.7
              PM [°]               45.0            89.1
              Power [mW]           5.6             7.9
              p.o. swing [mV]      321             136
              n.o. swing [mV]      410             122
CS-DAC        SFDR [dB]            43.7            10.5
              I_out [mA]           24.5            10.8
              Power [mW]           215             29

4.3 Simulation and Results

In this section, we apply the proposed CCI-NN to model three different AMS circuit examples and compare the results with conventional FC-NN modeling. We have used the TensorFlow platform for training and testing the ANN models. The Adamax optimizer with an initial learning rate of 0.001 and the Mean Absolute Error (MAE) as the loss function is used during training. The test dataset for all circuit examples has 4,000 sample points. The top-level AMS circuit specifications and their statistics (average and standard deviation) in the test dataset for each circuit example are summarized in TABLE 4.1. The sigmoid activation function is used for the FC-NN and CCI-NN hidden layers.

Figure 4.2: Two-stage amplifier with Miller compensation (2S-MCA). The first stage is sub-circuit 1 and the second stage is sub-circuit 2.

4.3.1 Two-Stage Amplifier

In the first example, we use CCI-NN to model a fully differential 2-Stage Miller Compensated Amplifier (2S-MCA) in 45nm PTM technology. Fig. 4.2 shows the circuit schematic.
The 18 design parameters include the transistor sizes, voltage biases, compensation capacitance and resistance, load capacitance, and gain of the common-mode feedback. The target specifications to be modeled are gain, Unity Gain Bandwidth (UGB), Phase Margin (PM), power consumption, and negative and positive output swings. TABLE 4.1 shows the average and the standard deviation of the specifications from the test dataset. Due to the presence of three poles and one zero in the frequency response of the amplifier, UGB and PM are extremely nonlinear functions of the design parameters and are difficult to model, so they are calculated as the combination of the sequential and direct paths. Gain and power consumption are simply the sum of the gains (in dB) and the sum of the power consumptions of the two stages, so they can be estimated with the direct path. The positive and negative output swings are usually related to the biases of the output stage. However, the bias of the output stage is related to the output voltage of the first stage, requiring the sequential path for modeling.

Figure 4.3: The regression error of CCI-NN vs. FC-NN for the 2S-MCA as a function of the number of training samples (1,000 to 18,000): (a) Gain in dB (2.67X less data), (b) UGB in GHz (2.36X less data), (c) PM in degrees (3.66X less data), (d) Power in mW (4.36X less data), (e) negative output swing in mV (2.50X less data), and (f) positive output swing in mV (3.46X less data).

To generate the CCI-NN regression model, we use two sub-ANNs to represent the two stages of the amplifier; the corresponding parameters are connected to them. The remaining parameters (p_R) are the load capacitance and the compensation resistance and capacitance, which are fed directly to the final layer. For the training and test datasets, we use points picked uniformly at random. We optimize the hyper-parameters of FC-NN and CCI-NN to avoid any under- or over-fitting with the methods suggested by [21]. We train both FC-NN and CCI-NN for varying numbers of samples in the training dataset. Then, we estimate the MAE of the specifications from the test dataset and show the results in Fig. 4.3.

CCI-NN achieves a lower regression error than FC-NN when modeling each specification. For the gain, CCI-NN can achieve the same accuracy as FC-NN with fewer than 7k training samples, implying a 2.67x training dataset reduction. There is also a significant improvement in accuracy when modeling UGB and PM. In the case of PM, CCI-NN requires only about 5k training samples to achieve the same modeling accuracy as FC-NN with 18k samples, thus reducing the training dataset preparation time by 6h. Due to the modular property of CCI-NN, the power consumption is also estimated significantly better than with FC-NN.

4.3.2 Three-Stage Amplifier

In this example, we use a fully differential 3-Stage Nested Miller Compensated Amplifier (3S-NMCA) with a zeroing resistance. Fig. 4.4(a) shows the circuit's schematic, which has 24 design parameters and six design specifications.
The design parameters include the transistor sizes, voltage biases, compensation capacitance and resistance, load capacitance and the common-mode feedback amplifier's gain. The target specifications to be modeled are gain, unity gain bandwidth (UGB), phase margin (PM), power consumption, and negative and positive output swings. This amplifier has five poles and two zeros, which makes approximating the UGB and PM challenging.

Figure 4.4: Three-stage amplifier with nested Miller compensation (3S-NMCA), (a) schematic, (b) top-level sub-circuit break-down and corresponding CCI-NN, with x_R = [r_1, c_1, r_2, c_2, c_l] connected to the final layer.

Table 4.2: MAE loss of the test dataset for different FC-NN and CCI-NN sizes modeling 3S-NMCA

NN Architecture         NN layers                                         Number of variables   Testing MAE loss
FC-NN                   [512, 1024, 512]                                  1,069,580             0.054
FC-NN                   [256, 512, 256]                                   272,652               0.048
FC-NN                   [128, 256, 128]                                   70,796                0.050
CCI-NN                  Sub-ANNs: [256, 256, 256], Final layer: [1024]    1,336,588             0.028
CCI-NN                  Sub-ANNs: [128, 128, 128], Final layer: [512]     340,620               0.026
CCI-NN                  Sub-ANNs: [64, 64, 64], Final layer: [256]        88,396                0.033
CCI-NN w.o. Seq. path   Sub-ANNs: [128, 128, 128], Final layer: [512]     307,852               0.033
CCI-NN w.o. Dir. path   Sub-ANNs: [128, 128, 128], Final layer: [512]     136,392               0.044

As shown in Fig. 4.4(b), the sub-circuit breakdown of 3S-NMCA includes three sequentially connected amplifiers that contain two nested feedback loops (r_1-c_1 and r_2-c_2) for compensation.
To construct CCI-NN, we break the feedback loops and use only the main sequential path to connect the sub-ANNs. The values of the compensation elements r_1, c_1, r_2, and c_2 and the load capacitance are included in p_R, which connects directly to the final layer.

Figure 4.5: The regression error of CCI-NN vs. FC-NN for the 3S-NMCA as a function of the number of training samples (1,000 to 18,000): (a) Gain in dB (5.36X less data), (b) UGB in GHz (3.51X less data), (c) PM in degrees (4.08X less data), (d) Power in mW (7.63X less data), (e) negative output swing in mV (6.85X less data), and (f) positive output swing in mV (7.09X less data).

We trained various sizes of FC-NN and CCI-NN to find the architecture that results in the lowest loss on the testing dataset. We used 18,000 samples for training and 4,000 samples for testing.
TABLE 4.2 indicates the loss value and the number of trainable variables for each ANN configuration.

Figure 4.6: 8-bit current steering DAC (a) the schematic, with 15 unary MSBs and 4 binary LSBs, each element consisting of master-slave latches (sub-circuit 1), a driver (sub-circuit 2), and a current-steering cell (sub-circuit 3), (b) the top-level sub-circuit breakdown, and the CCI-NN structure

Table 4.3: MAE loss of the test dataset for different FC-NN and CCI-NN sizes modeling CS-DAC

NN Architecture         NN layers                                        Number of variables   Testing MAE loss
FC-NN                   [2048, 4096, 2048]                               16,832,519            0.045
FC-NN                   [1024, 2048, 1024]                               4,221,959             0.047
FC-NN                   [512, 1024, 512]                                 1,062,407             0.048
CCI-NN                  Sub-ANNs: [512, 512, 512], Final layer: [512]    2,900,487             0.026
CCI-NN                  Sub-ANNs: [256, 384, 256], Final layer: [256]    926,343               0.028
CCI-NN                  Sub-ANNs: [128, 192, 128], Final layer: [128]    233,799               0.029
CCI-NN w.o. Seq. path   Sub-ANNs: [256, 384, 256], Final layer: [256]    795,271               0.027
CCI-NN w.o. Dir. path   Sub-ANNs: [256, 384, 256], Final layer: [256]    729,479               0.037
The lowest loss value FC-NN can achieve is 0.048, and increasing the FC-NN layers' size (number of trainable variables) does not help improve this value. However, the loss value for CCI-NN is 0.026, which is almost half of FC-NN's loss, while the number of variables is of the same order of magnitude. We also trained CCI-NN with one of the paths removed, i.e., without (w.o.) the sequential path or without the direct path. The results show that removing either path can significantly degrade the CCI-NN performance, indicating that both paths are important for this structure.

Figure 4.7: The regression error of CCI-NN vs. FC-NN for the CS-DAC as a function of the number of training samples (200 to 4,000): (a) SFDR in dB (4.42X less data), (b) full-scale output current in mA (4.97X less data), (c) power consumption in mW (5.56X less data).

Fig. 4.5 shows the regression error as a function of the number of training samples for each specification when modeled with CCI-NN and FC-NN. For both ANN structures, we used the sizes indicated in TABLE 4.2 that deliver the lowest loss value. For both ANN structures, increasing the number of samples decreases the regression error. However, CCI-NN achieves much better accuracy in estimating all the specs. Gain and power consumption approximation are significantly improved by CCI-NN owing to its greater modularity. The regression errors of UGB and PM, which are functions of most design parameters, are also significantly lower. For these two specifications, the CCI-NN model can achieve the same modeling accuracy as the FC-NN model trained with 18k training samples using only 5.2k and 4.4k training samples, leading to a 3.51x and 4.08x reduction in the number of required dataset samples, respectively.
The positive and negative output swings are also significantly improved, which shows the effectiveness of CCI-NN. For these two specifications, the required training dataset is reduced by more than 6.8x compared to that of FC-NN.

4.3.3 Current-Steering DAC

In this example, we apply the proposed CCI-NN to model a high-speed mixed-signal circuit, namely, a high-speed (20GS/s) and medium-resolution (8-bit) Current-Steering DAC (CS-DAC), which is a key building block for next-generation communication systems. A detailed schematic of the DAC, which is implemented in a segmented fashion, is shown in Fig. 4.6(a). Each DAC element consists of master-slave latches, a switch driver, and a current-steering cell, all implemented in current-mode logic (CML) for high-speed operation [73], with 16 design parameters in total. Spurious-free dynamic range (SFDR), power consumption, and full-scale output current I_out are used as the metrics for evaluating the performance of the DAC in terms of linearity and efficiency. A wide range of parameters and performance specifications is covered, as shown in TABLE 4.1. Although the DAC elements are connected in a parallel structure, their relative sizings are fixed based on the weightings. Therefore, without considering the parallel structure, we use three cascaded sub-ANNs to emulate the modular structure and signal flow of a single DAC element, as shown in Fig. 4.6(b).

As with the 3S-NMCA, we examined different ANN structures with different sizes to find the best (lowest loss value) configuration and gathered the results in TABLE 4.3. It indicates that CCI-NN achieves approximately sixty percent of FC-NN's loss with an order of magnitude fewer trainable variables. We also removed the direct and the sequential path in turn to examine the capability of the different CCI-NN configurations. Interestingly, the CCI-NN with only the direct path can achieve a loss value as low as a two-path CCI-NN.
Because no feedback exists at the top-level, the sub-ANNs' placement does not affect the P2M characterization function. For these tests, we used 1k samples for training and 4k for testing.

Fig. 4.7 shows the regression errors versus the number of training samples for both the conventional FC-NN and the proposed CCI-NN. For SFDR, it can be observed that CCI-NN requires only 1.04k training samples to achieve the same modeling accuracy as the FC-NN with 4.6k samples, i.e., a 4.42x training sample reduction via CCI-NN. Considering that generating each sample in the training dataset requires almost 30 minutes of simulation time, CCI-NN saves a significant amount of time and computation power even when simulations are run in parallel. In this CS-DAC example, CCI-NN shows better accuracy over different numbers of training samples. Specifically, SFDR is considered the most computationally complicated specification. CCI-NN achieves a 1.5x improvement in the regression error. The training dataset reduction by CCI-NN is 5.56x and 4.97x for the power consumption and full-scale output current estimation, respectively, which is a substantial improvement.

Chapter 5
Two-step SAR ADC with Passive Residue Transfer

5.1 Circuit Architecture

This chapter explains the design and implementation of a two-step SAR ADC using the MOHSENN algorithm. The algorithm is used for parameter synthesis at the schematic level only, and the layout is manual with a few parameter tunings to achieve the same performance as the schematic.

Fig. 5.1 shows the two-step SAR ADC block diagram with the passive residue transfer (PRT) technique. The first stage uses ordinary SAR operations to resolve b_1 bits and leaves the residue on the PRT connected to the next stage. The second stage is a SAR ADC with the MSB as bit redundancy and resolves b_2 + 1 bits; therefore, the whole architecture is a (b_1 + b_2)-bit ADC. To achieve the one-bit redundancy, we increase the range of the second stage.
We add tunable parallel capacitors to the second CDAC as an on-chip gain calibration to match this required range between the two stages.

Figure 5.1: Two-step SAR ADC with passive residue transfer. Stage 1 (SAR ADC with CDAC1, b_1-bit DAC driver and SAR logic) is followed by the proposed PRT (capacitors CA and CB, clocked by ckA and ckB) and stage 2 (SAR ADC with CDAC2 resolving b_2 + 1 bits), with time alignment and foreground calibration producing Dout.

The PRT operation uses two clocks called ck_A and ck_B, with different phases and twice the period of the sampling clock, to store the residue from the first CDAC on two PRT capacitors named CA and CB in Fig. 5.1. CA attaches to the first stage's CDAC when ck_A is high, including the sampling and quantization phases. Therefore, by the end of the SAR operation, the residue is already stored on CA. This scheme eliminates the conventional second stage's sampling phase, in which the residue is sampled in the second CDAC, saving power and time. When ck_A is low, CA detaches from the first CDAC, and the second comparator activates the input branch connected to CA for quantization in the second stage. Simultaneously, the other PRT capacitor CB attaches to the first CDAC while its corresponding input to the second stage is deactivated. Therefore, ck_B and ck_A form a ping-pong residue transfer operation.

Two feedthrough types can degrade the ADC's performance when the PRT capacitor is connected to the second stage. The first is from the first CDAC, whose full swing is 32 times larger than the residue stored on the PRT cap. Therefore, this feedthrough can easily corrupt the delicate residue value and damage the second stage's operation. The second is the kickback from the second comparator's activated branch, which can propagate through the PRT switches and distort the first CDAC. To alleviate both feedthroughs, we add neutralization switches to both capacitors.
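The ping-pong operation can be summarized as a simple schedule. The sketch below is an illustrative behavioral model only: it assumes that ck_A is high during even sampling periods and ck_B during odd ones, and the function name `prt_schedule` and its dictionary keys are hypothetical. It shows that the residue left by the first stage in one period is quantized by the second stage in the next period from the other capacitor.

```python
def prt_schedule(n_periods):
    """One entry per sampling period of the two-step SAR ADC with ping-pong PRT."""
    schedule = []
    for n in range(n_periods):
        # ckA and ckB have twice the sampling period and complementary phases
        ck_a_high = (n % 2 == 0)
        stage1_cap = "CA" if ck_a_high else "CB"   # attached to the first CDAC
        stage2_cap = "CB" if ck_a_high else "CA"   # branch activated in comparator 2
        schedule.append({
            "period": n,
            "stage1_sample": n,                     # sample resolved by stage 1
            "stage1_cap": stage1_cap,
            # Stage 2 works on the residue stored one period earlier
            "stage2_sample": n - 1 if n > 0 else None,
            "stage2_cap": stage2_cap if n > 0 else None,
        })
    return schedule


for entry in prt_schedule(4):
    print(entry)
```

Because the two stages never share a capacitor within a period, the second stage needs no separate sampling phase, which is the time and power saving described above.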
Figure 5.2: Global search phase with Par-MC/Adam optimization (n_mc = 20) vs. SA optimizer (a) in number of iterations (b) required time

5.2 MLG Construction

Compared to the SAR ADC in Section 3.4, several modifications have been made to make the ADC implementable on the chip level. First, calibration circuits were added to the first and second comparators and the second CDAC. Furthermore, we separate the CDAC driver from the sequential logic previously known as SEQ1 and consider it one module.
Therefore, it can be designed with more accuracy. Moreover, we consider two different sizes for the CDAC driver: the large size for driving the MSBs and the small size for driving the LSBs. This method helps to reduce the power consumption in the SAR ADC. Finally, we consider the track and hold circuit as one module and separate it from the CDAC.

Table 5.1: NN topology and hyper-parameters used in SAR ADC Design

Module    Neurons per layer                                       NN Type   |D_train|   |D_test|   Training MAE Loss   Test MAE Loss
COMP1     [12, 256, 512, 256, 6]                                  FC-NN     8,000       2,000      0.012               0.013
COMP2     3X Sub-ANNs: [256, 512, 256], F-layer: [1024, 9]        CCI-NN    16,000      4,000      0.004               0.005
CDAC1     [10, 256, 512, 256, 2]                                  FC-NN     8,000       2,000      0.004               0.004
CDAC2     [7, 64, 256, 64, 1]                                     FC-NN     8,000       2,000      0.017               0.015
SEQ       [12, 256, 512, 256, 2]                                  FC-NN     7,000       2,000      0.004               0.005
COMPDR    [4, 256, 512, 256, 3]                                   FC-NN     8,000       2,000      0.001               0.001
CDACDR    [13, 256, 512, 256, 3]                                  FC-NN     7,100       2,000      0.005               0.004

The total number of modules is fourteen. The modules from the first stage are: Comparator (COMP1), Track and Hold (TH), Sequential logic (SEQ1), comparator driver (COMPDR1), CDAC (CDAC1), CDAC driver large (CDACDRL1), and CDAC driver small (CDACDRS1). The second stage includes Comparator (COMP2), Sequential logic (SEQ2), comparator driver (COMPDR2), CDAC (CDAC2), CDAC driver large (CDACDRL2), and CDAC driver small (CDACDRS2). Finally, the last module is the PRT circuit. User desired specifications can extend to u = [freq_s, b_1, b_2, b_1m, b_2m], in which the first b_1m MSBs use the large-sized CDAC driver. The architecture contains 107 parameters and 37 metrics while solving specifications similar to those in Section 3.4.

The MLG of the two-step SAR ADC is shown in Fig. 5.2, where Fig. 5.2(a) and 5.2(b) show the corresponding MLG of the first and second stages, respectively. Modules' interactions are modeled with interface duplication, shown in blue in this MLG, to improve accuracy.
Although the MLG contains 14 modules, only seven P2M functions are modeled and only seven test benches are created: the first and second comparators, the first and second CDACs, the sequential logic, the CDAC driver, and the comparator driver. The last three are similar in both stages, and the CDAC driver uses the same topology for the large and small versions with different sizing. Therefore, to generate the NN models, we must characterize only seven modules, whose information is gathered in Table 5.1.

5.3 Design Consideration

For the design of this ADC, we targeted a 10-bit resolution and a 500 MS/s sample rate. We set the user-desired specification to u = [500 MS/s, 5, 6, 2, 2], which means 5 bits for the first stage and 6 bits for the second, with the first two CDAC drivers of each stage sized differently from the rest. We manually designed the layouts for this ADC, but they are flexible: changing the schematic parameters requires only slight modification of the layout, so post-layout iterations are faster. To increase the post-layout iteration efficiency further, we add a 10% margin to all the timing requirements in the schematic design specifications. We choose the range and minimum step for the offset calibration based on the worst-case parameters obtained by Monte Carlo estimation. The minimum CDAC capacitance values are chosen based on the technology and the layout placement.

5.4 Measurement Results

The prototype of the two-step SAR ADC is fabricated in 12nm FinFET technology with an active area of 0.006 mm² (Fig. 5.3). The ADC consumes 1.33 mW from a 0.65 V supply voltage while operating at a 500 MS/s sample rate. Fig. 5.4 shows the ADC's output spectrum for the Nyquist input frequency of 246 MHz with a decimation factor of 27. It achieves an SNDR of 48.4 dB and an SFDR of 65.0 dB, yielding a Walden FOM of 12.0 fJ/c-step at Nyquist input. Fig. 5.5 shows the SNDR and SFDR over different input frequencies for the ADC operating at 500 MS/s. The maximum achieved SNDR is 49.5 dB at a 29 MHz input, about 1 dB higher than at Nyquist, showing nearly uniform performance over the entire ADC bandwidth. Table 5.2 shows a comparison with the state-of-the-art ADCs. The ADC prototype shows comparable FOM and area but a higher sample rate for a single ADC and a significantly smaller area.

Figure 5.3: Chip micrograph
Figure 5.4: Measured output spectrum at 246 MHz input frequency
Figure 5.5: SNDR and SFDR vs. frequency

Table 5.2: Comparison table with state-of-the-art ADCs
Specification | This Work | [74] ASSCC15 | [75] VLSI20 | [76] ISSCC21 | [77] ISSCC19
Architecture | Pipe-SAR PRT | Pipe-SAR | SAR, Flash | SAR-SDCT | SAR-Pipe
Technology | 14nm | 65nm | 28nm | 7nm | 16nm
Fs (MS/s) | 500 | 1,200 | 1,000 | 600 | 600
SNDR @ Nyquist [dB] | 48.4 | 43.7 | 45.5 | 55.3 | 60.2
SFDR @ Nyquist [dB] | 65.0 | 58.1 | 59.4 | 67.8 | 78.3
Power [mW] | 1.33 | 5.0 | 2.55 | 13.0 | 6.0
FOM_w [fJ/c-step] | 12.0 | 35.0 | 16.6 | 45.6 | 12.0
Active area [mm²] | 0.006 | 0.013 | 0.006 | 0.037 | –
Amplifier calibration | No | No | Yes | No | Yes

Chapter 6 Conclusion

6.1 Summary

The future trends of electronic systems, such as the IoT, 5G communications, autonomous vehicles, and AI devices, demand high-performance AMS circuits. Developing design automation tools can significantly reduce the cost and time to market of AMS circuit generation. This thesis proposes a method for automatic AMS parameter synthesis. The method uses a novel MLG methodology to enforce equality between the interfaces of adjacent modules. The MLG leads to a more accurate characterization of modules and, as a result, a better AMS design. Moreover, the method accelerates the parameter search through the proposed hybrid search. We used the proposed Par-MC technique with Adam optimization during the global search with NN models, which reduced the convergence time to minutes. In the local search with SPICE models, we introduced a novel variable reduction technique based on the gradients obtained from the NN regression models, which significantly reduced the search time.
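The gradient-based variable reduction described above can be pictured with a minimal sketch. It assumes a surrogate `model(x)` that returns a scalar cost, estimates the gradient by central finite differences (a stand-in for the gradients an NN framework would supply), and keeps only the `k` parameters with the largest gradient magnitude for the local search. The toy surrogate, the function names, and the top-k rule are assumptions made for illustration; the exact criterion used in this thesis is defined in the earlier chapters and is not reproduced here.

```python
def finite_diff_grad(model, x, eps=1e-4):
    """Central-difference gradient of a scalar model at point x."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((model(hi) - model(lo)) / (2 * eps))
    return grad


def select_active_variables(model, x, k):
    """Indices of the k parameters with the largest gradient magnitude at x."""
    grad = finite_diff_grad(model, x)
    ranked = sorted(range(len(x)), key=lambda i: abs(grad[i]), reverse=True)
    return sorted(ranked[:k])


def toy_surrogate(x):
    # Hypothetical stand-in for an NN regression model of a circuit metric.
    return 5.0 * x[0] ** 2 + 0.01 * x[1] + 2.0 * x[2] - 0.001 * x[3] ** 2


x0 = [1.0, 1.0, 1.0, 1.0]
active = select_active_variables(toy_surrogate, x0, k=2)
print("parameters kept for local search:", active)
```

In this toy case only the two most sensitive parameters would be varied by the slower SPICE-based search, while the rest stay at the values found by the global search.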
To reduce the interface effects and leverage the digital design flow, this thesis presents a T-ADC enabled by a PPS TDC. The PPS TDC quantizes the pulse width by linearly shrinking a pulse within a passive R-C network. The design parameters of the PPS TDC can be separated from the digital circuits' parameters. Therefore, the PPS TDC's parameters can be chosen to achieve the desired performance specifications while the digital part only needs to satisfy the timing constraints. The PPS TDC is used in a T-ADC fabricated in 65nm technology, which achieves a competitive SNDR of 32.5 dB at the Nyquist frequency with a 10 GS/s sample rate. The ADC occupies an area of 0.015 mm² while consuming 29.7 mW from a 1 V supply voltage.

This thesis also proposed CCI-NN for efficient AMS circuit modeling. The sub-ANNs of CCI-NN represent the AMS circuit's modular structure, and the connections between sub-ANNs are constructed according to the circuit connections and the signal path. Compared to a conventional NN, CCI-NN exploits the information about the circuit modules' inter-relations, which helps achieve better modeling accuracy. Comprehensive experiments have shown that the novel NN structure can effectively reduce the required dataset volume for a particular target accuracy, implying that the proposed CCI-NN requires fewer exhaustive SPICE simulations to gain precise modeling.

The effectiveness of this approach was tested on a two-step SAR ADC with 107 parameters in GF 14nm technology. The core ADC uses a novel PRT to transfer the residue to the second stage without any active elements, leading to five different modules. The measurement results show a competitive FOM of 12 fJ/c-step at a 500 MS/s sample rate while achieving a 7.77-bit ENOB.

6.2 Future Works

The research in this thesis has demonstrated several essential methodologies to enhance circuit automation tools.
The proposed MOHSENN algorithm provides a platform to design an extensive circuit system while considering the interface effects from adjacent components. It also demonstrates that using NN models accelerates the design process. However, the MOHSENN algorithm performs parameter synthesis rather than complete circuit synthesis, and some modification is required to fulfill the latter purpose. Here are several improvements suggested for the future of the MOHSENN algorithm.

• Make a template-based circuit layout generation tool based on the manual layout of the two-step SAR ADC in Chapter 5. As mentioned earlier, the layout for this work is tunable for different design parameters and requires only a few metal routing modifications. Therefore, one can automate the design by tuning the metal routing.
• Use the transfer learning concept to generate the NN models for the extracted netlists from the automated layout tools. An algorithm that can do this is demonstrated in [64]. Although MOHSENN has already been used to design with transfer learning in [78], the generated circuit had only one module, whereas the MLG concept targets more sophisticated circuits such as the SAR ADC.
• Use transfer learning to retrain modules for a new technology with fewer samples. As illustrated in Chapter 3, the MLG is independent of technology and layout; the test benches and NN models, however, are not. One can use transfer learning with pre-trained models to generate new NN models with fewer required sample points.

The idea of CCI-NN also requires further investigation to achieve higher performance. Here we have several suggestions to improve the CCI-NN:

• CCI-NN can be used to design a large system to replace the MLG. CCI-NN can essentially capture the key intermediate variables that are modeled as interfaces in the MLG. One can investigate the correlation between intermediate variables and interface elements in the MLG.
• CCI-NN already considers a modular circuit, so one can reuse individual sub-ANNs when training other circuits that share similar parts. For example, by learning a comparator and separating it into pre-amp, latch, and further digital circuits, CCI-NN can be generalized to learn new architectures.

The PPS TDC and the two-step SAR ADC can be improved based on the measurement data. For the PPS TDC, new methodologies are required to reduce the mismatch effects on its core. Also, more detailed calibration schemes are required to decrease the PVT effects on the circuit. For the SAR ADC, we need to demonstrate new methodologies to decrease the noise level, such as adding an amplification phase or modifying the second stage.

Reference List

[1] M. Hassanpourghadi, P. K. Sharma, and M. S.-W. Chen, “A 6-b, 800-MS/s, 3.62-mW Nyquist Rate AC-Coupled VCO-Based ADC in 65-nm CMOS,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 6, pp. 1354–1367, 2017.
[2] S. Zhu, B. Wu, Y. Cai, and Y. Chiu, “A 2-GS/s 8-bit Non-Interleaved Time-Domain Flash ADC Based on Remainder Number System in 65-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 53, no. 4, pp. 1172–1183, 2018.
[3] Y. M. Tousi and E. Afshari, “A Miniature 2 mW 4 bit 1.2 GS/s Delay-Line-Based ADC in 65 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 46, no. 10, pp. 2312–2325, 2011.
[4] S. Zhu, B. Xu, B. Wu, K. Soppimath, and Y. Chiu, “A Skew-Free 10 GS/s 6 bit CMOS ADC With Compact Time-Domain Signal Folding and Inherent DEM,” IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1785–1796, 2016.
[5] K. Ohhata, “A 2.3-mW, 1-GHz, 8-Bit Fully Time-Based Two-Step ADC Using a High-Linearity Dynamic VTC,” IEEE Journal of Solid-State Circuits, vol. 54, no. 7, pp. 2038–2048, 2019.
[6] M. Zhang, Y. Zhu, C.-H. Chan, and R. P. Martins, “An 8-Bit 10-GS/s 16x Interpolation-Based Time-Domain ADC With <1.5-ps Uncalibrated Quantization Steps,” IEEE Journal of Solid-State Circuits, vol. 55, no. 12, pp. 3225–3235, 2020.
[7] Y. Huo, X. Dong, and W. Xu, “5g cellular user equipment: From theory to practical hardware design,” IEEE Access, vol. 5, pp. 13992–14010, 2017.
[8] R. Lourenço, N. Lourenço, and N. Horta, AIDA-CMK: Multi-Algorithm Optimization Kernel Applied to Analog IC Sizing. Springer International Publishing, 2015. [Online]. Available: https://doi.org/10.1007%2F978-3-319-15955-3
[9] K. Hakhamaneshi, N. Werblun, P. Abbeel, and V. Stojanović, “Bagnet: Berkeley analog generator with layout optimizer boosted with deep neural networks,” in 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). Westminster, CO, USA: IEEE, 2019, pp. 1–8.
[10] N. Lourenço, R. Martins, A. Canelas, R. Póvoa, and N. Horta, “Aida: Layout-aware analog circuit-level sizing with in-loop layout generation,” Integr., vol. 55, pp. 316–329, 2016.
[11] T. Eeckelaert, T. McConaghy, and G. Gielen, “Efficient multiobjective synthesis of analog circuits using hierarchical Pareto-optimal performance hypersurfaces,” in Design, Automation and Test in Europe. Munich, Germany: IEEE, March 2005, pp. 1070–1075 Vol. 2.
[12] T. McConaghy and G. G. E. Gielen, “Template-Free Symbolic Performance Modeling of Analog Circuits via Canonical-Form Functions and Genetic Programming,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 8, pp. 1162–1175, 2009.
[13] M. del Mar Hershenson, “Design of pipeline analog-to-digital converters via geometric programming,” in IEEE/ACM International Conference on Computer Aided Design, 2002. ICCAD 2002. San Jose, CA, USA: IEEE, 2002, pp. 317–324.
[14] W. Daems, G. Gielen, and W. Sansen, “Simulation-based generation of posynomial performance models for the sizing of analog integrated circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 5, pp. 517–534, May 2003.
[15] F. De Bernardinis and A. Sangiovanni Vincentelli, “Efficient analog platform characterization through analog constraint graphs,” in ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005. San Jose, CA, USA: IEEE, Nov 2005, pp. 415–421.
[16] P. Nuzzo, F. De Bernardinis, P. Terreni, and A. Sangiovanni Vincentelli, “Enriching an analog platform for analog-to-digital converter design,” in 2005 IEEE International Symposium on Circuits and Systems. Kobe, Japan: IEEE, May 2005, pp. 1286–1289 Vol. 2.
[17] P. Nuzzo, A. Sangiovanni-Vincentelli, X. Sun, and A. Puggelli, “Methodology for the Design of Analog Integrated Interfaces Using Contracts,” IEEE Sensors Journal, vol. 12, no. 12, pp. 3329–3345, Dec 2012.
[18] X. Tang and A. Xu, “Multi-class classification using kernel density estimation on k-nearest neighbours,” Electronics Letters, vol. 52, no. 8, pp. 600–602, 2016.
[19] T. Wu et al., “Complexity reduction for analog circuit performance models using random forests,” in 2009 17th IFIP International Conference on Very Large Scale Integration (VLSI-SoC). Florianopolis, Brazil: IEEE, 2009, pp. 29–34.
[20] V. Ceperic and A. Baric, “Modeling of analog circuits by using support vector regression machines,” in Proceedings of the 2004 11th IEEE International Conference on Electronics, Circuits and Systems, 2004. ICECS 2004. Tel Aviv, Israel: IEEE, 2004, pp. 391–394.
[21] G. Wolfe and R. Vemuri, “Extraction and use of neural network models in automated synthesis of operational amplifiers,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 2, pp. 198–212, Feb 2003.
[22] Y. Li et al., “An Artificial Neural Network Assisted Optimization System for Analog Design Space Exploration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1, 2019.
[23] O. Garitselov et al., “Fast-Accurate Non-Polynomial Metamodeling for Nano-CMOS PLL Design Optimization,” in 2012 25th International Conference on VLSI Design. Hyderabad, India: IEEE, 2012, pp. 316–321.
[24] N. Lourenço et al., “Using Polynomial Regression and Artificial Neural Networks for Reusable Analog IC Sizing,” in 2019 16th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD). Lausanne, Switzerland: IEEE, 2019, pp. 13–16.
[25] P. Nuzzo, F. D. Bernardinis, and A. Sangiovanni-Vincentelli, “Platform-based mixed signal design: Optimizing a high-performance pipelined ADC,” Analog Integrated Circuits and Signal Processing, vol. 49, no. 3, pp. 343–358, Dec 2006. [Online]. Available: https://doi.org/10.1007/s10470-006-9067-8
[26] R. De Bernardinis, P. Nuzzo, and A. Sangiovanni Vincentelli, “Mixed signal design space exploration through analog platforms,” in Proceedings. 42nd Design Automation Conference, 2005. Anaheim, CA, USA: IEEE, June 2005, pp. 875–880.
[27] C. Liu, S. Chang, G. Huang, and Y. Lin, “A 10-bit 50-MS/s SAR ADC With a Monotonic Capacitor Switching Procedure,” IEEE Journal of Solid-State Circuits, vol. 45, no. 4, pp. 731–740, April 2010.
[28] M. Ding, P. Harpe, G. Chen, B. Busze, Y. Liu, C. Bachmann, K. Philips, and A. van Roermund, “A hybrid design automation tool for SAR ADCs in IoT,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 12, pp. 2853–2862, Dec 2018.
[29] A. Samiei and H. Hashemi, “A Chopper Stabilized, Current Feedback, Neural Recording Amplifier,” IEEE Solid-State Circuits Letters, vol. 2, no. 3, pp. 17–20, March 2019.
[30] Q. Fan and J. Chen, “A 1-GS/s 8-Bit 12.01-fj/conv.-step Two-Step SAR ADC in 28-nm FDSOI Technology,” IEEE Solid-State Circuits Letters, vol. 2, no. 9, pp. 99–102, Sep. 2019.
[31] J. Nam, M. Hassanpourghadi, A. Zhang, and M. S. Chen, “A 12-Bit 1.6, 3.2, and 6.4 GS/s 4-b/Cycle Time-Interleaved SAR ADC with Dual Reference Shifting and Interpolation,” IEEE Journal of Solid-State Circuits, vol. 53, no. 6, pp. 1765–1779, June 2018.
[32] H. Hong, L. Lin, and Y. Chiu, “Design of a 0.20–0.25-v, sub-nw, rail-to-rail, 10-bit sar adc for self-sustainable iot applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 5, pp. 1840–1852, May 2019.
[33] C. Lu and D. Huang, “1.2 v 10-bits 40 ms/s cmos sar adc for low-power applications,” IET Circuits, Devices Systems, vol. 13, no. 6, pp. 857–862, 2019.
[34] R. B. Staszewski, J. L. Wallberg, S. Rezeq, C.-M. Hung, O. E. Eliezer, S. K. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton et al., “All-digital PLL and transmitter for mobile phones,” IEEE Journal of Solid-State Circuits, vol. 40, no. 12, pp. 2469–2482, 2005.
[35] W. Deng, D. Yang, T. Ueno, T. Siriburanon, S. Kondo, K. Okada, and A. Matsuzawa, “A fully synthesizable all-digital PLL with interpolative phase coupled oscillator, current-output DAC, and fine-resolution digital varactor using gated edge injection technique,” IEEE Journal of Solid-State Circuits, vol. 50, no. 1, pp. 68–80, 2014.
[36] C.-R. Ho and M. S.-W. Chen, “A fractional-N DPLL with calibration-free multi-phase injection-locked TDC and adaptive single-tone spur cancellation scheme,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 8, pp. 1111–1122, 2016.
[37] Y. Okuma, K. Ishida, Y. Ryu, X. Zhang, P.-H. Chen, K. Watanabe, M. Takamiya, and T. Sakurai, “0.5-V input digital LDO with 98.7% current efficiency and 2.7-μA quiescent current in 65nm CMOS,” in IEEE Custom Integrated Circuits Conference 2010. IEEE, 2010, pp. 1–4.
[38] B. Razavi, “Design Considerations for Interleaved ADCs,” IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1806–1817, 2013.
[39] G. G. Gielen, L. Hernandez, and P. Rombouts, “Time-Encoding Analog-to-Digital Converters: Bridging the Analog Gap to Advanced Digital CMOS-Part 1: Basic Principles,” IEEE Solid-State Circuits Magazine, vol. 12, no. 2, pp. 47–55, 2020.
[40] Y.-G. Yoon, J. Kim, T.-K. Jang, and S. Cho, “A Time-Based Bandpass ADC Using Time-Interleaved Voltage-Controlled Oscillators,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 55, no. 11, pp. 3571–3581, 2008.
[41] G. Taylor and I. Galton, “A Mostly-Digital Variable-Rate Continuous-Time Delta-Sigma Modulator ADC,” Solid-State Circuits, IEEE Journal of, vol. 45, no. 12, pp. 2634–2646, 2010.
[42] P. Dudek, S. Szczepanski, and J. Hatfield, “A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line,” IEEE Journal of Solid-State Circuits, vol. 35, no. 2, pp. 240–247, 2000.
[43] K. Kim, W. Yu, and S. Cho, “A 9 bit, 1.12 ps Resolution 2.5 b/Stage Pipelined Time-to-Digital Converter in 65 nm CMOS Using Time-Register,” IEEE Journal of Solid-State Circuits, vol. 49, no. 4, pp. 1007–1016, 2014.
[44] M. Lee and A. A. Abidi, “A 9 b, 1.25 ps Resolution Coarse–Fine Time-to-Digital Converter in 90 nm CMOS that Amplifies a Time Residue,” IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 769–777, 2008.
[45] K. Kim, Y.-H. Kim, W. Yu, and S. Cho, “A 7 bit, 3.75 ps Resolution Two-Step Time-to-Digital Converter in 65 nm CMOS Using Pulse-Train Time Amplifier,” IEEE Journal of Solid-State Circuits, vol. 48, no. 4, pp. 1009–1017, 2013.
[46] H. Molaei and K. Hajsadeghi, “A 5.3-ps, 8-b Time to Digital Converter Using a New Gain-Reconfigurable Time Amplifier,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 66, no. 3, pp. 352–356, 2019.
[47] D. Kim, K. Kim, W. Yu, and S. Cho, “A Second-Order ΔΣ Time-to-Digital Converter Using Highly Digital Time-Domain Arithmetic Circuits,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 66, no. 10, pp. 1643–1647, 2019.
[48] S. Ziabakhsh, G. Gagnon, and G. W. Roberts, “A Second-Order Bandpass ΔΣ Time-to-Digital Converter With Negative Time-Mode Feedback,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 4, pp. 1355–1368, 2019.
[49] F. Yuan and P. Parekh, “Analysis and Design of an All-Digital ΔΣ TDC via Time-Mode Signal Processing,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 6, pp. 994–998, 2020.
[50] A. M. Z. Khaki, E. Farshidi, K. A. Asl, S. H. M. Ali, and M. Othman, “Design and Analysis of a Multirate 5-bit High-Order 52 fs rms ΔΣ Time-to-Digital Converter Implemented on 40 nm Altera Stratix IV FPGA,” IEEE Access, vol. 9, pp. 128117–128125, 2021.
[51] S.-J. Kim, W. Kim, M. Song, J. Kim, T. Kim, and H. Park, “15.5 a 0.6v 1.17ps PVT-tolerant and synthesizable time-to-digital converter using stochastic phase interpolation with 16× spatial redundancy in 14nm FinFET technology,” in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1–3.
[52] J. S. Teh, L. Siek, A. M. Alonso, A. Firdauzi, and A. Matsuzawa, “A 14-b, 850fs Fully Synthesizable Stochastic-Based Branching Time-to-Digital Converter in 65nm CMOS,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1–5.
[53] V.-N. Nguyen, X. T. Pham, and J.-W. Lee, “Three-Step Cyclic Vernier TDC Using a Pulse-Shrinking Inverter-Assisted Residue Quantizer for Low-Complexity Resolution Enhancement,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–12, 2021.
[54] R. Enomoto, T. Iizuka, T. Koga, T. Nakura, and K. Asada, “A 16-bit 2.0-ps Resolution Two-Step TDC in 0.18μm CMOS Utilizing Pulse-Shrinking Fine Stage With Built-In Coarse Gain Calibration,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 1, pp. 11–19, 2019.
[55] D. Dinkar Toraskar, M. P. Mattada, and H. Guhilot, “Time domain ADC using pulse shrinking TDC,” in 2016 International Conference on Circuits, Controls, Communications and Computing (I4C), 2016, pp. 1–4.
[56] Y. J. Park and F. Yuan, “0.25–4 ns 185 MS/s 4-bit pulse-shrinking time-to-digital converter in 130 nm CMOS using a 2-step conversion scheme,” in 2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), 2015, pp. 1–4.
[57] M. Baert and W. Dehaene, “A 5-GS/s 7.2-ENOB Time-Interleaved VCO-Based ADC Achieving 30.5 fj/cs,” IEEE Journal of Solid-State Circuits, vol. 55, no. 6, pp. 1577–1587, 2020.
[58] B. Xu, Y. Zhou, and Y. Chiu, “A 23-mW 24-GS/s 6-bit Voltage-Time Hybrid Time-Interleaved ADC in 28-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 1091–1100, 2017.
[59] M. Hassanpourghadi and M. Sharifkhani, “Fast Static Characterization of Residual-Based ADCs,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 11, pp. 746–750, Nov 2013.
[60] F. De Bernardinis, S. Gambini, R. Vincis, F. Svelto, A. Sangiovanni Vincentelli, and R. Castello, “Design space exploration for a umts front-end exploiting analog platforms,” in IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004. San Jose, CA, USA: IEEE, 2004, pp. 923–930.
[61] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. USA: McGraw-Hill, Inc., 2000.
[62] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014.
[63] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
[64] J. Liu, M. Hassanpourghadi, Q. Zhang, S. Su, and M. S. W. Chen, “Transfer Learning with Bayesian Optimization-Aided Sampling for Efficient AMS Circuit Modeling,” in 2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). Westminster, CO, USA: IEEE, 2020, pp. 1–8.
[65] L. Rios and N. Sahinidis, “Derivative-free optimization: A review of algorithms and comparison of software implementations,” Journal of Global Optimization, vol. 56, 11 2009.
[66] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[67] P. Virtanen et al., “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020.
[68] A. M. L. Canelas, J. M. C. Guilherme, and N. C. G. Horta, AIDA-C Variation-Aware Circuit Synthesis Tool. Cham: Springer International Publishing, 2020, pp. 155–177. [Online]. Available: https://doi.org/10.1007/978-3-030-41536-5_5
[69] R. Martins, N. Lourenço, R. Póvoa, A. Canelas, N. Horta, F. Passos, R. Castro-López, E. Roca, and F. Fernández, “Layout-aware challenges and a solution for the automatic synthesis of radio-frequency ic blocks,” in 2017 14th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD). Giardini Naxos, Italy: IEEE, 2017, pp. 1–4.
[70] T. Kiely and G. Gielen, “Performance modeling of analog integrated circuits using least-squares support vector machines,” in Proceedings Design, Automation and Test in Europe Conference and Exhibition, vol. 1, 2004, pp. 448–453 Vol.1.
[71] G. L. Creech et al., “Artificial neural networks for fast and accurate EM-CAD of microwave circuits,” IEEE Transactions on Microwave Theory and Techniques, vol. 45, no. 5, pp. 794–802, 1997.
[72] P. Nuzzo, A. Sangiovanni-Vincentelli, X. Sun, and A. Puggelli, “Methodology for the design of analog integrated interfaces using contracts,” IEEE Sensors Journal, vol. 12, no. 12, pp. 3329–3345, 2012.
[73] S. Su and M. S. Chen, “A 16-bit 12-GS/s Single-/Dual-Rate DAC With a Successive Bandpass Delta-Sigma Modulator Achieving <−67-dBc IM3 within DC to 6-GHz Tunable Passbands,” IEEE Journal of Solid-State Circuits, vol. 53, no. 12, pp. 3517–3527, 2018.
[74] H. Huang, L. Du, and Y. Chiu, “A 1.2-gs/s 8-bit two-step sar adc in 65-nm cmos with passive residue transfer,” in 2015 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2015, pp. 1–4.
[75] D.-R. Oh, K.-J. Moon, W.-M. Lim, Y.-D. Kim, E.-J. An, and S.-T. Ryu, “An 8b 1gs/s 2.55mw sar-flash adc with complementary dynamic amplifiers,” in 2020 IEEE Symposium on VLSI Circuits, 2020, pp. 1–2.
[76] S. Baek, I. Jang, M. Choi, H. Roh, W. Lim, Y. Cho, and J. Shin, “10.5 a 12b 600ms/s pipelined sar and 2x-interleaved incremental delta-sigma adc with source-follower-based residue-transfer scheme in 7nm finfet,” in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, 2021, pp. 172–174.
[77] B. Hershberg, B. v. Liempd, N. Markulic, J. Lagos, E. Martens, D. Dermit, and J. Craninckx, “3.6 a 6-to-600ms/s fully dynamic ringamp pipelined adc with asynchronous event-driven clocking in 16nm,” in 2019 IEEE International Solid-State Circuits Conference - (ISSCC), 2019, pp. 68–70.
[78] J. Liu, S. Su, M. Madhusudan, M. Hassanpourghadi, S. Saunders, Q. Zhang, R. Rasul, Y. Li, J. Hu, A. K. Sharma, S. S. Sapatnekar, R. Harjani, A. Levi, S. Gupta, and M. S.-W. Chen, “From Specification to Silicon: Towards Analog/Mixed-Signal Design Automation using Surrogate NN Models with Transfer Learning,” in 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2021, pp. 1–9.
Abstract
Analog and mixed-signal (AMS) computer-aided design (CAD) tools are of increasing interest owing to the demand for a wide range of AMS circuit specifications in modern systems on a chip and the requirement for faster time to market. This thesis presents automatic AMS parameter synthesis methodologies enabled by neural networks and time-based circuit architectures. It explores two primary techniques to enhance the possibility of design through automation. The first is to use digital cells to perform AMS operations, as demonstrated in a passive pulse-shrinking time-based analog-to-digital converter (ADC). Using only digital cells and passive elements makes it possible to use digital automation tools while achieving performance comparable to the state of the art. The second is to replace slow SPICE evaluations with fast and accurate neural network (NN) models to accelerate the design process. Moreover, a module linking graph (MLG) is introduced, which allows circuits to be characterized in the presence of their interfaces. To enhance the accuracy of the models, a circuit-connectivity-inspired NN is proposed, which is a deep learning model built on the signal flow graph. Finally, by combining the techniques mentioned, a two-stage SAR ADC in GF 14nm FinFET has been designed. The measurement results show a competitive FOM with a significantly faster design procedure.
Asset Metadata
Creator
Hassanpourghadi, Mohsen (author)
Core Title
Analog and mixed-signal parameter synthesis using machine learning and time-based circuit architectures
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Degree Conferral Date
2022-05
Publication Date
01/28/2022
Defense Date
01/20/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
analog mixed-signal,analog to digital converter,CAD tool,circuit connectivity inspired neural network,computer-aided design,design automation,hybrid search,module-linking-graph,neural network,OAI-PMH Harvest,passive pulse shrinking,successive approximation register,time to digital converter,time-based ADC
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Chen, Mike Shuo-Wei (committee chair), Gupta, Sandeep (committee member), Hashemi, Hossein (committee member), Nakano, Aiichiro (committee member), Nuzzo, Pierluigi (committee member)
Creator Email
mhassanp@usc.edu,muhsinhsn@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC110582767
Unique identifier
UC110582767
Legacy Identifier
etd-Hassanpour-10362
Document Type
Dissertation
Rights
Hassanpourghadi, Mohsen
Type
texts
Source
20220201-usctheses-batch-910 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu