Calibration of digital-to-analog converters in highly-integrated RF transceivers using machine learning
CALIBRATION OF DIGITAL-TO-ANALOG CONVERTERS IN HIGHLY-INTEGRATED RF TRANSCEIVERS USING MACHINE LEARNING
by Daniel Beauchamp
A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)
December 2022
Copyright 2022 Daniel Beauchamp

Dedicated to my grandfather, Anthony Lalingo. In loving memory.

Acknowledgements
I will first say that I never expected to move over 2000 miles away from home (Toronto to Los Angeles) to pursue a PhD. I am deeply indebted to Charles Harper (CEO, Jariet Technologies) for the initial encouragement to go down this path. A PhD is a significant dedication of several years of your life, so while I was hesitant at first, Charles promised that it would be a highly rewarding journey; now I can see why. The PhD process has helped me grow tremendously as a researcher, an engineer, and as a person in general. None of this would have been possible without my industry sponsor, Jariet Technologies, which provided the financial support for the full duration of my studies. On the Jariet side, in addition to Charles, I am also highly grateful to Leo Ghazikhanian, Craig Hornbuckle, and Thomas Krawczyk, as they all provided me with top-tier industry training, especially when I was just an early-career engineer. On the USC side, I would first and foremost like to thank Professor Keith Chugg for being a terrific PhD adviser. This dissertation would not have been possible without his invaluable guidance, support, and patience during the past several years. Keith was always available to help whenever I needed it, and it was a pleasure to work with him on so many interesting problems. I am also grateful to Professors Mike Chen, Leana Golubchik, Antonio Ortega, and Peter Beerel, all for being part of my committee and shaping the direction of my research.
I would also like to acknowledge Professor Andreas Molisch and his research group for the helpful research advice during the early stages of my PhD. In addition, I would like to thank Diane Demetras and Corine Wong for their administrative support. Finally, heartfelt thanks to my family for their constant love and support, without which this dissertation would not have even been started.

Table of Contents
Acknowledgements
List of Acronyms
Abstract
Publications
1 Introduction
  1.1 RF Communication Systems
  1.2 Digital Transceivers
    1.2.1 High-Speed DACs
  1.3 Summary of Dissertation Contributions
    1.3.1 Analysis and Calibration of Wideband Times-2 Interleaved CS-DACs
    1.3.2 Linearization of CS-DACs Using Neural Networks
    1.3.3 Wideband Analysis of Timing Errors in CS-DACs
    1.3.4 Calibration of Analog Dot Products
  1.4 Dissertation Organization
2 Background
  2.1 DAC Overview
    2.1.1 Basic Operation
    2.1.2 Spectrum Analysis
  2.2 Binary-Weighted Current-Steering DACs
  2.3 Nonlinearity in Current-Steering DACs
    2.3.1 Static Linearity Metrics
    2.3.2 Current Source Mismatch
    2.3.3 Dynamic Linearity Metrics
    2.3.4 Timing Errors
    2.3.5 Output Impedance Modulation
  2.4 Segmentation
    2.4.1 Fully-Segmented CS-DACs
    2.4.2 Partially-Segmented CS-DACs
    2.4.3 Dynamic Element Matching
3 Analysis and Calibration for Wideband Times-2 Interleaved CS-DACs
  3.1 Motivation
  3.2 Background
  3.3 Overview of Errors
  3.4 Gain and C2 Clock Duty Cycle Errors
    3.4.1 Coupled Analysis
    3.4.2 Calibration by Gain Error Cancellation
  3.5 Data Timing Errors
    3.5.1 Finite Settling
    3.5.2 Finite Settling With Data Timing Errors
  3.6 Analytical Model
    3.6.1 Signal Flow Diagram and Output Spectrum
    3.6.2 Accuracy, Speed, and Utility
  3.7 Calibration Algorithm
    3.7.1 Algorithm Description
    3.7.2 Simulations
    3.7.3 Measured Results
  3.8 Concluding Remarks
4 Linearization of CS-DACs Using Neural Networks
  4.1 System Identification
  4.2 Simulation Results
  4.3 Measurement Results
  4.4 Background Calibration (Future Work)
  4.5 Concluding Remarks
5 Wideband Analysis of Timing Errors in CS-DACs
  5.1 Motivation
  5.2 Background
    5.2.1 Equivalent Timing Error Model
    5.2.2 Previous SDR Analysis
  5.3 Wideband SDR Analysis
  5.4 Simulations
  5.5 Conclusion
6 Calibration of Analog Dot Products
  6.1 Background
    6.1.1 Analog Multiplier Nonlinearity
  6.2 Analog Dot Product Calibration
    6.2.1 Calibration Considerations
    6.2.2 Calibration Algorithm
    6.2.3 Counter-Based Correction Factor Computation
    6.2.4 Calibration Results
  6.3 Calibration Simulation
  6.4 Extension to 2D Calibration
    6.4.1 (x, w) Calibration
    6.4.2 Additively Separable Calibration
    6.4.3 Hybrid Calibration
  6.5 Concluding Remarks & Future Work
7 Conclusion
  7.1 Summary
  7.2 Recommendations for Future Work
A Two-Tone Calibration Analysis
B Analysis of Data Timing Errors
C Approximation Value Analysis
  C.1 Approximation Value Selection
  C.2 Approximation Value Computation
References

List of Tables
3.1 Interleaving and Data Timing Errors in times-2 interleaved CS-DACs.
3.2 Parameter distributions.

List of Figures
1.1 Conventional superheterodyne RF transmitter.
1.2 Block diagram of an RF digital transceiver.
2.1 General DAC behavior illustrating the ZOH concept.
2.2 Mathematical model of an ideal DAC.
2.3 DAC output spectrum based on a single tone input with frequency $f_0$ using (a) NRZ hold pulses and (b) RZ hold pulses.
2.4 Circuit diagram of an M-bit binary-weighted CS-DAC with output current $I_{out} = I_p - I_n$.
2.5 Transistor-level view of a binary-weighted CS-DAC current cell.
2.6 Transfer characteristics for a 3-bit DAC, (a) linear and (b) nonlinear.
2.7 Transfer characteristic of a CS-DAC that has discontinuities caused by current source errors.
2.8 Output spectrum (first Nyquist zone) of a DAC that has nonlinear distortion. The red markers are on the harmonics, which are annotated in dBc, and the SFDR is limited by HD3.
2.9 IMD products for a DAC that has nonlinear distortion. The red markers are on IM3, IM5, and IM7, all of which are annotated in dBc.
2.10 DAC code transitions (a) ideal, (b) with timing errors.
2.11 Detailed view of the CS-DAC driver unit cell in the ON state driving a load with resistance $R_L$.
2.12 Fully-segmented M-bit CS-DAC, $C = 2^M - 1$.
2.13 Partially-segmented 10-bit CS-DAC, where only the upper two bits of the binary input are thermometer-coded.
3.1 M-bit times-2 interleaved CS-DAC with a sample rate of $f_s$.
3.2 (a) Serializer block diagram. (b) Clock and data timing down to the C4 level.
3.3 (a) Circuit schematic of a times-2 interleaved CS-DAC. (b) Ideal timing of the C2 clock and a bit slice of each sub-DAC.
3.4 Interleaving errors. (a) Gain error. (b) C2 clock duty cycle error. (c) C2 clock skew.
3.5 Data timing errors. (a) C4 clock duty cycle errors. (b) C4 clock phase errors.
3.6 Solutions (; g ) that result in no interleaving spur.
3.7 Calibration by gain error cancellation ( g = 0.018).
3.8 (a) Single bit slice of the even sub-DAC. (b) C2 clock. (c) Output of the even sub-DAC. (d) Settling error for the even sub-DAC, $e_{\text{even}}(t)$.
3.9 Data timing errors for the even sub-DAC. (a) C4 clock duty cycle error. (b) C4 clock phase error.
3.10 Behavioral simulations vs. theory. (a) C4 clock duty cycle errors ( I = Q = 0.02). (b) C4 clock phase errors ( I ==180; Q ==180). (c) Output PSD from a simulation that includes both C4 clock duty cycle errors ( I = Q = 0.02) and I/Q imbalance ( I ==180; Q ==180), =T s = 0.2.
3.11 Signal flow diagram of the analytical model for interleaving and data timing errors.
3.12 (a) Histogram of $R = C_{\text{model}}/C_{\text{simulation}}$ in dB based on 500 randomly generated test vectors. (b) Power and phase accuracy of the analytical model for the spurs generated by a tone at $f_1 = 0.4 f_s$ (relative to behavioral simulations).
3.13 Wideband SFDR results. (a) Two-tone calibration with tones at cal,0 = 0.05 and cal,1 = 0.4. (b) Single-tone calibration with a tone at cal = 0.24.
3.14 Overlapped parameter trajectories for 20 runs of simulated annealing. (a) Two-tone calibration. (b) Single-tone calibration.
3.15 Narrowband SFDR results. (a) Two-tone calibration with tones at cal,0 = 0.05 and cal,1 = 0.4. (b) Single-tone calibration with a tone at cal = 0.24.
3.16 Modern transceiver used to demonstrate the simulated annealing calibration algorithm.
3.17 Two-tone and single-tone simulated annealing calibration. (a) Maximum of the C2 and C4 image spurs in dBc (from Table 3.1). Losses from the test board, cables, and balun have been de-embedded from the measurements. Raw output spectrum comparison (using a Keysight N9040B UXA Signal Analyzer) with fundamental tones at (b) $f_0 = 8.54$ GHz. (c) $f_0 = 16.08$ GHz.
4.1 Block diagram illustrating the DPD concept, where the inverse of the DAC static transfer characteristic is stored in a LUT.
4.2 (a) Block diagram of the DAC-to-ADC system, (b) System identification using a dataset to determine the model parameters, (c) DAC-to-ADC system model with input $x_n$ and output $\hat{y}_n$.
4.3 Polynomial vs. NN regression in the vicinity of a discontinuity for a CS-DAC behavioral model.
4.4 Single layer MLP with 1 input node, H hidden nodes, and 1 output node.
4.5 Two-tone FFT comparison before and after NN-based DPD. The signal frequencies are $f_1 = 3.1$ GHz, $f_2 = 3.2$ GHz with amplitudes -12 dBFS/tone and the DAC is sampling at $f_s = 40.96$ GS/s.
4.6 Test bench with the high-speed DAC and ADC test board.
4.7 DAC output spectrum without DPD, $f_s = 40.96$ GS/s, $f_{sig} = 100$ MHz.
4.8 DAC output spectrum with NN-based DPD, $f_s = 40.96$ GS/s, $f_{sig} = 100$ MHz.
4.9 IM3/IM5/IM7 performance across Nyquist for two-tone signals, -12 dBFS/tone (-6 dBFS total amplitude), 100 MHz spacing.
4.10 IM3/IM5/IM7 performance across Nyquist for two-tone signals, -18 dBFS/tone (-12 dBFS total amplitude), 100 MHz spacing.
4.11 Multi-LUT background calibration.
4.12 Single layer TDNN with L + 1 input nodes, $N_H$ hidden nodes, and 1 output node.
4.13 Adaptive DPD via online training of a TDNN.
4.14 DAC system identification using two different training signals: random codes (red) and low-frequency sine waves (green).
5.1 Error e(t) based on equivalent timing errors T(n). The charge error Q(n) is the area under the n-th error pulse.
5.2 SDR analysis vs. simulation for a single tone input, $f_0/f_s = 0.01$. The simulated data points comprise 50 independent runs of the behavioral model, where 95% confidence intervals are shown.
5.3 SDR analysis vs. simulation over frequency, =T s 3 10 3. The simulated data points comprise 50 independent runs of the behavioral model, where 95% confidence intervals are shown.
5.4 Simulation of the wideband SDR versus thermometer bits for M-bit CS-DACs (M = 8, 10, 12). The simulated data points comprise 50 independent runs of the behavioral model, where 95% confidence intervals are shown.
5.5 Squared error for various code transitions $x_{n-1} \to x_n$ in an M-bit CS-DAC, f 0 =f s 0.11, =T s 3 10 3. For each case, the nominal switching instant $nT_s$ is aligned with t = 0, (a) M = 3 and (b) M = 6.
6.1 Analog dot product engine from [63].
6.2 Basic operation of an analog multiplier.
6.3 Simulated I-V response of an analog multiplier in GF 12LP.
6.4 Analog multiplier error in GF 12LP, extracted from a simulation and fit to a 4th-order polynomial.
6.5 Analog dot product calibration. (a) Traditional approach. (b) Proposed approach.
6.6 ZOH approximation of a reference multiplier error with non-uniform quantization (NUQ): approximation values are selected by evaluating the polynomial fit at the midpoint of each bin.
6.7 ZOH approximation of a reference multiplier error with NUQ: approximation values are selected using the methodology in Appendix C.1.
6.8 Partition of the unit interval via the state vector in (6.4).
6.9 Neighbor state selection.
6.10 ZOH approximation of a reference multiplier error using the methodology outlined in Section 6.2.2 for different threshold counts (N = 8, 16, 32, 64).
6.11 Cost function (6.6) vs. number of simulated annealing iterations for different threshold counts (N = 8, 16, 32, 64).
6.12 NUQ vs. uniform quantization cost comparison.
6.13 Multiplier nonlinearity mismatch due to process variation (M = 1024 multipliers).
6.14 Calibrated vs. uncalibrated MAC array.
6.15 Simulation of calibrated vs. uncalibrated analog dot products.
6.16 (x, w)-calibration applied to a MAC array.
6.17 Additively separable calibration approach applied to a MAC array.
6.18 NN system for deriving additively separable approximations.
6.19 ASA approximations of various target $e_{\text{ref}}(x, w)$. (a) Additively separable case. (b) Weak dependence on w. (c) Multiplicatively separable case.
6.20 Hybrid calibration approach applied to a MAC array.

List of Acronyms
ADC: analog-to-digital converter
AMUX: analog multiplexer
ASA: additively separable approximation
CS: Current-Steering
DAC: digital-to-analog converter
DDC: digital down-converter
DEM: dynamic element matching
DnC: Deep-n-Cheap
DNL: differential nonlinearity
DPD: digital pre-distortion
DUC: digital up-converter
FFT: Fast Fourier Transform
GS/s: gigasamples-per-second
IC: integrated circuit
IF: intermediate frequency
IMD: intermodulation distortion
INL: integral nonlinearity
LO: local oscillator
LSB: least significant bit
LUT: Look Up Table
MAC: multiplier-accumulator
MLP: Multilayer Perceptron
mmWave: millimetre wave
MSB: most-significant-bit
MSE: mean squared error
NCO: numerically controlled oscillator
NN: Neural Network
NRZ: non-return-to-zero
NUQ: non-uniform quantization
OSR: oversampling ratio
PLL: phase-locked loop
PSD: power spectral density
ReLu: rectified linear unit
RF: radio-frequency
RMS: root mean square
RZ: return-to-zero
SDR: signal-to-distortion ratio
SFDR: spurious-free dynamic range
SGD: Stochastic Gradient Descent
SNR: signal-to-noise ratio
SPI: Serial Peripheral Interface
TDNN: time-delay neural network
THD: total harmonic distortion
TIDAC: time-interleaved DAC
ZOH: zero-order-hold

Abstract
RF communication systems have undergone a paradigm shift, which is in part due to the advancement of data converter technology, where sample rates are now several gigasamples-per-second (GS/s) with high resolution in compact, deep-submicron processes. This allows the converters to be placed closer to the antenna in a "direct RF" sampling configuration.
Such a configuration is the preferred implementation of future communication systems, as it results in the highest system performance, lowest power consumption, and lowest cost. In direct RF sampling, operations such as filtering, up-conversion, and down-conversion are moved into the digital domain, which eliminates the need for costly, power-hungry, and bulky analog components. Moreover, transmitters and receivers following this simplified architecture may coexist on a single RF transceiver Integrated Circuit (IC). The primary focus of this dissertation is on modern digital-to-analog converters (DACs) in highly-integrated RF transceiver ICs. First, we provide a detailed analysis of the significant spectral impairments associated with wideband, times-2 interleaved current-steering DACs (CS-DACs). The analysis leads to the proposal of a calibration algorithm based on machine learning, which, unlike single-tone approaches, does not suffer from narrowband locking (i.e., where the calibration algorithm is only effective near the calibration frequency). We run extensive simulations of this algorithm by developing and leveraging a fast, accurate analytical model of the spectral impairments. In addition, we demonstrate the algorithm experimentally on a commercial RF transceiver IC in 14nm CMOS. Consistent with the trend toward high integration, the transceiver also contains a high-speed analog-to-digital converter (ADC), an embedded MCU, and programmable control over the spectral impairments; we show how these can be used to support the calibration of the DAC. Beyond DAC analysis and calibration, this dissertation also shows how a similar machine learning approach can be applied to analog dot products, which are starting to receive significant attention for low-power, hardware implementations of neural networks (NNs).

Publications
D. Beauchamp and K. M.
Chugg, "Analysis and Calibration for Wideband Times-2 Interleaved Current-Steering DACs," in IEEE Transactions on Circuits and Systems I: Regular Papers, 2022.
D. Beauchamp and K. M. Chugg, "Improved Analysis of Current-Steering DACs Using Equivalent Timing Errors," in IEEE International Symposium on Circuits and Systems (ISCAS), 2022.
D. Beauchamp and K. M. Chugg, "Linearization for High-Speed Current-Steering DACs Using Neural Networks," in IEEE Latin American Symposium on Circuits and Systems (LASCAS), 2021.
D. Beauchamp and K. M. Chugg, "Machine Learning Based Image Calibration for a Twofold Time-Interleaved High Speed DAC," in IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), 2019.

Chapter 1 Introduction
1.1 RF Communication Systems
There has been significant progress in the design and deployment of radio-frequency (RF) communication systems over the past several decades. This has given rise to several useful applications, including cellular communication, electronic warfare, and automotive radar. One of the most important discoveries is the heterodyne principle, which is used by virtually all RF communication systems. It allows a desired signal to be shifted in frequency by multiplying it with a sinusoidal waveform generated by a local oscillator (LO). For example, it is used in the superheterodyne transmitter [1] illustrated in Figure 1.1, where the baseband in-phase and quadrature components are upconverted to an intermediate frequency (IF). It is used again in the IF-to-RF conversion so that the desired signal can be transmitted via an antenna with practical dimensions.

Figure 1.1: Conventional superheterodyne RF transmitter.

1.2 Digital Transceivers
Conventional RF communication systems have undergone a paradigm shift.
A major reason for this is the advancement of data converter technology, where sample rates are now several gigasamples-per-second (GS/s) with high resolution in compact, deep-submicron processes. This allows the converters to be placed close to the antenna in a configuration known as direct RF sampling. Such a configuration is the preferred implementation for future communication systems, as it results in the highest system performance, lowest power consumption, and lowest cost [2]. In direct RF sampling, operations such as filtering, up-conversion, and down-conversion are moved to the digital domain [3]. This simplifies the transmit and receive paths considerably, i.e., it eliminates the need for costly, power-hungry, and bulky analog components such as mixers, local oscillators, and bandpass filters. Moreover, transmitters and receivers following this simplified architecture may co-exist on a single integrated circuit (IC) known as a digital transceiver, as shown in Figure 1.2. This is especially useful for large phased array systems, where several radiating elements are required to transmit and receive simultaneously [4]. While originally used in the defense industry, phased arrays are now seeing use in applications such as 5G cellular communication [5], automotive radar [6], and advanced military radar [7].

Figure 1.2: Block diagram of an RF digital transceiver.

1.2.1 High-Speed DACs
The high-speed digital-to-analog converter (DAC) shown in Figure 1.2 is one of the most critical parts of the digital transceiver, and it is the main focus of this dissertation. Designing DACs that can reliably transmit high-speed data requires careful analog design, good layout practices, and robust calibration.
In this dissertation, we assume that the DAC is part of a digital transceiver operating in the millimetre wave (mmWave) regime, i.e., we consider sample rates of at least 30 GS/s, which is consistent with the aforementioned phased array applications. The typical approach to achieving such high sample rates is to time interleave lower-speed DACs. However, nonidealities in the sub-DACs and mismatch between them (e.g., gain, offset, and timing) corrupt the Nyquist band with undesired spectral content. Unfortunately, this degrades wideband flexibility, i.e., the ability to faithfully utilize the full Nyquist band, which is critical for systems that rely on, for example, software-defined or cognitive radio.

1.3 Summary of Dissertation Contributions
The contributions of this dissertation can be divided into four topics: 1) Analysis and Calibration of Wideband Times-2 Interleaved Current-Steering (CS)-DACs, 2) Linearization of CS-DACs Using Neural Networks, 3) Wideband Analysis of Timing Mismatch in CS-DACs, and 4) Calibration of Analog Dot Products. Topics 1-3 are all related to CS-DACs, and Topic 4 explores how the calibration methodology in Topic 1 can be applied to analog dot products. In this section, we outline the contributions for each topic.

1.3.1 Analysis and Calibration of Wideband Times-2 Interleaved CS-DACs
1. We present a detailed study of the narrowband locking phenomenon, i.e., where the DAC error mechanisms (parameters) can cancel each other out when a single tone is used in calibration mode (as in the previous methods), resulting in solutions that are only effective near the calibration frequency. Moreover, we analytically show that the use of two tones in calibration mode eliminates narrowband locking for gain and clock duty cycle errors, which leads to the proposal of a calibration algorithm that is effective over the full Nyquist band.
In addition, we provide simulation results that show the proposed two-tone calibration algorithm eliminates narrowband locking even when other parameters are considered (e.g., clock skew and data timing errors).
2. We present a detailed analysis of data timing errors, i.e., nonidealities associated with the serializer clocks. While previously neglected, data timing errors are becoming more dominant in modern high-speed DACs due to the reduced timing margin at higher sample rates. This is an important contribution to the DAC analysis space, since such effects can in fact limit performance.
3. We develop an analytical model that includes, to our understanding, the most significant interleaving and data timing errors. The model was validated against behavioral simulations and shown to be extremely accurate, with a run time that is four orders of magnitude shorter. Since the analytical model is highly accurate in all regions of practical interest, it may be used in place of behavioral simulations in many cases. For example, it could be used to explore the circuit design space by mapping design tolerances to spectral impairments and making design trade-offs across circuit components. The analysis is especially useful when extensive exploration or experimentation is desired, since it is much faster than simulation. For example, by using the analytical expressions we are able to run extensive experiments on our calibration algorithms that would otherwise be impossible to conduct via simulations.
4. We demonstrate the proposed two-tone calibration algorithm on a modern, commercially-developed transceiver chip that contains a times-2 interleaved DAC, operating at 40GS/s in 14nm CMOS. The transceiver also contains a high-speed analog-to-digital converter (ADC), an embedded MCU, and programmable control over various impairments, which is consistent with the trend toward integrated, system-on-chip (SoC) implementations.
We show how these can be leveraged as calibration support for the DAC, which is more practical than previous methods that rely on off-chip measurements of the DAC output with a spectrum analyzer [8] and manual tuning [9].

1.3.2 Linearization of CS-DACs Using Neural Networks
1. We propose a novel foreground calibration algorithm for static nonlinearity in CS-DACs using Neural Networks (NNs).
2. We demonstrate the proposed algorithm experimentally on a modern CS-DAC and show that it can outperform two linearization approaches from the prior art.
3. We pave the way for extending the proposed NN-based approach to run in background mode, i.e., to compensate for environmental variations (e.g., temperature drifts).

1.3.3 Wideband Analysis of Timing Errors in CS-DACs
1. We leverage an analytical model from previous work [10] to conduct an analysis of timing mismatch on the wideband performance of CS-DACs (i.e., where we consider distortion terms beyond the first Nyquist zone in the analysis).
2. We conduct simulations that show the wideband analysis in this dissertation is significantly more accurate compared to the previous work.

1.3.4 Calibration of Analog Dot Products
1. We propose a calibration algorithm for multiplier nonlinearity in analog dot products that presents a significant reduction in complexity compared to one of the traditional methods.
2. We simulate the proposed algorithm using multiplier nonlinearity that was extracted from circuit-level simulations in GF 12LP.
3. While the details of the proposed algorithm are presented for cases in which the multiplier nonlinearity is a function of only one variable, we also describe how the results can be extended to the 2D case (i.e., where the nonlinearity is a function of both multiplying factors). We propose three different 2D calibration methods that are tailored to the characteristics of the target nonlinearity.

1.4 Dissertation Organization
The rest of this dissertation is organized as follows.
Chapter 2 provides background information on the key DAC concepts that are relevant to this work. Chapter 3 presents analysis and calibration for wideband, times-2 interleaved CS-DACs. Chapter 4 presents linearization for CS-DACs using NNs. Chapter 5 presents wideband analysis of timing errors in CS-DACs. Chapter 6 presents calibration of analog dot products. Finally, we conclude the dissertation in Chapter 7 by providing a summary of the results along with recommendations for future work.

Chapter 2 Background

2.1 DAC Overview

2.1.1 Basic Operation

A high-level overview of the DAC is illustrated in Figure 2.1, where the discrete-time input sequence $x_n$ is mapped to a continuous-time output, $y(t)$. The input sequence may be interpreted as a sampled, continuous-time target signal (depicted by the transparent blue curve in Figure 2.1). Each input sample $x_n$ is latched by a clock and held for the duration of the clock period, $T_s$. Therefore, the DAC exhibits zero-order-hold (ZOH) behavior, which is the simplest form of interpolation. In practice, the input samples are represented by $M$-bit words, where $M$ is the DAC resolution.

Figure 2.1: General DAC behavior illustrating the ZOH concept.

2.1.2 Spectrum Analysis

The DAC performance for a particular application is often assessed by considering the output spectrum, $Y(f)$, i.e.,

$$Y(f) = \int_{-\infty}^{\infty} y(t)\, e^{-j 2\pi f t}\, dt \tag{2.1}$$

which is the Fourier transform of the DAC output $y(t)$. A mathematical model of an ideal DAC is illustrated in Fig. 2.2, which can be used to derive $y(t)$. Note from Fig. 2.2 that the DAC input sequence is modeled as a continuous-time impulse train, which is modulated by the target signal and convolved with a rectangular function

$$h(t) = \mathrm{rect}\left(\frac{t}{T_s}\right) \tag{2.2}$$

where

$$\mathrm{rect}(t) = \begin{cases} 1, & |t| \le \tfrac{1}{2} \\ 0, & |t| > \tfrac{1}{2} \end{cases} \tag{2.3}$$

to produce a ZOH or "staircase-like" effect at the output.

Figure 2.2: Mathematical model of an ideal DAC.

Hence, the output of the ideal DAC in Fig.
2.2 is

$$y(t) = \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - kT_s) \right) * h(t) \tag{2.4}$$

and proceeding with the Fourier transform of (2.4) we have

$$Y(f) = \left( f_s X(f) * \sum_{k=-\infty}^{\infty} \delta(f - k f_s) \right) H(f) = f_s H(f) \sum_{k=-\infty}^{\infty} X(f - k f_s) = \mathrm{sinc}(f T_s) \sum_{k=-\infty}^{\infty} X(f - k f_s) \tag{2.5}$$

where

$$\mathrm{sinc}(f) := \frac{\sin(\pi f)}{\pi f} \tag{2.6}$$

There are two principles in (2.5) that are worth highlighting. First, the output spectrum $Y(f)$ consists of spectral replicas (or images) of the input spectrum $X(f)$, spaced by integer multiples of the sample rate, $f_s$. These images, except for the one at $k = 0$, are often undesirable and require filtering to suppress. In addition, the output spectrum follows a sinc profile due to the ZOH behavior. Both of these concepts are illustrated in Figure 2.3 for a target signal with Fourier transform $X(f) = \delta(f - f_0) + \delta(f + f_0)$, where $f_0$ is the fundamental frequency.

Second, it is worth emphasizing that the output spectrum depends on the duration of the hold pulse, $h(t)$. The spectrum shown in Figure 2.3(a) assumes $h(t)$ is defined by (2.2). In this case, $h(t)$ is referred to as a non-return-to-zero (NRZ) hold pulse, since its support covers the entire duration of the clock period, $T_s$. In contrast, a return-to-zero (RZ) pulse may be used, meaning its support covers only a fraction of the clock period. The Fourier transform of such a pulse, $H_{RZ}(f)$, is shown in Figure 2.3(b), where its corresponding time-domain signal is nonzero for only half of the clock period. Note that this RZ pulse extends the bandwidth by a factor of 2 compared to its NRZ counterpart in Figure 2.3(a); however, it results in a 6dB output power loss due to the time/frequency scaling property of the Fourier transform [11]. In some cases, the bandwidth extension offered by RZ pulses may be useful when operation in higher Nyquist zones is desired [12].

Figure 2.3: DAC output spectrum based on a single tone input with frequency $f_0$ using (a) NRZ hold pulses and (b) RZ hold pulses.
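The image locations and the sinc envelope in (2.5), together with the NRZ/RZ comparison above, can be checked numerically. The following is a minimal sketch, not taken from the dissertation: the function names, the `duty` parameter, and the numerical values of `fs` and `f0` are illustrative assumptions, and the RZ pulse is assumed to be a rectangle of width `duty * T_s` with the same amplitude as the NRZ pulse.

```python
import numpy as np

def hold_envelope_db(f, fs, duty=1.0):
    """Hold-pulse spectral envelope in dB, relative to the NRZ value at dc.

    duty = 1.0 models the NRZ pulse of (2.2); duty = 0.5 models an RZ pulse
    that is nonzero for half of the clock period (illustrative assumption).
    """
    # np.sinc(x) = sin(pi*x)/(pi*x), the same convention as (2.6).
    mag = duty * np.abs(np.sinc(duty * np.asarray(f, dtype=float) / fs))
    return 20.0 * np.log10(mag)

def image_frequencies(f0, fs, k_max=2):
    """Positive frequencies of the fundamental and its images at k*fs -/+ f0."""
    freqs = [f0]
    for k in range(1, k_max + 1):
        freqs += [k * fs - f0, k * fs + f0]
    return freqs

fs = 40.0   # sample rate in GS/s (hypothetical value)
f0 = 3.0    # tone frequency in GHz (hypothetical value)
for f in image_frequencies(f0, fs, k_max=1):
    print(f"{f:5.1f} GHz  NRZ {hold_envelope_db(f, fs):7.2f} dB"
          f"  RZ {hold_envelope_db(f, fs, duty=0.5):7.2f} dB")
```

Under these assumptions the RZ envelope at dc sits about 6 dB below the NRZ envelope, and its first null moves from $f_s$ to $2f_s$, which is consistent with the bandwidth/power trade-off described above.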
2.2 Binary-Weighted Current-Steering DACs

While there are several DAC architectures available, the CS-DAC is regarded as the "de facto solution" at gigahertz frequencies [13]. A block diagram of an $M$-bit CS-DAC is shown in Figure 2.4, where it is modeled as an array of current sources with complementary switching. Moreover, each current source is comprised of a parallel combination of "unit elements" and, without loss of generality, we assume that these unit elements carry a weight of $I_u/2$. For example, the current source corresponding to the $m$th bit is comprised of $2^m$ unit elements, so it has a value of $2^{m-1} I_u$. For this reason, this architecture is also referred to as a binary-weighted CS-DAC. The contribution of the $m$th current source to $I_{out}$ is controlled by the data switches $(b_m, \bar{b}_m) \in \{(0, 1), (1, 0)\}$. Specifically, a current of $2^{m-1} I_u$ is steered to the positive output if $b_m = 1$ and to the negative output if $b_m = 0$. The ideal steady-state output current $I_{out} = I_p - I_n$ for the input $\{b_m\}_{m=0}^{M-1}$ is

$$I_{out} = I_u \sum_{m=0}^{M-1} 2^{m-1} (2 b_m - 1) \tag{2.7}$$

Figure 2.4: Circuit diagram of an $M$-bit binary-weighted CS-DAC with output current $I_{out} = I_p - I_n$.

We collectively refer to a current source and its corresponding data switches as a current cell, e.g., as outlined in Fig. 2.4. A transistor-level view of the $m$th current cell is shown in Figure 2.5. Note that the unit current source is comprised of cascode transistors, $2^m$ of which are stacked in parallel in order to synthesize the total current for the $m$th bit. Furthermore, the complementary data switches are implemented as a differential pair. Sometimes, additional transistors are placed in cascode with the data switches to isolate the output node from switching effects.

Figure 2.5: Transistor-level view of a binary-weighted CS-DAC current cell.
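As a quick numerical illustration of (2.7), the sketch below evaluates the ideal steady-state output current for every input code of a small DAC. The function name and the choice of a 3-bit example with $I_u = 1$ are assumptions made here for illustration only.

```python
def cs_dac_output(code, M, I_u=1.0):
    """Ideal binary-weighted CS-DAC output current per (2.7).

    code : integer input code in 0 .. 2**M - 1
    M    : DAC resolution in bits
    I_u  : output change for a unit change in code (normalized to 1 here)
    """
    total = 0.0
    for m in range(M):
        b_m = (code >> m) & 1                  # m-th bit of the input code
        total += 2.0 ** (m - 1) * (2 * b_m - 1)
    return I_u * total

M = 3  # hypothetical resolution for the example
for code in range(2 ** M):
    print(f"{code:0{M}b} -> {cs_dac_output(code, M):+.1f}")
```

For $M = 3$ the output runs from $-3.5\,I_u$ to $+3.5\,I_u$ in steps of $I_u$, so the peak-to-peak range is $I_u(2^M - 1)$.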
2.3 Nonlinearity in Current-Steering DACs

Linearity is one of the most significant factors in evaluating the performance of a CS-DAC. Ideally, the transfer characteristic from input code to output current follows a linear relationship. Any deviation from such a relationship causes the output spectrum to have undesired frequency content. It is therefore important to study the errors behind DAC nonlinearity so that calibration algorithms can be developed. Broadly, these errors are divided into two categories: static and dynamic. We interpret static errors as those that stem from memoryless, nonlinear time-invariant systems. In a DAC framework, this means that they are independent of both sample rate and input frequency. For CS-DACs, static errors are mainly caused by mismatch between the current sources. In contrast to static errors, dynamic errors are frequency dependent, and include any effect that disturbs the output node during code transitions. We interpret them as stemming from nonlinear, time-invariant systems with memory. Dynamic errors are important to study since they limit the high frequency performance of the CS-DAC [14].

2.3.1 Static Linearity Metrics

Two of the most common DAC linearity metrics are integral nonlinearity (INL) and differential nonlinearity (DNL). Before defining these metrics, we first define a couple of preliminary quantities. First is the notion of full-scale current, $I_{FS}$, which is defined as the maximum peak-to-peak DAC output current. Referring to (2.7), we have $I_{FS} = I_u (2^M - 1)$ since code $00\cdots0$ yields a minimum current of $-\frac{I_u}{2}(2^M - 1)$ and $11\cdots1$ a maximum current of $\frac{I_u}{2}(2^M - 1)$. If the DAC is excited by a sine wave that triggers these extreme values, i.e., resulting in an amplitude of $A_{FS} = I_{FS}/2$, then we say that the DAC is operating at "full-scale". For data converters, it is very common to report results in dBFS (dB relative to full-scale), and this is computed as $20 \log_{10}(A/A_{FS})$ for a sine wave with amplitude $A$.
For example, full-scale and half of full-scale correspond to 0dBFS and -6dBFS, respectively. Another important quantity is the least significant bit (LSB), which is defined as

$$\Delta = \frac{I_{out}(2^M - 1) - I_{out}(0)}{2^M - 1} \tag{2.8}$$

where $M$ is the DAC resolution and $I_{out}(n)$ is the output current that corresponds to input code $n \in \{0, \ldots, 2^M - 1\}$. The LSB is intuitively interpreted as the "average step size" (or gain), and it is a normalization factor for several important quantities, including INL and DNL. For the ideal CS-DAC illustrated in Figure 2.4, we have $\Delta = I_u$, since a unit change in input results in an output change of $I_u$ in all cases. In Figure 2.6(a), we illustrate the transfer characteristic for an ideal 3-bit DAC with $I_{FS}$ and $\Delta$ annotated on the plot. The data on this curve is from (2.7) and normalized by $I_u$.

Figure 2.6: Transfer characteristics for a 3-bit DAC, (a) linear and (b) nonlinear.

An example of a nonlinear transfer characteristic is shown in Figure 2.6(b). To quantify the nonlinearity, we introduce the concept of INL, which is defined as

$$\mathrm{INL}(n) = \frac{I_{out}(n) - I_{ideal}(n)}{\Delta}, \quad n = 0, \ldots, 2^M - 1 \tag{2.9}$$

where $I_{ideal}(n)$ is an ideal reference that connects the endpoints of the transfer characteristic, as illustrated by the blue line in Figure 2.6(b). Lastly, note that it is quite common on DAC datasheets to see INL quoted as $\max_n \mathrm{INL}(n)$.

DNL is another common linearity metric that quantifies the deviation of each step size relative to 1 LSB, and it is defined as

$$\mathrm{DNL}(n) = \frac{I_{out}(n) - I_{out}(n-1)}{\Delta} - 1, \quad n = 1, \ldots, 2^M - 1 \tag{2.10}$$

and by convention we define $\mathrm{DNL}(0) = 0$. Lastly, it is worth mentioning that INL is the cumulative sum of DNL, i.e.,

$$\mathrm{INL}(n) = \sum_{j=0}^{n} \mathrm{DNL}(j), \quad n = 0, \ldots, 2^M - 1 \tag{2.11}$$

2.3.2 Current Source Mismatch

In practice, the current sources in Figure 2.4 differ from their ideal binary weights. In a CMOS process, this is a consequence of transistor $W/L$ and $V_T$ variation.
We model this effect by augmenting the binary weights in (2.7) with fractional errors $\epsilon_m$ as follows

$$I_{out} = I_u \sum_{m=0}^{M-1} 2^{m-1} (1 + \epsilon_m)(2 b_m - 1) = \underbrace{I_u \sum_{m=0}^{M-1} 2^{m-1} (2 b_m - 1)}_{\text{Ideal output}} + \underbrace{I_u \sum_{m=0}^{M-1} \epsilon_m 2^{m-1} (2 b_m - 1)}_{\text{Error term}} \tag{2.12}$$

Note that (2.12) is the sum of the ideal DAC output (2.7) and an error term. Furthermore, linearity is only impacted when there is mismatch among the $\epsilon_m$. It is observed from (2.12) that if the $\epsilon_m$ are all equal, then the DAC transfer characteristic is only adjusted by a scaling factor.

Current source mismatch impacts the DAC transfer characteristic by causing discontinuities [13]. For example, referring to Figure 2.4, if all current sources are ideal, incrementing the binary input code by 1 produces an output current increase of $I_u$ in all cases. However, if, for example, the current source corresponding to the most-significant-bit (MSB) is $2^{M-2} I_u (1 + \epsilon_{M-1})$, the transition from input code $011\cdots1$ to $100\cdots0$ will produce a change in output current of $I_u (1 + \epsilon_{M-1} 2^{M-1})$ instead of the ideal value of $I_u$. In Figure 2.7, we illustrate a region of a CS-DAC transfer characteristic that has discontinuities caused by current source mismatch.

Figure 2.7: Transfer characteristic of a CS-DAC that has discontinuities caused by current source errors.

2.3.3 Dynamic Linearity Metrics

While INL and DNL quantify static performance, other linearity metrics are required to characterize the DAC at higher frequencies. Some examples include spurious-free dynamic range (SFDR), total harmonic distortion (THD), and intermodulation distortion (IMD), all of which are covered in this section. Several key metrics can be extracted from the DAC output spectrum when the DAC is excited by one or more sine waves. Ideally, the output spectrum based on these types of inputs only contains energy at the input frequencies. However, this is not true in practice due to nonlinear distortion.
Nonlinear distortion manifests itself as harmonics when the DAC is excited by a single tone. Specifically, if the input tone frequency is $f_0$, then energy appears not only at $f_0$, but also at $2f_0$, $3f_0$, and so on. This is shown in Figure 2.8, which illustrates the output spectrum of a DAC that exhibits nonlinear distortion. Typically, harmonics are measured in units of dBc (dB relative-to-carrier), and we refer to the $n$th harmonic as HDn. We use THD to quantify harmonic distortion, i.e.,

$$\mathrm{THD} = \frac{\sqrt{\sum_{n=2}^{\infty} V_{\mathrm{rms,\,HD}n}^2}}{V_{\mathrm{rms},\, f_0}} \tag{2.13}$$

where $V_{\mathrm{rms,\,HD}n}$ and $V_{\mathrm{rms},\, f_0}$ are the root mean square (RMS) values of HDn and the fundamental, respectively. Note that in practice, the summation in the numerator covers only the dominant harmonics.

Another important specification is SFDR, which is the power ratio of the fundamental and the highest spur in the first Nyquist band (from dc to $f_s/2$). It is typically measured in dBc, and it is useful to specify the corresponding limiting spur. For example, in Figure 2.8, the SFDR is 51.2dBc and it is being limited by HD3. Note that this spur is not always harmonically related to the fundamental, since other clock-related or interleaving spurs may dominate.

Figure 2.8: Output spectrum (first Nyquist zone) of a DAC that has nonlinear distortion. The red markers are on the harmonics, which are annotated in dBc, and the SFDR (51.2 dBc) is limited by HD3.

In a sampled system, any frequency that falls outside the first Nyquist band will alias back in. Specifically, the alias frequency $f_{alias}$ for a tone with frequency $f$ sampled at $f_s$ is

$$f_{alias} = \begin{cases} \tilde{f}, & \text{if } \left\lfloor \dfrac{f}{f_s/2} \right\rfloor \text{ is even} \\ f_s/2 - \tilde{f}, & \text{otherwise} \end{cases} \tag{2.14}$$

where $\tilde{f} = \mathrm{mod}(f, f_s/2)$. An example of this folding effect for HD10 is shown in Figure 2.8, where the fundamental $f_0 = 3.1$GHz and $f_s = 40.96$GS/s. In this case, we have $f = 31$GHz, which falls outside the first Nyquist band (i.e., $f > f_s/2$) and applying (2.14) results in $f_{alias} \approx 10$GHz.
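The folding rule in (2.14) is easy to get wrong by hand, so a small sketch may help. The function name below is an illustrative assumption; the numbers are those of the HD10 example above, expressed in GHz.

```python
import math

def alias_frequency(f, fs):
    """Fold a tone at frequency f into the first Nyquist band [0, fs/2], per (2.14)."""
    nyquist = fs / 2.0
    remainder = math.fmod(f, nyquist)      # mod(f, fs/2) in (2.14)
    zone = math.floor(f / nyquist)         # zero-based Nyquist-zone index
    return remainder if zone % 2 == 0 else nyquist - remainder

fs = 40.96          # sample rate in GS/s (from the example above)
f0 = 3.1            # fundamental in GHz (from the example above)
hd10 = 10 * f0      # tenth harmonic at 31 GHz, outside the first Nyquist band
print(f"HD10 at {hd10:.2f} GHz folds to {alias_frequency(hd10, fs):.2f} GHz")
# prints: HD10 at 31.00 GHz folds to 9.96 GHz
```

Here $\lfloor 31/20.48 \rfloor = 1$ is odd, so the tone folds to $20.48 - 10.52 = 9.96$ GHz, matching the value of roughly 10 GHz quoted above.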
Another common linearity metric is IMD, which is derived from a two-tone excitation. Ideally, the resulting spectrum has energy at only the input frequencies. However, in a nonlinear system, sidebands are present at deterministic frequency offsets. Intuition for this comes from a simple time-domain analysis of a two-tone input to a nonlinear system. For example, consider a two-tone signal $x(t) = x_0(t) + x_1(t)$ as an input to the following static nonlinear system

$$y(t) = a_1 x(t) + a_3 x^3(t) \tag{2.15}$$

where $x_0(t) = \cos(2\pi f_0 t)$ and $x_1(t) = \cos(2\pi f_1 t)$. In this case, (2.15) can be written as

$$y(t) = a_1 x(t) + a_3 \left( \cos(2\pi f_0 t) + \cos(2\pi f_1 t) \right)^3 = a_1 x(t) + a_3 \left( x_0^3(t) + x_1^3(t) + 3 x_0^2(t) x_1(t) + 3 x_1^2(t) x_0(t) \right) \tag{2.16}$$

and it is straightforward to verify that energy appears at $f_0$, $f_1$, $3f_0$, $3f_1$, $2f_0 \pm f_1$, and $2f_1 \pm f_0$ by applying the product-to-sum trigonometric identities to (2.16). The frequency components at $2f_0 - f_1$ and $2f_1 - f_0$ are referred to as IM3 products, since they come from third-order terms. In general, there are higher-order terms, and the corresponding IM$_{2n+1}$ products appear at $(n+1)f_0 - n f_1$ and $(n+1)f_1 - n f_0$ ($n = 1, 2, \ldots$). In Figure 2.9, we annotate IMD products for a nonlinear DAC that is excited by a two-tone signal. Similar to SFDR, these products are also measured in dBc.

Figure 2.9: IMD products for a DAC that has nonlinear distortion. The red markers are on IM3, IM5, and IM7, all of which are annotated in dBc (IM3 = -45 dBc).

2.3.4 Timing Errors

One key dynamic error to consider is timing errors between the current cells [15]. Ideally, all current cells fire simultaneously at each switching instant, e.g., this is shown in Figure 2.10(a), where a 3-bit CS-DAC undergoes a perfect step-function transition $000 \to 111$. However, simultaneous firing of the current cells is not guaranteed in practice due to propagation delay mismatch in the clock and data paths.
This non-ideal case is illustrated in Figure 2.10(b), where the current cells fire at different times; this results in a staircase-like output (black trace) as the DAC undergoes code transitions, and we overlay what this looks like through a lowpass filter (blue trace). These errors cause harmonics in the frequency domain that can degrade SFDR, especially for high input frequencies.

Figure 2.10: DAC code transitions (a) ideal, (b) with timing errors. Axes: Time (normalized) versus DAC Output (normalized).

2.3.5 Output Impedance Modulation

Another dynamic error to consider for CS-DACs is output impedance modulation. In CS-DACs, the output impedance depends on the input code. Therefore, if the DAC is driving a load, then the output voltage is modulated by the input sequence, which causes nonlinear distortion. This is understood by observing Figure 2.11, which illustrates a detailed view of the unit current cell. In addition, a load resistance $R_L$ is also shown.

Figure 2.11: Detailed view of the CS-DAC driver unit cell in the ON state driving a load with resistance $R_L$. We depict the case where $M_1$ is on and $M_2$ is off.

Note that $Z_{on}(s)$ is a finite output impedance, which depends on the $r_{ds}$ and drain capacitance of $M_1$, $M_3$, and $M_4$; an expression for $Z_{on}(s)$ is derived in [16]. Also note that $Z_{off}(s) = \infty$ since $M_2$ is in the off state. For clarity of the exposition, we consider a CS-DAC with $2^M - 1$ equally-weighted current sources (we refer to this as a fully-segmented CS-DAC and describe this architecture in more detail in Section 2.4.1) and assume that $n$ of these current sources are connected to the positive output node.
The output voltage is then

$$V_{out}(s) = I_u R_L \frac{n}{1 + n \frac{R_L}{Z_{on}(s)}} - I_u R_L \frac{2^M - 1 - n}{1 + (2^M - 1 - n) \frac{R_L}{Z_{on}(s)}} \tag{2.17}$$

From (2.17), we observe that the output voltage is a nonlinear function of the input code, $n$. Note that the nonlinearity in (2.17) is a result of impedance mismatch between $Z_{on}(s)$ and $Z_{off}(s)$. Specifically, if the input code is $n$, then the impedance looking into the positive output is $\frac{Z_{on}(s)}{n}$, and the impedance looking into the negative output is $\frac{Z_{on}(s)}{2^M - 1 - n}$. This imbalance causes the denominators in (2.17) to differ. If instead $Z_{on}(s) = Z_{off}(s)$, then (2.17) becomes

$$V_{out}(s) = I_u R_L \frac{n}{1 + (2^M - 1) \frac{R_L}{Z_{on}(s)}} - I_u R_L \frac{2^M - 1 - n}{1 + (2^M - 1) \frac{R_L}{Z_{on}(s)}} \tag{2.18}$$

$$= I_u R_L \frac{2n - (2^M - 1)}{1 + (2^M - 1) \frac{R_L}{Z_{on}(s)}} \tag{2.19}$$

which is in fact a linear function of $n$. Small bleed currents can be employed to reduce the effect of output impedance modulation by keeping $M_1$ and $M_2$ on at all times [17], thus enforcing $Z_{on}(s) \approx Z_{off}(s)$.

2.4 Segmentation

Different CS-DAC architectures have been explored in order to reduce the impairments discussed in Sections 2.3.2 and 2.3.4 (i.e., current source mismatch and timing errors). The objective of this section is to cover two such architectures: fully segmented and partially segmented. In addition, we discuss a common technique that leverages these architectures to improve linearity.

2.4.1 Fully-Segmented CS-DACs

A block diagram of an $M$-bit fully-segmented CS-DAC is shown in Fig. 2.12. It is modeled as an array of $C = 2^M - 1$ current cells, each weighted by $I_u/2$ with complementary switching. A binary-to-thermometer decoder is used to map the binary input code $b_{M-1} \cdots b_0$ to a thermometer code $t_{C-1} \cdots t_0$. The number of ones in the decoder output is equal to the decimal representation of its binary input, e.g., for a 2-bit DAC, $00 \to 000$, $01 \to 001$, $10 \to 011$, and $11 \to 111$.

Figure 2.12: Fully-segmented $M$-bit CS-DAC, $C = 2^M - 1$.
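The binary-to-thermometer mapping described above can be sketched in a few lines. This is only an illustration of the mapping itself, not of any particular decoder circuit; the function names and the convention that list index `m` holds $t_m$ are assumptions made here.

```python
def binary_to_thermometer(code, M):
    """Map an M-bit input code to C = 2**M - 1 thermometer elements.

    Returns a list t where t[m] is element t_m; exactly `code` entries are 1.
    """
    C = 2 ** M - 1
    return [1 if m < code else 0 for m in range(C)]

def as_string(t):
    """Render the elements in the order t_{C-1} ... t_0 used in the text."""
    return "".join(str(bit) for bit in reversed(t))

M = 2  # the 2-bit example from the text
for code in range(2 ** M):
    print(f"{code:0{M}b} -> {as_string(binary_to_thermometer(code, M))}")
# prints:
# 00 -> 000
# 01 -> 001
# 10 -> 011
# 11 -> 111
```

With this fill order, consecutive codes differ in exactly one thermometer element, which is a general property of thermometer codes.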
Each current source can be steered to either the positive or negative output, depending on the input code. In the absence of nonidealities, the steady-state output of the fully-segmented CS-DAC is

$$I_{out} = \frac{I_u}{2} \sum_{m=0}^{C-1} (2 t_m - 1) \tag{2.20}$$

One benefit of the fully-segmented architecture is that any unit change in input code results in only one current source switching, which is a consequence of the thermometer decoding. Note that this generally improves linearity at the cost of area and routing-related issues [18], particularly for high-resolution DACs [13] (e.g., a 10-bit fully-segmented DAC would require 1023 driver cells, which is generally not practical).

2.4.2 Partially-Segmented CS-DACs

To alleviate the area and routing-related issues in fully-segmented architectures, designers often use partial segmentation [13]. In this case, only a subset of the binary input is thermometer-coded. For example, this is shown in Figure 2.13 for a 10-bit CS-DAC where only the upper two MSBs ($b_9 b_8$) are thermometer-coded ($t_2 t_1 t_0$). Typically, the upper-most bits are thermometer-coded since they have the largest effect on the DAC output. The output of the DAC in Fig. 2.13 in steady-state is

$$I_{out} = 128 I_u \sum_{m=0}^{2} (2 t_m - 1) + I_u \sum_{m=0}^{7} 2^{m-1} (2 b_m - 1) \tag{2.21}$$

Figure 2.13: Partially-segmented 10-bit CS-DAC, where only the upper two bits of the binary input are thermometer-coded.

2.4.3 Dynamic Element Matching

Segmented architectures provide the blueprint for dynamic element matching (DEM), which is a technique used to correct analog errors that cause nonlinear distortion [19], [20], [21]. The idea of DEM is to randomize over the permutations of the thermometer-coded elements, which effectively "averages out" mismatch errors.
For example, if $b_9 b_8 = 01$ in Figure 2.13, then one thermometer coding is $t_2 t_1 t_0 = 001$. However, the permutations $t_2 t_1 t_0 = 010$ and $t_2 t_1 t_0 = 100$ also work, since they have the same effect on the DAC output. A digital circuit clocked at the same sample rate as the DAC is required to perform the randomization. Lastly, note that DEM converts nonlinear distortion into white noise [22]. Therefore, if the analog mismatch is high to begin with, then this technique can raise the noise floor considerably.

Chapter 3 Analysis and Calibration for Wideband Times-2 Interleaved CS-DACs

This chapter presents analysis and calibration of interleaving and data timing errors that are encountered in modern times-2 interleaved DACs with a CS architecture. Such errors corrupt the DAC output spectrum with spectral images that require calibration. We develop an analytical model for the interleaving and data timing errors that we understand are most significant and propose a calibration algorithm that treats all of them. Extensive simulations of the algorithm are made possible by leveraging the speed and accuracy of the analytical model. The algorithm is demonstrated on a commercially-developed 10-bit times-2 interleaved CS-DAC, operating at 40GS/s in 14nm CMOS.

3.1 Motivation

Direct RF sampling paves the way for several interesting applications, including 5G/6G cellular communication, electronic warfare, and automotive radar. Typically, these applications require data converters with sample rates in the mmWave regime. Designing such converters at the full rate is challenging, so the typical approach is to time interleave lower speed sub-converters. Without calibration, nonidealities in the sub-converters and mismatch between them corrupt the Nyquist band with undesired spectral content. This impedes the ability to use the full Nyquist band, which is critical for systems that rely on, for example, software-defined or cognitive radio.
The discussion in this chapter is limited to times-2 interleaving. While research on time-interleaved DACs (TIDACs) with higher interleaving factors exists [23], it is not very common, since interleaving by 2 typically provides enough timing margin for the settling of the sub-DACs [24]. However, in modern times-2 interleaved DACs, incomplete settling is possible due to the reduced timing margin at higher sample rates. Hence, we consider finite settling in the analysis and, under this framework, show how data timing errors create spurs.

Interleaving errors have been discussed in prior works [8,9,24,25]. For example, calibration algorithms for gain and clock duty cycle are presented in [24] and [8]. In [24], the DAC is excited by a single tone near Nyquist and the resulting interleaving spur is driven to zero via duty cycle control. In [8], simulated annealing is applied with a single tone at the desired calibration frequency. In this chapter, we show that calibration with single tone inputs can result in narrowband solutions, i.e., where the calibration is effective near the calibration frequency but not over the full Nyquist band. This phenomenon, which we refer to as narrowband locking, is not observable when these errors are analyzed in isolation from each other as in [9, 24]; however, it becomes clearer when they are analyzed in a coupled framework as in [26]. Using the coupled analysis, we prove that calibration with two tones leads to wideband solutions, i.e., those that are effective across the full Nyquist band. Moreover, we propose a two-tone calibration algorithm based on simulated annealing and show that it is effective even when additional nonidealities are present (e.g., clock timing skew and data timing errors).

We also develop an analytical model that accounts for the interleaving and data timing errors that we understand are most significant.
The analytical model is then validated against behavioral simulations and shown to be extremely accurate with a run time that is four orders of magnitude faster. We then propose a calibration algorithm that treats all of the errors considered in this chapter. Extensive simulations of the algorithm are made possible by leveraging the speed and accuracy of the analytical model. The algorithm is then demonstrated on a modern, commercially-developed transceiver chip that contains a times-2 interleaved DAC, operating at 40GS/s in 14nm CMOS. The transceiver also contains a high-speed ADC, an embedded MCU, and programmable control over various impairments, which is consistent with the trend toward system-on-chip (SoC) implementations. We show how these can be leveraged as calibration support for the DAC, which is more practical than previous methods that rely on off-chip measurements of the DAC output with a spectrum analyzer [8] and manual tuning [9].

The rest of the chapter is organized as follows. Section 3.2 provides background on times-2 interleaved CS-DACs. Section 3.3 provides a summary of the significant interleaving and data timing errors. Section 3.4 focuses on the coupled analysis of gain and duty cycle errors. Section 3.5 focuses on the analysis of data timing errors. Section 3.6 presents an analytical model that accounts for all of the errors considered in this chapter. Section 3.7 presents the proposed calibration algorithm, along with simulations and experimental results. Finally, we conclude the chapter in Section 3.8.

3.2 Background

Figure 3.1: $M$-bit times-2 interleaved CS-DAC with a sample rate of $f_s$.

A block diagram of an $M$-bit times-2 interleaved DAC with a sample rate of $f_s = 1/T_s$ and input sequence $x[k]$ is illustrated in Fig. 3.1, where $T_s$ is the sample period.
A half-rate clock (the C2 clock; we use the notation CN and DN to denote clock and data signals at $f_s/N$, respectively) is passed to a serializer and analog multiplexer (AMUX). The serializer multiplexes low-speed, parallel data lanes into high-speed even and odd lanes at $f_s/2$, and the AMUX is used to toggle the sub-DACs between the output node and a dump node. A common serializer with $K$ inputs per bit slice (examples with $K = 4$ and $K = 8$ are found in [27] and [28], respectively) is shown in Fig. 3.2(a), and the nominal timing of clock and data signals down to the C4 level is shown in Fig. 3.2(b). Note that the C4 clocks (i.e., C4I, C4Q) are offset by 90°, or delayed by $T_s$ relative to each other. The high-speed data lanes, D2E and D2O, are also delayed by $T_s$ relative to each other since they are aligned with C4I and C4Q, respectively.

Figure 3.2: (a) Serializer block diagram. (b) Clock and data timing down to the C4 level.

A circuit schematic of a TIDAC with a CS architecture is shown in Fig. 3.3(a). Note that the AMUX uses opposite phases of the C2 clock to toggle the sub-DACs between the output node ($V_p$, $V_n$) and the dump node, $V_{dump}$. Moreover, current is drawn from these nodes via current cells that are comprised of binary-weighted current sources and complementary data switches. Fig. 3.3(b) shows the ideal timing of the C2 clock and sub-DAC data, where there is a timing margin of $T_s/2$ to avoid clock and data edge overlap.
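The data routing through the serializer and AMUX described above can be summarized by index arithmetic alone. The sketch below is an illustration under simplifying assumptions: it models only which samples travel on which lane (D4_0 through D4_3, D2E, D2O, following the lane names in Fig. 3.2), ignores all clock timing, and uses hypothetical helper names.

```python
import numpy as np

def mux_2to1(first, second):
    """Interleave two equal-length lanes sample by sample (a 2:1 multiplexer)."""
    out = np.empty(len(first) + len(second), dtype=first.dtype)
    out[0::2] = first
    out[1::2] = second
    return out

x = np.arange(12)                       # stand-in for the input sequence x[k]

# Quarter-rate lanes: D4_i carries x[i], x[i+4], x[i+8], ...
d4 = [x[i::4] for i in range(4)]

# Half-rate lanes formed by the 2:1 stages of the serializer.
d2e = mux_2to1(d4[0], d4[2])            # even samples x[0], x[2], x[4], ...
d2o = mux_2to1(d4[1], d4[3])            # odd samples  x[1], x[3], x[5], ...

# The AMUX alternates between the even and odd sub-DACs at the full rate.
y = mux_2to1(d2e, d2o)

print(d2e)   # [ 0  2  4  6  8 10]
print(d2o)   # [ 1  3  5  7  9 11]
print(np.array_equal(y, x))   # True
```

The last line confirms that, in this idealized view, alternating between the two half-rate lanes recovers the original full-rate sequence.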
Figure 3.3: (a) Circuit schematic of a times-2 interleaved CS-DAC. (b) Ideal timing of the C2 clock and a bit slice of each sub-DAC.

Ideally, the contribution of each sub-DAC to the output node is

$$y_{even}(t) = \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - 2kT_s) \right) * h_{even}(t) \tag{3.1a}$$

$$y_{odd}(t) = \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - (2k+1)T_s) \right) * h_{odd}(t) \tag{3.1b}$$

where $x(t)$ is the analog target signal, $h_{even}(t) = h_{odd}(t) = \mathrm{rect}(t/T_s)$ is the hold pulse, $\mathrm{rect}(t) = 1$ if $|t| \le 1/2$ and $0$ if $|t| > 1/2$, and $*$ denotes convolution. The ideal TIDAC output spectrum is derived by summing the Fourier transforms of (3.1a) and (3.1b), which is

$$Y_{ideal}(f) = \frac{1}{2} \mathrm{sinc}(f T_s) \sum_{k=-\infty}^{\infty} X(f - k f_s/2) + \frac{1}{2} \mathrm{sinc}(f T_s) \sum_{k=-\infty}^{\infty} X(f - k f_s/2) (-1)^k \tag{3.2a}$$

$$= \mathrm{sinc}(f T_s) \sum_{k=-\infty}^{\infty} X(f - k f_s) \tag{3.2b}$$

where $\mathrm{sinc}(f) := \sin(\pi f)/(\pi f)$. Note from (3.2a) that spectral images at odd multiples of $f_s/2$ cancel each other out, resulting in a spectrum identical to that of a single ideal DAC with a sample rate of $f_s$.

3.3 Overview of Errors

The perfect cancellation of undesired spectral images only occurs for an ideal TIDAC. In practice, there are various errors that cause imperfect image cancellation. The errors that we consider are summarized in Table 3.1 and depicted in Figs. 3.4 and 3.5. From our experience with times-2 interleaved CS-DACs, these are the most significant errors associated with interleaving and data timing. Under this framework, we show that the TIDAC output
Under this framework, we show that the TIDAC output 35 Nonideality Spectral Image Locations Spur Locations (1st Nyquist Zone) Analysis Parameter Gain error (2k + 1)f s =2 f s =2f 0 Section 3.4 g C2 clock duty cycle error (2k + 1)f s =2 f s =2f 0 Section 3.4 C2 clock skew (2k + 1)f s =2 f s =2f 0 Presented in [9] skew C4 clock duty cycle errors (4k + 1)f s =4 f s =4f 0 , f s =4 +f 0 if f 0 2 (0;f s =4) Section 3.5 I ; Q (4k + 3)f s =4 3f s =4f 0 , f 0 f s =4 if f 0 2 (f s =4;f s =2) (2k + 1)f s =2 f s =2f 0 C4 clock phase errors (2k + 1)f s =2 f s =2f 0 Section 3.5 I , Q Table 3.1: Interleaving and Data Timing Errors in times-2 interleaved CS-DACs. 1 1+ ε g T s + 2 αT s T s - 2 αT s 2T s h even (t) h odd (t) Δθ skew 180° (a) Gain error (b) C2 clock duty cycle error (c) C2 clock skew p n n h even (t) h odd (t) Figure 3.4: Interleaving errors. (a) Gain error. (b) C2 clock duty cycle error. (c) C2 clock skew. spectrum is, in general, described by Y (f) =K 0 (f) 1 X k=1 X(fkf s ) +K fs=2 (f) 1 X k=1 X(f (2k + 1)f s =2) +K fs=4 (f) 1 X k=1 X(f (4k + 1)f s =4) +K 3fs=4 (f) 1 X k=1 X(f (4k + 3)f s =4) (3.3) Ideally,K 0 (f) = sinc(fT s ) and the rest of the coecients are zero, as shown in (3.2b). With errors present, the coecients K fs=2 (f), K fs=4 (f) and K 3fs=4 (f) may be nonzero, which results in spurs when the DAC is excited by a tone. This can degrade performance metrics, such as SFDR, thus motivating calibration. We collectively refer to the errors in Fig. 3.4 as 36 2T s + 4 β I T s 2T s - 4 β I T s ΔΦ I C4Q 2T s + 4 β Q T s 2T s - 4 β Q T s C4I ΔΦ Q (a) C4 clock duty cycle errors (b) C4 clock phase errors C4I C4Q Figure 3.5: Data timing errors. (a) C4 clock duty cycle errors. (b) C4 clock phase errors. interleaving errors. As mentioned, these include gain mismatch between the sub-DACs and nonidealities on the clock that toggles between them (i.e., the C2 clock). Figs. 3.4(a) and (b) illustrate how gain and C2 clock duty cycle errors distort the sub-DAC hold pulses. 
In general, this creates images at odd multiples of $f_s/2$ (i.e., $K_{f_s/2}(f) \neq 0$). While these errors have been analyzed in isolation [24], [9], their coupled analysis has received little attention aside from the appendix in [26]. In Section 3.4, we show how the coupled analysis uncovers critical information that helps design a robust calibration algorithm. Another error that results in $K_{f_s/2}(f) \neq 0$ is C2 clock skew, i.e., where the C2 clock phases do not have a perfect 180° offset, as shown in Fig. 3.4(c). The analysis of C2 clock skew is presented in [9], so it is excluded here. However, we do include this effect in our analytical model (Section 3.6) and in the simulations of our calibration algorithm (Section 3.7).

We collectively refer to the errors in Fig. 3.5 as data timing errors. These include nonidealities associated with the serializer clocks, i.e., C4I and C4Q in Fig. 3.2(b), as these clocks are aligned with the sub-DAC data. The serializer in Fig. 3.2(a) may also rely on lower rate clocks (C8, C16, etc.), but in contrast to the C4 clocks, these lower rate clocks are not aligned with the high-speed data lanes, which makes their nonidealities less critical. The analysis in Section 3.5 shows how C4 clock duty cycle and phase errors create spurs that depend on finite settling of the sub-DACs. In general, such errors can result in $K_{f_s/2}(f) \neq 0$, $K_{f_s/4}(f) \neq 0$, and $K_{3f_s/4}(f) \neq 0$.

The AMUX that toggles between the sub-DACs is an important part of the TIDAC, so its nonidealities are worth discussing. Such nonidealities can result in either interleaving spurs or nonlinearity. Interleaving spurs are caused by dynamic mismatch in the AMUX [24], e.g., C2 clock duty cycle error and C2 clock skew. As mentioned, we include these effects in the analysis and calibration since they are within the scope of this chapter.
To isolate the interleaving effects (as in [9]), we assume that the toggling between the sub-DACs is a linear operation, which allows the TIDAC output to be modeled as the sum of the individual sub-DAC contributions. The linearity of the AMUX is determined by the circuit design and, specifically, the choice of device for the switches outlined in Fig. 3.3(a). While circuit design is beyond the scope of this dissertation, it is worth mentioning that the choice of these devices has been discussed in detail [24], [9], particularly regarding linearity tradeoffs with bandwidth and output swing.

Clock jitter is another effect known to distort the DAC output spectrum, and this effect has been analyzed in [29]. In addition, modern phase-locked loops (PLLs) are capable of synthesizing clocks (e.g., the C2 clock) with very low integrated jitter (e.g., in the tens of femtoseconds-rms range [30]). Fine tuning of the PLL loop parameters (charge pump current, loop filter resistor, etc.) can reduce jitter even further [31]. For these reasons, we do not further consider the effects of clock jitter.

To simplify the presentation, we analyze the effect of various errors in isolation and then present an analytical model that includes all of them. Specifically, the analysis of errors in this chapter is organized as follows. Section 3.4 focuses only on the coupled analysis of gain and C2 clock duty cycle errors, i.e., with all other nonidealities equal to zero. Similarly, Section 3.5 focuses only on data timing errors. Finally, we develop an analytical model in Section 3.6 that accounts for all of the errors in Table 3.1.

3.4 Gain and C2 Clock Duty Cycle Errors

Gain and C2 clock duty cycle errors are discussed in [24], [9], and SFDR expressions are derived for each in isolation of all other parameters. The work in [32] extends the analysis of gain errors to general times-N interleaved DACs, also in an isolated environment.
In this section, we analyze gain and duty cycle errors in a coupled framework, which is motivated by the fact that they can cancel each other out, as mentioned in the appendix of [26]. We show that this can cause narrowband locking, i.e., where the calibration locks onto a single frequency solution with non-ideal parameters. Moreover, we propose a calibration signal that is immune to this effect.

3.4.1 Coupled Analysis

In this section, we only consider errors in gain and C2 clock duty cycle. Specifically, we analyze model (3.1) with hold pulses defined as

$$h_{even}(t) = \mathrm{rect}\!\left(\frac{t}{T_s(1+2\alpha)}\right) \tag{3.4a}$$
$$h_{odd}(t) = (1+\varepsilon_g)\,\mathrm{rect}\!\left(\frac{t}{T_s(1-2\alpha)}\right) \tag{3.4b}$$

where $\varepsilon_g$ and $\alpha$ are the gain and duty cycle errors, respectively. Under this framework, it can be shown that the output spectrum is of the form in (3.3) with coefficients

$$K_0(f) = \frac{f_s}{2}\big(H_{even}(f) + H_{odd}(f)\big) \tag{3.5a}$$
$$K_{f_s/2}(f) = \frac{f_s}{2}\big(H_{even}(f) - H_{odd}(f)\big) \tag{3.5b}$$

where

$$H_{even}(f) = T_s(1+2\alpha)\,\mathrm{sinc}\big(fT_s(1+2\alpha)\big) \tag{3.6a}$$
$$H_{odd}(f) = (1+\varepsilon_g)\,T_s(1-2\alpha)\,\mathrm{sinc}\big(fT_s(1-2\alpha)\big) \tag{3.6b}$$

In general, since $K_{f_s/2}(f) \neq 0$, we observe from (3.3) that the output spectrum contains images at odd multiples of $f_s/2$. Hence, if the DAC is excited by a tone at $f_0$, then the image at $f_s/2$ creates an interleaving spur at $f_s/2 - f_0$, and the resulting SFDR is

$$\mathrm{SFDR} = 20\log_{10}\left|\frac{K_0(f_0)}{K_{f_s/2}(f_s/2 - f_0)}\right| \tag{3.7}$$

By inspection of (3.5b) and (3.6), we have $K_{f_s/2}(f) \equiv 0$ if $\varepsilon_g = \alpha = 0$, i.e., the images vanish for all frequencies when both gain and C2 clock duty cycle errors are zero. However, if either $\alpha \neq 0$ or $\varepsilon_g \neq 0$, then this is no longer the case.

Figure 3.6: Solutions $(\alpha, \varepsilon_g)$ that result in no interleaving spur.

For example, suppose the DAC is excited by a tone at $f_0$, creating an interleaving spur at $f_s/2 - f_0$.
This spur is zero when $K_{f_s/2}(f_s/2 - f_0) = 0$, and solving this amounts to writing $\varepsilon_g$ in terms of $\alpha$ as

$$\varepsilon_g(\alpha, \nu_0) = \frac{(1+2\alpha)\,\mathrm{sinc}\big((1/2-\nu_0)(1+2\alpha)\big)}{(1-2\alpha)\,\mathrm{sinc}\big((1/2-\nu_0)(1-2\alpha)\big)} - 1 \tag{3.8}$$
$$\approx \frac{4\alpha\pi(1/2-\nu_0)}{\tan\big(\pi(1/2-\nu_0)\big)} \tag{3.9}$$

where $\nu_0 = f_0/f_s$ (see footnote 3), and a small-$\alpha$ approximation has been used. In Fig. 3.6, we plot (3.9) for various frequencies $\nu_0 \in [0, 0.5]$. Note that each frequency admits a distinct family of solutions $(\alpha, \varepsilon_g(\alpha, \nu_0))$ that satisfy $K_{f_s/2}(f_s/2 - f_0) = 0$, i.e., no interleaving spur. Furthermore, the solutions are linear near the origin with slopes as indicated in Fig. 3.6. The magnitude of the steepest slope is 4, which is found by taking the limit of (3.9) as $\nu_0$ approaches 0.5.

Footnote 3: When convenient, we use normalized frequency $\nu = f/f_s$.

Figure 3.7: Calibration by gain error cancellation ($|\varepsilon_g| = 0.018$).

3.4.2 Calibration by Gain Error Cancellation

The cancellation of the interleaving spur with nonzero parameters is important to consider for calibration. For example, for the calibration approach in [24], the TIDAC is excited by a single tone at $f_{cal} = f_s/2$ so that the interleaving spur appears at DC ($f_s/2 - f_{cal}$). A low-speed auxiliary ADC is then used in tandem with on-chip duty cycle control to drive this spur to zero, cancelling out the gain error. We refer to this calibration approach as gain error cancellation.

Fig. 3.7 illustrates the SFDR (from (3.7)) over the Nyquist band after applying gain error cancellation with different calibration frequencies. Note that, in all cases, the SFDR monotonically decreases away from the calibration frequency. This is due to narrowband locking, since the calibration algorithm locks onto a single frequency solution, $(\alpha, \varepsilon_g) \neq (0, 0)$. We say that gain error cancellation exhibits deterministic narrowband locking, since such locking occurs every time the algorithm is run for $\varepsilon_g \neq 0$.
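As a minimal sketch of the coupled analysis, and not the code used to produce Figs. 3.6 and 3.7, the following Python evaluates (3.5)-(3.9) in normalized frequency and illustrates narrowband locking. The function names, the bisection search, and the example values (a gain error of 0.018 and a calibration tone at $\nu_{cal} = 0.4$) are assumptions made for illustration only.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def k_coeffs(nu, eps_g, alpha):
    """K_0 and K_{fs/2} from (3.5)-(3.6) at f = nu*f_s (uses f_s*T_s = 1)."""
    h_even = (1 + 2 * alpha) * sinc(nu * (1 + 2 * alpha))
    h_odd = (1 + eps_g) * (1 - 2 * alpha) * sinc(nu * (1 - 2 * alpha))
    return 0.5 * (h_even + h_odd), 0.5 * (h_even - h_odd)

def sfdr_db(nu0, eps_g, alpha):
    """SFDR from (3.7) for a tone at nu0; the spur sits at 1/2 - nu0."""
    k0, _ = k_coeffs(nu0, eps_g, alpha)
    _, k_half = k_coeffs(0.5 - nu0, eps_g, alpha)
    return math.inf if k_half == 0 else 20 * math.log10(abs(k0 / k_half))

def eps_g_exact(alpha, nu0):
    """Gain error that nulls the spur for a given duty cycle error, (3.8)."""
    x = 0.5 - nu0
    num = (1 + 2 * alpha) * sinc(x * (1 + 2 * alpha))
    den = (1 - 2 * alpha) * sinc(x * (1 - 2 * alpha))
    return num / den - 1

def eps_g_approx(alpha, nu0):
    """Small-alpha approximation (3.9); the slope tends to 4 as nu0 -> 0.5."""
    x = 0.5 - nu0
    return 4 * alpha if x == 0 else 4 * alpha * math.pi * x / math.tan(math.pi * x)

def locking_alpha(eps_g, nu_cal, lo=-0.1, hi=0.1, iters=80):
    """Bisection for the duty cycle setting that nulls the spur at nu_cal.

    Assumes nu_cal in (0, 0.5], where (3.8) increases with alpha.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eps_g_exact(mid, nu_cal) < eps_g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    eps_g, nu_cal = 0.018, 0.4          # illustrative values
    alpha = locking_alpha(eps_g, nu_cal)
    for nu0 in (0.1, 0.2, 0.3, 0.4):
        print(f"nu0 = {nu0:.2f}: SFDR = {sfdr_db(nu0, eps_g, alpha):.1f} dB")
```

With these assumed values, the spur is nulled only at the calibration frequency, while the SFDR elsewhere in the band stays finite, which is the deterministic narrowband locking described above.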
In Section 3.7, we explore another single-tone calibration algorithm that exhibits random narrowband locking, since it occurs randomly as opposed to every time.

Narrowband locking is problematic for wideband systems where the calibration should hold over the full Nyquist band. In Appendix A, we prove that calibration with two tones promotes convergence to the wideband solution, $(\alpha, \varepsilon_g) = (0, 0)$. Furthermore, an algorithm that uses two tones as the input in calibration mode is proposed in Section 3.7.

3.5 Data Timing Errors

The sub-DAC data timing is aligned with the C4 clocks, as illustrated in Fig. 3.2(b). Therefore, duty cycle and phase errors on these clocks affect the data timing. Again, we refer to these as data timing errors and show that they can create spurs when finite settling is considered. In this section, to isolate the effect of data timing errors, we assume that there are no interleaving effects, i.e., we assume an ideal C2 clock with no sub-DAC gain error. For clarity of the exposition, we explain the key concepts using the even sub-DAC and leave most of the detailed derivations to Appendix B.

3.5.1 Finite Settling

First, we focus only on finite settling in the absence of all other nonidealities. In Fig. 3.8(a), we depict a scenario where the even sub-DAC undergoes a data transition, focusing on a single bit slice (see footnote 4). The shaded regions define the times during which the sub-DAC is connected to the dump node.

Footnote 4: We assume that there is no timing skew between the bit slices, which means that they all fire simultaneously. Calibration for this impairment is covered in [33].

Figure 3.8: (a) Single bit slice of the even sub-DAC. (b) C2 clock. (c) Output of the even sub-DAC. (d) Settling error for the even sub-DAC, $e_{even}(t)$.

Note that there is a timing margin of $T_s/2$ for the sub-DAC
to settle to its desired value on this node. After this settling period, the C2 clock, shown in Fig. 3.8(b), routes the sub-DAC to the output node. Ideally, the sub-DAC output changes instantaneously at each switching instant, as shown by the green waveform in Fig. 3.8(c). In practice, the sub-DAC current sources change gradually according to a time constant, $\tau$, determined by the load [34]. Under this framework, the settling error (see footnote 5) at the output node is

$$e_{even}(t) = \left(\Delta x(t)\sum_{k=-\infty}^{\infty}\delta(t - 2kT_s)\right) * p(t) \tag{3.10}$$

where

$$\Delta x(t) = x(t - 2T_s) - x(t) \tag{3.11}$$
$$p(t) = e^{-(t+T_s)/\tau}\,h(t) \tag{3.12}$$

with hold pulse $h(t) = \mathrm{rect}(t/T_s)$. We refer to (3.12) as the settling pulse, i.e., the hold pulse weighted by a decaying exponential.

Footnote 5: In the analysis, we assume that the settling errors are negligibly small just prior to each data transition, i.e., $e^{-2T_s/\tau} \approx 0$, which implies $\tau \ll 2T_s$.

Figure 3.9: Data timing errors for the even sub-DAC. (a) C4 clock duty cycle error, $\Delta t = 4\beta_I T_s$. (b) C4 clock phase error, $\Delta t = (\Delta\Phi_I/2\pi)\,4T_s$.

An expression similar to (3.10) may be derived for the odd sub-DAC, $e_{odd}(t)$, and the total settling error at the output node is then $e(t) = e_{even}(t) + e_{odd}(t)$. By taking the Fourier transform of $e(t)$, it can be shown that this results in an output spectrum (3.3) where all coefficients except for $K_0(f)$ are zero. Hence, this does not result in any in-band spurs for sinusoidal inputs. However, it does result in a mild output power reduction, e.g., a 1dB reduction (excluding the sinc attenuation) for $f_0 = 0.26f_s$ and $\tau/T_s = 0.3$.

3.5.2 Finite Settling With Data Timing Errors

While finite settling alone does not create spurs, it does when combined with data timing errors (as we prove in Appendix B). For example, referring to Fig.
3.9(a), C4 clock duty cycle errors $\beta_I$ cause samples $x[4k]$ to fire early by $2\beta_I T_s$ and samples $x[4k+2]$ to fire late by the same quantity. Similarly, referring to Fig. 3.9(b), C4 clock phase errors $\Delta\Phi_I$ delay samples $x[2k]$ by $2\Delta\Phi_I T_s/\pi$. We also assign these errors to the odd sub-DAC, i.e., $\beta_Q$ and $\Delta\Phi_Q$ for C4Q in Fig. 3.2(b). In Appendix B, we show that duty cycle errors $\beta_I, \beta_Q$ and phase errors $\Delta\Phi_I, \Delta\Phi_Q$ result in an output spectrum (3.3) with the following coefficients

$$K_0(f) = \mathrm{sinc}(f/f_s) + \frac{f_s}{4}\big(c_+(\beta_I, \Delta\Phi_I) + c_+(\beta_Q, \Delta\Phi_Q)\big)M(f)P(f) \tag{3.13a}$$
$$K_{f_s/2}(f) = \frac{f_s}{4}\big(c_+(\beta_I, \Delta\Phi_I) - c_+(\beta_Q, \Delta\Phi_Q)\big)M(f)P(f) \tag{3.13b}$$
$$K_{f_s/4}(f) = \frac{f_s}{4}\big(c_-(\beta_I, \Delta\Phi_I) + j\,c_-(\beta_Q, \Delta\Phi_Q)\big)M(f + f_s/4)P(f) \tag{3.13c}$$
$$K_{3f_s/4}(f) = \frac{f_s}{4}\big(c_-(\beta_I, \Delta\Phi_I) - j\,c_-(\beta_Q, \Delta\Phi_Q)\big)M(f + f_s/4)P(f) \tag{3.13d}$$

where

$$c_+(u, v) = 2\cosh\!\left(\frac{2T_s}{\tau}u\right)e^{\frac{2T_s}{\pi\tau}v} \tag{3.14a}$$
$$c_-(u, v) = 2\sinh\!\left(\frac{2T_s}{\tau}u\right)e^{\frac{2T_s}{\pi\tau}v} \tag{3.14b}$$

and

$$M(f) = 2\sin(2\pi fT_s)\,e^{-j(2\pi fT_s + \pi/2)} \tag{3.15}$$
$$P(f) = \frac{2e^{-T_s/\tau}}{1/\tau + j2\pi f}\,\sinh\!\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)\right) \tag{3.16}$$

Note that the output spectrum contains images at odd multiples of $f_s/2$ when $c_+(\beta_I, \Delta\Phi_I) \neq c_+(\beta_Q, \Delta\Phi_Q)$, as implied by (3.13b). This occurs when $\beta_I \neq \beta_Q$ (duty cycle mismatch) and/or $\Delta\Phi_I \neq \Delta\Phi_Q$ (I/Q imbalance). Also note that there are images at odd multiples of $f_s/4$, as implied by (3.13c) and (3.13d). These are nonzero when $\beta_I \neq 0$ and/or $\beta_Q \neq 0$, i.e., duty cycle error on one or both C4 clocks.

These undesired spectral images create spurs when the DAC is excited by a tone at $f_0$, as summarized by the last two rows in Table 3.1. The coefficients in (3.13) can be used to compute the power ratio of these spurs relative to an input tone at $f_0$. For example, if $f_0 \in (0, f_s/4)$, then

$$I_{f_s/4 - f_0}(f_0) = 20\log_{10}\left|\frac{K_{f_s/4}(f_s/4 - f_0)}{K_0(f_0)}\right|$$

for the spur at $f_s/4 - f_0$, and similar quantities may be derived for the other spurs. In Fig. 3.10(a), we compare $I_{f_s/4 - f_0}(f_0)$ with that obtained from a behavioral simulation of a 10-bit TIDAC with only C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$).
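As a hedged numerical check, and not part of the original derivation, the sketch below compares the closed form of the settling pulse transform (3.16) with a direct numerical integration of (3.12). It also verifies that $M(f)$ in (3.15) equals $e^{-j4\pi fT_s} - 1$, the transfer function of the difference $x(t - 2T_s) - x(t)$ in (3.11). Finally, it evaluates (3.13a) with all data timing errors set to zero, in which case $c_+ = 2$ for both sub-DACs. The function names and the normalization $T_s = 1$ are assumptions made for illustration.

```python
import cmath
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def settling_pulse_ft(f, tau, ts=1.0):
    """Closed-form Fourier transform of the settling pulse, as in (3.16)."""
    a = 1.0 / tau + 2j * math.pi * f
    return 2 * math.exp(-ts / tau) / a * cmath.sinh(0.5 * ts * a)

def settling_pulse_ft_numeric(f, tau, ts=1.0, n=20000):
    """Midpoint-rule integral of p(t) = exp(-(t+Ts)/tau) * rect(t/Ts)."""
    dt = ts / n
    acc = 0j
    for i in range(n):
        t = -ts / 2 + (i + 0.5) * dt
        acc += math.exp(-(t + ts) / tau) * cmath.exp(-2j * math.pi * f * t)
    return acc * dt

def m_factor(f, ts=1.0):
    """M(f) from (3.15)."""
    theta = 2 * math.pi * f * ts
    return 2 * math.sin(theta) * cmath.exp(-1j * (theta + math.pi / 2))

def power_reduction_db(nu0, tau_over_ts):
    """Output power loss from finite settling alone, excluding the sinc term.

    Uses (3.13a) with zero timing errors, so K_0 = sinc(nu) + M(nu) * P(nu)
    in units where T_s = 1 and f_s = 1.
    """
    k0 = sinc(nu0) + m_factor(nu0) * settling_pulse_ft(nu0, tau_over_ts)
    return -20 * math.log10(abs(k0) / sinc(nu0))

if __name__ == "__main__":
    f, tau = 0.26, 0.3
    print(abs(settling_pulse_ft(f, tau) - settling_pulse_ft_numeric(f, tau)))
    print(f"power reduction: {power_reduction_db(f, tau):.2f} dB")
```

Under these assumptions, the computed loss at $f_0 = 0.26f_s$ and $\tau/T_s = 0.3$ is close to the 1dB reduction quoted in Section 3.5.1.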
Note that the simulation and theory are closely matched over frequency with different $\tau/T_s$ ratios, except for the extreme cases. Specifically, when $\tau/T_s = 0.05$ (very small), the spur power is below the quantization noise floor and quantization effects are ignored in the analysis since they do not provide additional insight. When $\tau/T_s = 1.1$ (very large), the assumption in the analysis that $\tau \ll 2T_s$ is less accurate, resulting in an approximation error of 2.1dB. In Fig. 3.10(b), we observe qualitatively similar results for the $f_s/2 - f_0$ spur where only I/Q imbalance is considered ($\Delta\Phi_I \neq \Delta\Phi_Q$). Fig. 3.10(c) illustrates the output power spectral density (PSD) from a behavioral simulation that includes both C4 clock duty cycle errors and I/Q imbalance.

Lastly, there are two observations from Figs. 3.10(a) and (b) that are worth mentioning. First, the spur power increases with $\tau/T_s$ and, intuitively, this is because the magnitude of the settling errors also increases with $\tau/T_s$. Second, the spur power peaks near $f_s/4$ and is smallest near DC and Nyquist, i.e., the spurs are shaped by $\sin(2\pi\nu)$, and it can be shown that this is caused by the sine term in (3.15).

Figure 3.10: Behavioral simulations vs. theory. (a) C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$). (b) C4 clock phase errors ($\Delta\Phi_I$, $\Delta\Phi_Q$ of magnitude $\pi/180$). (c) Output PSD from a simulation that includes both C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$) and I/Q imbalance ($\Delta\Phi_I$, $\Delta\Phi_Q$ of magnitude $\pi/180$), $\tau/T_s = 0.2$.

3.6 Analytical Model

In this section, we develop an analytical model that captures all of the nonidealities in Table 3.1. We first state the model and then highlight its speed and accuracy by comparing it to Matlab-based behavioral simulations.

3.6.1 Signal Flow Diagram and Output Spectrum

The signal flow diagram in Fig. 3.11 captures all of the nonidealities in Table 3.1.
As developed in Section 3.4, gain and C2 clock duty cycle errors ($\varepsilon_g$ and $\alpha$) are accounted for in the hold pulses that are defined in (3.4). In addition, we have included a parameter that accounts for C2 clock skew, $\Delta t_{skew} = \Delta\theta_{skew}T_s/\pi$. The details of the analysis of data timing errors are contained in Appendix B and summarized in the lower portion of Fig. 3.11, i.e., they are embedded in the constants $c_0, c_1, c_2, c_3$ (referring to (B.4) and (B.6)).

Figure 3.11: Signal flow diagram of the analytical model for interleaving and data timing errors.

Moreover, the settling pulses are defined as

$$p_{even}(t) = e^{-(t+T_s)/\tau}\,\mathrm{rect}\!\left(\frac{t}{T_s(1+2\alpha)}\right) \tag{3.17a}$$
$$p_{odd}(t) = (1+\varepsilon_g)\,e^{-(t+T_s)/\tau}\,\mathrm{rect}\!\left(\frac{t - \Delta t_{skew}}{T_s(1-2\alpha)}\right) \tag{3.17b}$$

where gain and C2 clock errors are now included. Using Fig. 3.11, it can be shown that the output spectrum is of the form (3.3) with coefficients

$$K_0(f) = \frac{f_s}{2}\Big[H_{even}(f) + H_{odd}(f)\,e^{-j2fT_s\Delta\theta_{skew}}\Big] + \frac{f_s}{4}M(f)\Big[c_+(\beta_I, \Delta\Phi_I)P_{even}(f) + c_+(\beta_Q, \Delta\Phi_Q)P_{odd}(f)\Big] \tag{3.18a}$$
$$K_{f_s/2}(f) = \frac{f_s}{2}\Big[H_{even}(f) - H_{odd}(f)\,e^{-j2fT_s\Delta\theta_{skew}}\Big] + \frac{f_s}{4}M(f)\Big[c_+(\beta_I, \Delta\Phi_I)P_{even}(f) - c_+(\beta_Q, \Delta\Phi_Q)P_{odd}(f)\Big] \tag{3.18b}$$
$$K_{f_s/4}(f) = \frac{f_s}{4}M(f + f_s/4)\Big[c_-(\beta_I, \Delta\Phi_I)P_{even}(f) - j\,c_-(\beta_Q, \Delta\Phi_Q)P_{odd}(f)\Big] \tag{3.18c}$$
$$K_{3f_s/4}(f) = \frac{f_s}{4}M(f + f_s/4)\Big[c_-(\beta_I, \Delta\Phi_I)P_{even}(f) + j\,c_-(\beta_Q, \Delta\Phi_Q)P_{odd}(f)\Big] \tag{3.18d}$$

where $c_+(u, v)$, $c_-(u, v)$ are defined in (3.14), and $M(f)$ is defined in (3.15). The Fourier transforms of the hold pulses, $H_{even}(f)$ and $H_{odd}(f)$, are given by (3.6).
Finally, it can be shown that the Fourier transforms of the settling pulses defined in (3.17) are

$$P_{even}(f) = \frac{2\exp(-T_s/\tau)}{1/\tau + j2\pi f}\,\sinh\!\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)(1+2\alpha)\right) \tag{3.19a}$$
$$P_{odd}(f) = (1+\varepsilon_g)\,\frac{2\exp\!\left(-\frac{T_s}{\tau} - \Delta\theta_{skew}\left(\frac{T_s}{\pi\tau} + j2fT_s\right)\right)}{1/\tau + j2\pi f}\,\sinh\!\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)(1-2\alpha)\right) \tag{3.19b}$$

3.6.2 Accuracy, Speed, and Utility

We now evaluate the analytical model by comparing it to behavioral simulations of a 10-bit times-2 interleaved CS-DAC. The simulations have an oversampling ratio (OSR) of 8192, i.e., one sample period, $T_s$, is represented by 8192 samples. The large OSR allows us to capture timing-related errors that are a small fraction of $T_s$, which pertains to all errors in Table 3.1 with the exception of gain errors. Furthermore, finite settling of the sub-DACs is modeled using first-order Butterworth lowpass filters ($\tau/T_s = 0.3$).

Table 3.2 lists the parameters that are common to the analytical model and behavioral simulations. We assume that they are normally distributed with zero mean and variance $\sigma^2$. Nominally, we ensure that a $3\sigma$ error results in a wideband SFDR of 35dB for the behavioral simulations, e.g., according to Table 3.2, a gain error of 2.4% guarantees $\min_{f \in (0, f_s/2)} \mathrm{SFDR}(f) = 35$dB when all other parameters are zero. To this end, the values of $\sigma$ are found one at a time via heuristic tuning of each parameter. Also note that $\Delta\Phi_I = 0$ is fixed, since varying $\Delta\Phi_Q$ is sufficient to model I/Q imbalance (as inferred from Fig. 3.5(b)).

Figure 3.12: (a) Histogram of $R = C_{model}/C_{simulation}$ in dB based on 500 randomly generated test vectors. (b) Power and phase accuracy of the analytical model for the spurs (C4 image spurs and C2 image spur) generated by a tone at $f_1 = 0.4f_s$ (relative to behavioral simulations).

We use the distributions in Table 3.2 to generate 500 random test vectors (of dimension 6), where each will serve as a common input to the analytical model and behavioral simulation for a comparison.
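The test-vector generation described above can be sketched as follows. This is an illustrative assumption about how such vectors might be drawn, not the Matlab code used in the dissertation. The dictionary keys, the function names, and the seed are hypothetical, while the standard deviations are the values listed in Table 3.2.

```python
import random

# Standard deviations from Table 3.2 (zero-mean normal distributions).
# The key names are illustrative labels for the six parameters.
SIGMA = {
    "eps_g": 0.008,        # gain error
    "alpha": 0.002,        # C2 clock duty cycle error
    "dtheta_skew": 0.017,  # C2 clock skew
    "beta_I": 0.009,       # C4 clock duty cycle error (even sub-DAC)
    "beta_Q": 0.009,       # C4 clock duty cycle error (odd sub-DAC)
    "dphi_Q": 0.037,       # C4 clock phase error (odd sub-DAC); dphi_I = 0
}

def draw_test_vectors(n, seed=0):
    """Draw n six-dimensional test vectors, one N(0, sigma^2) sample per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.gauss(0.0, sigma) for name, sigma in SIGMA.items()}
        for _ in range(n)
    ]

def three_sigma():
    """The 3-sigma error magnitudes, e.g., 0.024 (2.4%) for the gain error."""
    return {name: 3 * sigma for name, sigma in SIGMA.items()}

if __name__ == "__main__":
    vectors = draw_test_vectors(500)
    print(len(vectors), vectors[0])
```

Each vector would then be passed unchanged to both the analytical model and the behavioral simulation, so that any difference in the resulting metric is attributable to the model rather than to the input.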
Specifically, for each test vector, we compute

$$C = 10\log_{10}\left(\frac{\text{power sum of the spurious tones}}{\text{power sum of the input tones}}\right) \tag{3.20}$$

for a two-tone excitation with frequencies $(f_0, f_1) = (0.05f_s, 0.4f_s)$. For the behavioral simulations, (3.20) is derived from the DAC output PSD. The PSD bins that correspond to the spurious tones can be derived from the spur locations in Table 3.1. For the analytical model, we compute (3.20) using (3.18), e.g., the power of the interleaving spur at $f_s/2 - f_1$ is $\frac{1}{2}\left|K_{f_s/2}(f_s/2 - f_1)\right|^2$.

Table 3.2: Parameter distributions.
Parameter | $\sigma$ for $\mathcal{N}(0, \sigma^2)$ | Description
$\varepsilon_g$ | 0.008 | Gain error
$\alpha$ | 0.002 | C2 clock duty cycle error
$\Delta\theta_{skew}$ | 0.017 | C2 clock skew
$\beta_I$ | 0.009 | C4 clock duty cycle error (even sub-DAC)
$\beta_Q$ | 0.009 | C4 clock duty cycle error (odd sub-DAC)
$\Delta\Phi_Q$ | 0.037 | C4 clock phase error (odd sub-DAC), $\Delta\Phi_I = 0$ (fixed)

Fig. 3.12(a) illustrates a histogram of $R = C_{model}/C_{simulation}$ in dB, which is the ratio of (3.20) from the analytical model to that obtained via simulation. Note that the mean of $[R]_{dB}$ is approximately 0dB with a low spread, which highlights the accuracy of the analytical model for (3.20). In addition, the computations of (3.20) took 13.3 hours for the behavioral simulations on a modern workstation, while the analytical model took only 0.83 sec ($5.7 \times 10^4$ times faster). The analytical model is fast since computations of (3.20) simply require the use of (3.18). In contrast, the behavioral simulations require oversampling, filtering, and PSD computations using Fast Fourier Transforms (FFTs).

Fig. 3.12(b) highlights the power and phase accuracy of the analytical model for the spurs generated by the tone at $f_1 = 0.4f_s$. Only 2 out of the 500 test vectors had parameters that were too small for the simulation to accurately resolve all spurs (without more oversampling), so these were discarded from the histograms in Fig. 3.12(b). Although omitted for brevity, we observed similar results for the tone at $f_0 = 0.05f_s$. The minor differences observed in Fig.
3.12 may be explained by quantization and FFT windowing effects that are present in the simulations but not in the analysis.

Since the analytical model is highly accurate in all regions of practical interest, it may be used in place of behavioral simulations in many cases. For example, it could be used to explore the circuit design space by mapping design tolerances to spectral impairments and making design trade-offs across circuit components. The analysis is especially useful when extensive exploration or experimentation is desired since it is much faster than simulation. For example, in Section 3.7, we consider calibration algorithms and are able to run extensive experiments by using the analytical expressions, which would otherwise be impossible to conduct via simulations.

3.7 Calibration Algorithm

In this section, we propose a calibration algorithm that suppresses the spurs in Table 3.1. The algorithm is first described and then simulated using the analytical model from Section 3.6. After discussing the simulated results, we demonstrate its efficacy on a real, commercially-developed 10-bit TIDAC operating at 40 GS/s in 14nm CMOS.

3.7.1 Algorithm Description

We seek to design a wideband calibration algorithm, i.e., one that suppresses the spurs in Table 3.1 for frequencies over the Nyquist band. Hence, during calibration mode, we excite the DAC with a two-tone signal to avoid narrowband locking, which is motivated by the results in Section 3.4 and Appendix A. In many modern transceivers, we have control over the parameters in Table 3.1. For example, gain control has been demonstrated by adjusting the bias voltage in the DAC current cells [35]. The data timing may be adjusted using phase rotators, as shown in Fig. 3.2(a) and demonstrated in [36]. The authors in [9] adjust duty cycle and clock skew for a times-2 interleaved DAC using circuits similar to those in [37].
Therefore, we propose to solve the following integer programming problem

$$s^* = \arg\min_{s}\, C(s) \tag{3.21}$$

with cost function as in (3.20)

$$C(s) = 10\log_{10}\left(\frac{\text{power sum of the spurious tones}}{\text{power sum of the input tones}}\right) \tag{3.22}$$

where $s$ is a vector of integers that map to the parameter control settings. Although there are seven controllable parameters in Table 3.1 ($\varepsilon_g$, $\alpha$, $\Delta\theta_{skew}$, $\beta_I$, $\beta_Q$, $\Delta\Phi_I$, $\Delta\Phi_Q$), it is sufficient to control only six of them. Specifically, we can drop either $\Delta\Phi_I$ or $\Delta\Phi_Q$, since only one of these is needed to correct the I/Q imbalance (as inferred from Fig. 3.5(b)). Hence, $s \in \mathcal{S} \subset \mathbb{R}^6$ is comprised of the parameters in Table 3.2, where $\mathcal{S}$ is the parameter search space. Note that it is generally infeasible to solve (3.21) via a brute-force search over $\mathcal{S}$. For example, if 5-bit control is used for each parameter, then $\mathcal{S}$ is comprised of $(2^5)^6 = 2^{30}$ possible vectors. Instead, we propose to use simulated annealing [38], since it promotes convergence to the global optimum for problems with a large, discrete search space. The simulated annealing algorithm is outlined in Algorithm 1.

Algorithm 1 has a temperature parameter $T$ that starts high at $T_{max}$ and gradually reduces to $T_{min}$ exponentially according to a cooling factor. At each value of $T$, we perform $K$ iterations that involve a cost comparison of the current state $s$ with a neighboring state $s' := n(s)$.

Algorithm 1: Simulated annealing.
Input: $s_0$, $T_{max}$, $T_{min}$, cooling factor, acceptance hyperparameter, $K$
Output: $s^*$
  $s \leftarrow s_0$
  $s^* \leftarrow s$
  $T \leftarrow T_{max}$
  while $T > T_{min}$ do
    for $k \leftarrow 0$ to $K - 1$ do
      $s' \leftarrow n(s)$
      $\Delta E \leftarrow C(s') - C(s)$
      if $\Delta E \leq 0$ then
        $s \leftarrow s'$
        if $C(s) < C(s^*)$ then
          $s^* \leftarrow s$
      else if rand(0, 1) < $\exp(-\Delta E/T)$ (exponent scaled by the acceptance hyperparameter) then
        $s \leftarrow s'$
    $T \leftarrow$ (cooling factor) $\cdot\, T$

Note that states $s'$ whose cost is less than or equal to that of the current state $s$ are always accepted (i.e., $\Delta E \leq 0$). If a neighboring state is accepted under the criterion $\Delta E \leq 0$, then we check whether or not it has a lower cost than the optimal state $s^*$ and update the optimal state accordingly. Note that states with higher cost (i.e., $\Delta E > 0$) are not necessarily rejected.
In fact, the acceptance of higher cost states is controlled by the temperature $T$ in a probabilistic manner. Note that the acceptance probability, $\exp(-\Delta E/T)$ with the exponent scaled by a positive hyperparameter, tends to 1 as $T \to \infty$. This implies that the state space is explored aggressively when $T$ is large since the acceptance of higher cost states becomes more probable.

A key component of Algorithm 1 involves constructing the neighboring state $s'$. In our case, this process first involves sampling an integer $i$ from the discrete uniform distribution $\mathcal{U}\{1, 6\}$, which corresponds to one of the controls outlined in Table 3.1. We then choose another number uniformly at random over the values in $\mathcal{S}_i$ to obtain $s'_i$. The neighboring state $s'$ is then found by simply modifying the $i$th element of $s$ accordingly, i.e., $s_i \leftarrow s'_i$. For the simulations that follow, we assume that the parameters are distributed as in Table 3.2 (each with a control range of $\pm 3\sigma$).

Figure 3.13: Wideband SFDR results (minimum SFDR over the Nyquist band). (a) Two-tone calibration with tones at $\nu_{cal,0} = 0.05$ and $\nu_{cal,1} = 0.4$. (b) Single-tone calibration with a tone at $\nu_{cal} = 0.24$.

3.7.2 Simulations

We now simulate the proposed two-tone simulated annealing algorithm for finding (3.21). For comparison, we also apply simulated annealing with a single tone as in [8]. Simulating $K$ iterations of the simulated annealing algorithm requires $K$ computations of (3.22), and we show that large values of $K$ are required when more control bits are used, e.g., $K > 1000$ when 5-bit control is used for each parameter. Hence, we run the simulations with the analytical model (where the computations of (3.22) are done using (3.18)), since it is accurate and runs much faster than the behavioral simulations (as we described in Section 3.6).

Figure 3.14: Overlapped parameter trajectories for 20 runs of simulated annealing. (a) Two-tone calibration. (b) Single-tone calibration.
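The structure of Algorithm 1 and the neighbor construction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for the results in this chapter. The names `cooling` and `scale` stand in for the cooling factor and the acceptance hyperparameter, the placement of `scale` in the denominator of the exponent is an assumption, and the quadratic `toy_cost` is a hypothetical stand-in for (3.22).

```python
import math
import random

def simulated_annealing(cost, s0, levels, t_max, t_min, cooling, K,
                        scale=1.0, rng=None):
    """Simulated annealing over integer control vectors, following Algorithm 1.

    cost    : function mapping a list of integer settings to a real cost
    s0      : initial control vector
    levels  : number of settings per control (e.g., 32 for 5-bit control)
    cooling : multiplicative temperature decay per outer step, in (0, 1)
    scale   : positive hyperparameter scaling the temperature (assumed placement)
    """
    rng = rng or random.Random()
    s, c_s = list(s0), cost(s0)
    best, c_best = list(s), c_s
    T = t_max
    while T > t_min:
        for _ in range(K):
            # Neighbor: pick one control uniformly, then a uniform new setting.
            cand = list(s)
            i = rng.randrange(len(s))
            cand[i] = rng.randrange(levels)
            c_cand = cost(cand)
            dE = c_cand - c_s
            if dE <= 0:
                s, c_s = cand, c_cand
                if c_s < c_best:
                    best, c_best = list(s), c_s
            elif rng.random() < math.exp(-dE / (scale * T)):
                # Higher-cost states are accepted with a probability that
                # shrinks as the temperature decreases.
                s, c_s = cand, c_cand
        T *= cooling
    return best, c_best

def toy_cost(s):
    """Hypothetical cost with its minimum at mid-scale for every control."""
    return float(sum((v - 16) ** 2 for v in s))

if __name__ == "__main__":
    best, c_best = simulated_annealing(
        toy_cost, [0] * 6, levels=32, t_max=10.0, t_min=0.01,
        cooling=0.9, K=50, rng=random.Random(0))
    print(best, c_best)
```

In a calibration setting, `cost` would be replaced by an evaluation of (3.22), either from the analytical model (3.18) or from a measured spectrum, and `levels` would follow the control resolution of each parameter.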
Furthermore, we leverage its speed to simulate the algorithm over hundreds of randomized initial parameters, which we draw from the distributions in Table 3.2.

In Fig. 3.13(a), we show simulated results for the two-tone calibrated SFDR vs. iterations for different control resolutions (which are assumed to be identical for each parameter). Each data point represents the minimum (worst-case) SFDR over the Nyquist band (averaged over 100 independent runs of simulated annealing) and the calibration aims to maximize this quantity. Without calibration, the SFDR is approximately 40dB. When calibrated with enough iterations, the SFDR is increased significantly, e.g., by over 25dB for 5-bit control. Note that the number of iterations required for convergence increases with the number of control bits. For example, the algorithm converges after roughly 1000 iterations in the 5-bit case. In contrast, the 7-bit case requires roughly 3000 iterations since it has a larger parameter search space. It is worth mentioning that producing Fig. 3.13(a) took 16 hours using the analytical model on a modern workstation. With the behavioral simulations described in Section 3.6, this corresponds to an infeasible run time of 104 years.

Figure 3.15: Narrowband SFDR results (SFDR at the calibration frequencies). (a) Two-tone calibration with tones at $\nu_{cal,0} = 0.05$ and $\nu_{cal,1} = 0.4$. (b) Single-tone calibration with a tone at $\nu_{cal} = 0.24$.

Fig. 3.13(b) shows SFDR results for single-tone simulated annealing, where there is only a modest improvement relative to the uncalibrated result (independent of the control resolution). This is a consequence of narrowband locking, as evident from Fig. 3.14(b), which overlays parameter trajectories for 20 runs of single-tone simulated annealing. Specifically, for several of these runs, parameters $\alpha$, $\varepsilon_g$, $\Delta\Phi_Q$, and $\Delta\theta_{skew}$ converge to nonzero values. While these solutions result in a high SFDR at the calibration frequency, as shown in Fig.
3.15(b), they perform poorly over the full Nyquist band. Single-tone simulated annealing exhibits random narrowband locking, since locking onto narrowband solutions is not guaranteed for every run of the algorithm (e.g., see the results outlined in green for the histogram in Fig. 3.13(b)). Such locking was not observed for the two-tone case, as evident in Fig. 3.14(a), where the parameters converge near zero for every run of the algorithm.

The narrowband locking behavior shown in Fig. 3.14(b) is caused by a phenomenon similar to that discussed in Section 3.4. Specifically, the single-tone gain error cancellation approach considered in Section 3.4 was proven to have deterministic narrowband locking due to coupling between the gain and C2 clock duty cycle errors. If single-tone simulated annealing were used to calibrate just gain and C2 clock duty cycle errors, it would suffer from this effect, but since both of these parameters are adjusted, this narrowband locking would not be observed for every initialization (i.e., random narrowband locking). The single-tone simulated annealing algorithm in this section calibrates more than just gain and C2 clock duty cycle errors and may be prone to other parameter coupling effects. In fact, the results in Fig. 3.14(b) suggest that coupling exists between four parameters: $\alpha$, $\varepsilon_g$, $\Delta\Phi_Q$, $\Delta\theta_{skew}$. Specifically, we observe narrowband locking with non-ideal settings for these four parameters.

In contrast, the results in Fig. 3.14(a) suggest that using two-tone calibration eliminates narrowband locking. To investigate this further, we ran a numerical grid search over the parameter search space, $\mathcal{S}$, and found that cost function (3.22) had a unique global minimum at the origin for the two-tone case (where all parameters in Table 3.2 are equal to zero). In contrast, a similar search for the single-tone case had several global minima away from the origin.

The choice of frequencies for the calibration tones is important to consider.
We simulated the algorithm with calibration tones $(\nu_{cal,0}, \nu_{cal,1})$ placed half the tone spacing below and above $\nu = 0.25$, where the tone spacing lies in $(0, 0.5)$ and a small positive offset is applied to ensure that the calibration tones do not overlap with the targeted spurs in Table 3.1. An SFDR penalty was observed for calibration tones that were too far apart or too close together (e.g., a 10dB penalty if the tone spacing is 0.1 or 0.49). If the calibration tones are too far apart (e.g., a spacing greater than 0.49), then $\nu_{cal,0} \approx 0$ and $\nu_{cal,1} \approx 0.5$, which is undesired since the C4 image spurs are shaped by $\sin(2\pi\nu)$, as mentioned in Section 3.5. Hence, these targeted spurs would only negligibly affect (3.22) during calibration. On the other hand, calibration tones that are too close together (e.g., a spacing less than 0.1) can result in narrowband locking.

One practical implementation of simulated annealing is to utilize a high-speed ADC together with FFTs to compute (3.22), and we demonstrate this in Section 3.7.3. In contrast, gain error cancellation (from Section 3.4) with a single tone near Nyquist requires only a low-speed ADC (to measure the $f_s/2 - f_{cal}$ spur); however, it exhibits deterministic narrowband locking and does not account for the C4 image spurs.

3.7.3 Measured Results

We now demonstrate two-tone and single-tone simulated annealing calibration on a 10-bit times-2 interleaved CS-DAC in 14nm CMOS from Jariet Technologies Inc. (see footnote 6).

Footnote 6: Our motivation is to demonstrate the calibration algorithm on a modern TIDAC that exhibits the errors analyzed in this chapter. Comparing the specific DAC used to state-of-the-art circuit research is beyond the scope of the dissertation.

Figure 3.16: Modern transceiver used to demonstrate the simulated annealing calibration algorithm.

The DAC is part of a modern transceiver chip with a high-speed 10-bit ADC that we use in the algorithm for DAC output measurements.
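To make the tone-placement constraint from Section 3.7.2 concrete, the following sketch lists the first-Nyquist-zone spur locations of Table 3.1 in normalized frequency and checks whether any spur lands on a calibration tone. The helper names and the tolerance are assumptions made for illustration. The example suggests one reason an offset is needed: two tones placed exactly symmetrically about $\nu = 0.25$ sum to 0.5, so the C2 image spur of one tone falls on the other.

```python
def spur_locations(nu0):
    """Spur locations from Table 3.1 for a tone at nu0 = f0/fs in (0, 0.5)."""
    if not 0 < nu0 < 0.5:
        raise ValueError("nu0 must lie strictly inside the first Nyquist zone")
    spurs = {"C2 image (fs/2 - f0)": 0.5 - nu0}
    if nu0 < 0.25:
        spurs["C4 image (fs/4 - f0)"] = 0.25 - nu0
        spurs["C4 image (fs/4 + f0)"] = 0.25 + nu0
    elif nu0 > 0.25:
        spurs["C4 image (3fs/4 - f0)"] = 0.75 - nu0
        spurs["C4 image (f0 - fs/4)"] = nu0 - 0.25
    return spurs

def tones_overlap_spurs(tones, tol=1e-9):
    """True if any targeted spur of any tone coincides with a calibration tone."""
    for src in tones:
        for spur in spur_locations(src).values():
            if any(abs(spur - tone) < tol for tone in tones):
                return True
    return False

if __name__ == "__main__":
    print(tones_overlap_spurs((0.125, 0.375)))  # symmetric about 0.25
    print(tones_overlap_spurs((0.05, 0.4)))     # tones used in Section 3.7.2
    fs_ghz = 40.0
    for name, nu in spur_locations(8.54 / fs_ghz).items():
        print(f"{name}: {nu * fs_ghz:.2f} GHz")
```

The same mapping from tone frequency to spur frequency identifies which FFT bins enter the numerator of (3.22).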
Moreover, we have control over all of the parameters in Table 3.2 except for C2 clock skew, so this parameter is excluded during calibration.

Fig. 3.16 illustrates a block diagram of the transceiver. A C2 clock at 20GHz is synthesized from a PLL that uses a reference clock (REFCLK) supplied by a signal generator. This allows both the DAC and ADC to operate at an aggregate sample rate of 40GS/s. The ADC can sample the DAC output via an on-chip measurement path, which allows the ADC to capture the full Nyquist band of the DAC. The Field Programmable Gate Array (FPGA) serves as a bridge between the PC and Serial Peripheral Interface (SPI), allowing the control registers that are part of the DAC and ADC to be read and modified.

For testing purposes, we wrote the algorithm in Python to run on the PC. In practice, one would implement it on an embedded MCU. To compute (3.22), we sample the DAC output using the ADC and then compute an FFT of size $N = 8192$. The FFT is the dominant computation task of the calibration algorithm. While computing 500 FFTs of size $N = 8192$ on an embedded MCU requires only a few seconds, utilizing Goertzel filters [39] at the measurement frequencies can further speed up this operation.

Figure 3.17: Two-tone and single-tone simulated annealing calibration. (a) Maximum of the C2 and C4 image spurs in dBc (from Table 3.1). Losses from the test board, cables, and balun have been de-embedded from the measurements. Raw output spectrum comparison (using a Keysight N9040B UXA Signal Analyzer) with fundamental tones at (b) $f_0 = 8.54$GHz and (c) $f_0 = 16.08$GHz.

Fig. 3.17(a) illustrates the maximum of the spurs in Table 3.1 at frequencies over the Nyquist band (in dBc).
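As a side note on the Goertzel filters mentioned above, the following is a minimal sketch of how the power in a single DFT bin can be obtained without computing a full FFT. It is not the implementation used for the measurements; the function name `goertzel_power`, the 64-point record, and the toy tone amplitudes are assumptions chosen only for illustration.

```python
import numpy as np

def goertzel_power(samples, k):
    """Squared magnitude |X[k]|^2 of the len(samples)-point DFT at integer bin k."""
    n = len(samples)
    coeff = 2.0 * np.cos(2.0 * np.pi * k / n)
    s1 = s2 = 0.0  # previous two states of the second-order recursion
    for x in samples:
        # The right-hand side is evaluated with the old s1 and s2.
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# Toy capture (illustrative values): a fundamental in bin 5 and a weak spur in bin 12.
n_fft = 64
t = np.arange(n_fft)
capture = np.cos(2 * np.pi * 5 * t / n_fft) + 0.01 * np.sin(2 * np.pi * 12 * t / n_fft)

# Spur level relative to the fundamental, in dBc.
spur_dbc = 10 * np.log10(goertzel_power(capture, 12) / goertzel_power(capture, 5))
```

Each bin costs one multiply-accumulate pass over the record, so evaluating only the handful of bins that hold the targeted spurs can be cheaper than a full FFT when the number of bins of interest is small.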
The uncalibrated results are based on nominal settings for each parameter. The calibrated results show the best and worst of 10 simulated annealing runs using single-tone and two-tone calibration signals. Evaluation of a particular run is determined by the wideband performance of the parameters returned after calibration. To this end, our metric is the maximum of the quantity in Fig. 3.17(a) over the Nyquist band; for example, comparing the purple and green curves, we consider the green one to be better since this metric is roughly -48dBc and -55dBc, respectively. Our motivation behind running the algorithm multiple times is for a deeper exploration of the solution space. Specifically, since simulated annealing is a randomized algorithm, it often converges to a local minimum instead of the global minimum. However, such local minima are often close enough to the global minimum to be considered feasible solutions for the intended application. For example, even the worst case two-tone solution in Fig. 3.17(a) still guarantees that the targeted spurs will be below -48dBc over the Nyquist band.

For the single-tone case, the calibration signal was a tone at $f_{\text{cal}} = 9.48$GHz. An additional tone was included for the two-tone case, i.e., $f_{\text{cal},0} = 9.48$GHz and $f_{\text{cal},1} = 18.8$GHz. Note that for single-tone calibration (worst case), the performance starts to degrade for frequencies greater than the calibration frequency (9.48GHz), which is indicative of narrowband locking. Although the two-tone calibration signal also includes 9.48GHz, such locking is not observed due to the presence of the additional tone at $f_{\text{cal},1}$ during calibration, resulting in a 15dB improvement near the Nyquist frequency.

Fig. 3.17(b) compares single-tone and two-tone calibration by means of the output spectrum at $f_0 = 8.54$GHz (see footnote 7). Note that the targeted spurs (from Table 3.1) are suppressed significantly in both cases. A similar comparison is shown in Fig. 3.17(c) for a tone at $f_0 = 16.08$GHz.
We suspect that the second and third harmonics (HD2, HD3) are caused by errors commonly encountered in CS-DACs [17, 40, 41]. The literature involving the calibration of such errors is rich, and the work in [17] provides an overview of several calibration techniques.

Footnote 7: Note that the upward slope within the noise floor is an artifact of noise density variation in the spectrum analyzer.

One practical consideration is the resolution of the ADC that is used to measure the targeted spurs. Specifically, if the ADC resolution is too low, the targeted spurs will be masked by quantization noise. For high-resolution ADCs (e.g., 10 bits), the processing gain from the FFT (i.e., $10\log_{10}(M/2)$ dB for an $M$-point FFT) helps resolve spurs below the quantization noise level [42]. Conservatively, a signal-to-noise ratio (SNR) of at least 20 dB for each of the targeted FFT bins should be enough to neglect quantization effects. Accordingly, we have $P_{\text{spur}} + 6.02\,B_{\text{ADC}} + 1.76 + 10\log_{10}(M/2) \geq 20$, where $P_{\text{spur}}$ is the targeted spur power (in dBFS) and $B_{\text{ADC}}$ is the ADC resolution. For example, with a 10-bit ADC and an 8192-point FFT (providing 36 dB processing gain), the effects of quantization are negligible for spurs above -78dBFS, since $20 - 60.2 - 1.76 - 36.1 \approx -78$.

In addition, if other ADC nonidealities also produce spurs at the same locations as the targeted spurs, then it is important that they are kept sufficiently low in order to ensure an adequate SNR. For example, timing mismatch between sub-ADCs in a time-interleaved ADC will also produce a C2 image spur [43]. Fortunately, calibration schemes for these types of nonidealities are well-documented in the literature [44]. While ADC calibration is beyond the scope of this dissertation, it is important to note that the ADC in Fig. 3.16 was calibrated prior to calibrating the DAC.

3.8 Concluding Remarks

This chapter presented analysis and calibration of interleaving and data timing errors that are encountered in modern times-2 interleaved CS-DACs.
First, we derived key insights by analyzing the effects of these errors in isolation. For example, the coupled analysis of gain and duty cycle errors uncovered the drawback of calibration via gain error cancellation (i.e., deterministic narrowband locking), which motivated the use of two tones in calibration mode. Moreover, we showed how data timing errors create spurs when finite settling of the sub-DACs is considered. We then developed an analytical model that includes all of the errors considered in this chapter and highlighted its speed and accuracy relative to behavioral simulations. The speed and accuracy of the analytical model were leveraged to run extensive simulations of the proposed two-tone calibration algorithm. The simulation results showed that the use of two tones in calibration mode avoids narrowband locking, resulting in solutions that are effective over the full Nyquist band. This is an improvement over the previous single-tone approaches (e.g., gain error cancellation and single-tone simulated annealing), as these are prone to narrowband locking. The efficacy of the algorithm was then demonstrated experimentally on a 10-bit times-2 interleaved CS-DAC, operating at 40GS/s in 14nm CMOS.

Finally, it is worth noting that our proposed approach is classified as foreground calibration, i.e., it does not consider environmental variations. A useful direction for future work would be to quantify the sensitivity of the parameters in Table 3.2 to variations in, for example, temperature and supply voltage. Moreover, developing background calibration algorithms that remedy such variations would be another fruitful research opportunity.

Chapter 4
Linearization of CS-DACs Using Neural Networks

The focus of this chapter is on correcting static nonlinearity, which is mainly due to the current source mismatch described in Section 2.3.2.
A common remedy is the DEM technique discussed in Section 2.4.3; however, this has the potential to cause a significant noise floor penalty since it shapes the harmonic distortion into white noise. An alternative that does not raise the noise floor is digital pre-distortion (DPD), which is a well-known technique that has been applied to power amplifier linearization [45], [46]. In the context of a DAC, this technique cancels out the nonlinearity by mapping the inverse of the transfer characteristic onto the input codes.

In this chapter, we propose a novel DPD scheme that is tailored to the discontinuities of the CS-DAC transfer characteristic. We begin by exciting the DAC with an input waveform and then capturing its output with an ADC. Since our scheme is not intended to update in the background, the DAC input signal can be designed. We use the term background to refer to a scheme that runs during normal operation using DAC input data driven by the application. This is in contrast to a foreground scheme, which runs offline calibration and allows one to select the DAC input data to be used for system identification.

Figure 4.1: Block diagram illustrating the DPD concept, where the inverse of the DAC static transfer characteristic is stored in a LUT.

In our approach, we design the DAC input signal so that it does not stimulate the dynamic effects in the DAC output driver and measurement path from the DAC output to the ADC input. Thus, only the static transfer characteristic is identified using the resulting captured input-output pairs. Specifically, we excite the DAC with a low-frequency sine wave so that the static nonlinearity is extracted directly. The static transfer characteristic is then learned by training a NN using a dataset of input-output pairs from this DAC-to-ADC system. Lastly, the inverse of this transfer characteristic is mapped onto the input codes using a Look Up Table (LUT), thus linearizing the DAC via DPD.
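The last step above, mapping the inverse of the transfer characteristic onto the input codes, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the dissertation's implementation: the helper `build_dpd_lut`, the nearest-level inversion rule, and the toy characteristic `f_est` (an ideal ramp with a smooth bow and a 3-LSB jump at mid-scale) are all hypothetical.

```python
import numpy as np

def build_dpd_lut(f_est, targets):
    """For each target output level, return the input code whose estimated
    output f_est[code] is closest: a quantized inverse of the characteristic."""
    return np.abs(targets[:, None] - f_est[None, :]).argmin(axis=1)

codes = np.arange(2**10)  # 10-bit input codes

# Hypothetical static characteristic in LSBs (illustrative numbers only):
# ideal ramp, a smooth bow, and a 3-LSB jump at mid-scale.
f_est = codes + 4.0 * np.sin(2 * np.pi * codes / 1023) + 3.0 * (codes >= 512)

lut = build_dpd_lut(f_est, codes.astype(float))

err_before = f_est - codes       # DAC driven directly by the input code
err_after = f_est[lut] - codes   # DAC driven by the pre-distorted code lut[x]
```

In this toy case the residual error after pre-distortion is largest for the few target levels that fall inside the jump, because the modeled DAC cannot produce those output levels with any input code.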
4.1 System Identification

Mapping the DAC input codes using DPD in order to remedy static nonlinearity has been investigated in [47], [48]. The main idea is illustrated in Figure 4.1, where a LUT maps input codes $x_n$ to $\tilde{x}_n = F^{-1}(x_n)$, which linearizes the DAC by inverting its static transfer characteristic $F(\cdot)$. The static nonlinearity is modeled as a time-invariant, memoryless system. We use the term transfer characteristic to describe the input-output relationship for this memoryless nonlinearity. Data from the DAC output is required in order to estimate $F(\cdot)$, and this is typically provided by an ADC. A block diagram of a representative DAC-to-ADC system is shown in Figure 4.2(a), where the measurement path from the DAC output to the ADC input is modeled as a lowpass filter. Our approach is to obtain an estimate $\hat{F}(\cdot\,; \theta)$, where $\theta$ are the model parameters. We refer to this as system identification, as depicted in Figure 4.2(b), where model parameters $\theta$ are found using a dataset of input-output pairs from the DAC-to-ADC system: $\mathcal{D}_{\text{TRAIN}} := \{(x_n, y_n),\ n = 1, \ldots, N\}$.

Figure 4.2: (a) Block diagram of the DAC-to-ADC system, (b) System identification using a dataset to determine model parameters $\theta$, (c) DAC-to-ADC system model with input $x_n$ and output $\hat{y}_n$.

The DAC stimulus used for system identification in [47], [48] is uniformly distributed random codes. This is because the proposed algorithms in this case are intended to run in the background, and random codes share spectral properties with the signals encountered during normal operation. In contrast, we consider a foreground linearization scheme and, consequently, we leverage our choice of input stimulus in order to isolate the static nonlinearity.
Specifically, we excite the DAC using a sine wave with frequency $f_{\text{sig}} \ll f_s$, where $f_s$ is the DAC sample rate. This avoids stimulating the dynamic effects in the DAC output driver and measurement path. Therefore, we seek a memoryless model $\hat{y}_n = \hat{F}(x_n; \theta)$ as depicted in Figure 4.2(c). Furthermore, we assume the ADC in Figure 4.2(a) is sufficiently linear so that the DAC-to-ADC system accurately captures the nonlinearity of the standalone DAC.

The choice of the regression model $\hat{F}$ is critical, and depends on the problem at hand. In [47] and [48] this model is a polynomial, which is a suitable choice since the proposed DAC architecture exhibits only weakly nonlinear behavior. For CS architectures, which are the focus of this chapter, this model should be selected carefully. This is because the CS-DAC transfer characteristic is prone to large discontinuities, as discussed in Section 2.3.2. Although polynomials are a popular choice for a regression model, they are ineffective at fitting discontinuities; i.e., they fit the abrupt transition poorly and exhibit oscillatory behavior [49]. In contrast, NN regression models are powerful, universal approximators and are a good choice for fitting a transfer characteristic with jump discontinuities as well as other, smooth, nonlinear effects. This is illustrated in the example shown in Figure 4.3, where we have focused on a region of the CS-DAC transfer characteristic containing a jump discontinuity. Note how the NN fits this region well while the polynomial exhibits both poor fitting near the discontinuity and oscillatory behavior. For this reason, we approach system identification using NNs.

Figure 4.3: Polynomial vs. NN regression in the vicinity of a discontinuity for a CS-DAC behavioral model.

The NNs considered in this chapter are feedforward Multilayer Perceptrons (MLPs). An example of an MLP with a single hidden layer is shown in Figure
4.4, and the output $\hat{y}_n$ for this architecture with nonlinear activation $h : \mathbb{R}^H \to \mathbb{R}^H$ is given by

\[
\hat{y}_n = w^{(1)\top} h\!\left(w^{(0)} x_n + b^{(0)}\right) + b^{(1)} \tag{4.1}
\]

where the set of trainable parameters is defined as

\[
\theta := \left\{ w^{(0)}, w^{(1)}, b^{(0)}, b^{(1)} \right\} \tag{4.2}
\]

with dimensions $w^{(0)} \in \mathbb{R}^H$, $w^{(1)} \in \mathbb{R}^H$, $b^{(0)} \in \mathbb{R}^H$, $b^{(1)} \in \mathbb{R}$.

Figure 4.4: Single layer MLP with 1 input node, $H$ hidden nodes, and 1 output node.

4.2 Simulation Results

In this section, dataset $\mathcal{D}_{\text{TRAIN}}$ is obtained using 10-bit DAC and ADC behavioral models operating at $f_s = 40.96$GS/s. We model the measurement path in Figure 4.2(a) as a 2nd-order Butterworth lowpass filter with a 20GHz cutoff. The FFT of a two-tone waveform without any linearization is illustrated by the blue spectrum in Figure 4.5. Note that current source errors result in IMD products, and the linearization objective is to suppress these as much as possible. We approach system identification in a NN framework by minimizing the following mean squared error (MSE) cost function

\[
C_{\text{model}} = \frac{1}{N} \sum_{n=1}^{N} \left(\hat{y}_n - y_n\right)^2 \tag{4.3}
\]

by an appropriate selection of $\theta$, $H$, and $h(\cdot)$.

Figure 4.5: Two-tone FFT comparison before and after NN-based DPD. The signal frequencies are $f_1 = 3.1$GHz, $f_2 = 3.2$GHz with amplitudes -12dBFS/tone and the DAC is sampling at $f_s = 40.96$GS/s.

Conventionally, hyperparameters $H$, $h(\cdot)$, and the number of hidden layers are chosen heuristically. However, in this chapter, we leverage Deep-n-Cheap (DnC), an automated framework for low complexity deep learning applications [50]. This results in a single layer NN with rectified linear unit (ReLU) activation [51] and $H = 271$ hidden nodes. Model parameters $\theta$ are then obtained using an extended version of Stochastic Gradient Descent (SGD) [52], which completes system identification for the static transfer characteristic.
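For concreteness, the forward pass in (4.1) and the cost in (4.3) can be written in a few lines of NumPy. This is only a sketch of the model structure: the function names `mlp_forward` and `mse_cost` are assumptions, the ReLU activation follows the DnC result quoted above, and the training loop (the extended SGD of [52]) is not shown.

```python
import numpy as np

def mlp_forward(x, w0, b0, w1, b1):
    """Evaluate (4.1) with ReLU activation for a batch of scalar inputs.

    x: (N,) input codes; w0, b0, w1: (H,) vectors; b1: scalar.
    """
    hidden = np.maximum(0.0, np.outer(x, w0) + b0)  # (N, H) hidden activations
    return hidden @ w1 + b1                         # (N,) model outputs

def mse_cost(y_hat, y):
    """Mean squared error between model outputs and ADC samples, as in (4.3)."""
    return np.mean((y_hat - y) ** 2)
```

As a quick sanity check of the structure, two hidden nodes with `w0 = [1, -1]`, `w1 = [1, 1]` and zero biases reproduce the absolute-value function, which shows how a sum of shifted ReLU units can form piecewise-linear fits with sharp corners.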
The inverse of this transfer characteristic is then quantized to the 10-bit level and stored in a LUT as shown in Figure 4.1. The performance of NN-based DPD on the behavioral model is illustrated by the green spectrum in Figure 4.5, which shows a reduction of 23.6dB, 19.8dB, and 17.9dB for IM3, IM5, and IM7, respectively.

4.3 Measurement Results

In this section, we present results for NN-based DPD on a times-2 interleaved 10-bit CS-DAC operating at $f_s = 40.96$GS/s in 14nm CMOS. Our motivation is to demonstrate the ability to capture real-world nonlinearities and also avoid capturing dynamic properties of the system. We do not intend to compare the specific DAC used to state-of-the-art circuit research. Dataset $\mathcal{D}_{\text{TRAIN}}$ is obtained by capturing the DAC output using an on-chip 10-bit ADC, which is synchronized to the same sample rate as the DAC. The DAC is externally connected to the ADC to avoid signal attenuation and filtering effects. The test setup is shown in Figure 4.6.

Linearization was performed in the NN framework as described in Section 4.1 using a sine wave with frequency $f_{\text{sig}} = 100$MHz and amplitude -6dBFS for system identification. In Figure 4.7, we show the output spectrum for the 100MHz tone without DPD, and the spectrum after applying NN-based DPD is shown in Figure 4.8. Note that applying NN-based DPD suppresses the harmonics significantly, which suggests that this calibration scheme is correcting the static nonlinearity. Specifically, the DPD amounts to an almost 10dB SFDR improvement in a 1GHz span that includes HD3, HD5, HD7, and HD9.

The two-tone IMD performance results are illustrated in Figure 4.9 (for -6dBFS inputs) and Figure 4.10 (for -12dBFS inputs), where we compare IM3/IM5/IM7 levels using two-tone signals centered at various frequencies across the first Nyquist zone. We compare the proposed NN technique with DEM and $q$th-order polynomial-based DPD, where $q = 15$.
An on-chip randomizer is used for the former, and coefficients $c \in \mathbb{R}^{q+1}$ for the latter are found by applying linear regression with a Vandermonde matrix, i.e., solving

\[
y = V c \tag{4.4}
\]

in the least-squares sense, where $V \in \mathbb{R}^{N \times (q+1)}$ is defined as

\[
V := \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^q \\
1 & x_2 & x_2^2 & \cdots & x_2^q \\
1 & x_3 & x_3^2 & \cdots & x_3^q \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^q
\end{bmatrix} \tag{4.5}
\]

and $y \in \mathbb{R}^N$ is a vector whose elements $y_n$ are ADC output samples that correspond to the DAC input sequence $x_n$. The polynomial-based transfer characteristic is defined by the ordered pairs $(x, \hat{y}_{\text{poly}}(x))$, where

\[
\hat{y}_{\text{poly}}(x) = \sum_{i=0}^{q} c_i x^i \tag{4.6}
\]

and $x$ is valid for the range of input codes used for system identification.

Figure 4.6: Test bench with the high-speed DAC and ADC test board.

Figure 4.7: DAC output spectrum without DPD, $f_s = 40.96$GS/s, $f_{\text{sig}} = 100$MHz.

Figure 4.8: DAC output spectrum with NN-based DPD, $f_s = 40.96$GS/s, $f_{\text{sig}} = 100$MHz.

Figure 4.9: IM3/IM5/IM7 performance across Nyquist for two-tone signals, -12dBFS/tone (-6dBFS total amplitude), 100 MHz spacing.

Figure 4.10: IM3/IM5/IM7 performance across Nyquist for two-tone signals, -18dBFS/tone (-12dBFS total amplitude), 100 MHz spacing.

Referring to Figure 4.9 (-6dBFS inputs), the IM3 reduction for NN-based and polynomial-based DPD is similar. However, the NN outperforms the polynomial for lower frequencies by at least 10dB for high-order IMD products, i.e., IM5 and IM7. Referring to Figure 4.10 (-12dBFS inputs), it is evident that NN-based DPD shows an IMD reduction of at least 6dB for frequencies up to 9GHz compared to its polynomial-based counterpart. The NN outperforms the polynomial because it captures the transfer characteristic more accurately around the discontinuities.
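The polynomial baseline in (4.4)-(4.6) can be sketched with NumPy's Vandermonde and least-squares routines. This is an illustrative sketch, not the code used for the measurements: the function names are hypothetical, and scaling the input codes to roughly [-1, 1] before fitting is an assumption made here for numerical conditioning at high orders such as $q = 15$ (the source does not state how the codes were scaled).

```python
import numpy as np

def fit_poly_dpd(x, y, q):
    """Least-squares coefficients c_0..c_q of y ~ sum_i c_i x^i, as in (4.4)-(4.5)."""
    v = np.vander(x, q + 1, increasing=True)  # rows [1, x_n, x_n^2, ..., x_n^q]
    c, *_ = np.linalg.lstsq(v, y, rcond=None)
    return c

def eval_poly(c, x):
    """Evaluate the fitted characteristic (4.6) at the points x."""
    return np.vander(x, len(c), increasing=True) @ c
```

A typical use would be `c = fit_poly_dpd(x_scaled, y, 15)` followed by `eval_poly(c, x_scaled)` over the range of codes seen during system identification, since the fit is not expected to extrapolate beyond that range.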
Again, an example of such a scenario is depicted in Figure 4.3. It appears that dynamic effects such as those discussed in Section 2.3 begin to dominate IMD performance above 9GHz, and these are not considered as part of the calibration scheme. Evidence for this is based on the efficacy of DEM above 9GHz, as it is proven to suppress such errors [15].

4.4 Background Calibration (Future Work)

In this chapter, we studied linearization for CS-DACs using NN-based DPD. The proposed scheme mapped the inverse of the static transfer characteristic onto the input codes using a LUT, thus linearizing the DAC via DPD. We referred to this as a foreground (or offline) calibration approach. That is, we assumed the LUT coefficients were derived and stored prior to using the DAC for its intended application. However, it is known that current source mismatch in a CS-DAC changes as a function of temperature [53]. Therefore, a foreground calibration scheme may not be feasible for applications that are subject to large temperature drifts. To remedy this, background calibration schemes are often introduced. In a DPD-based framework, this typically means that the LUT coefficients are periodically updated using DAC data driven by the application [47], [48].

It is worth emphasizing that the objective of this section is not to develop, simulate, and demonstrate a background calibration algorithm. Rather, we aim to present some preliminary results and ideas that may lead to an interesting research opportunity on this front.

Recall from this chapter that IMD performance may be improved by using foreground, NN-based DPD over its polynomial-based counterpart. This motivates an exploration of NN-based DPD schemes that run in the background. To this end, one may consider using multiple LUTs coupled with a temperature sensor and some control logic, as shown in Figure 4.11.
In contrast to the foreground framework, we now model the DAC transfer characteristic as a function of a time-dependent temperature term $T(t) \in [T_{\min}, T_{\max}]$, $t \geq 0$. Furthermore, we partition the temperature range $\Delta T := T_{\max} - T_{\min}$ using $L$ LUTs. Specifically, LUT $k$ stores the inverse transfer characteristic for the temperature range $[T_k, T_{k+1})$, $k = 0, \ldots, L-1$, where $T_j := T_{\min} + j\,\Delta T / L$, $j = 0, \ldots, L$. The coefficients for LUT $k$ can be derived offline at $T_{\text{cal},k} = (T_k + T_{k+1})/2$, i.e., the midpoint of its prescribed temperature range. We assume that an on-chip temperature sensor drives control logic that selects the appropriate DPD LUT to use for calibration. Each LUT in Figure 4.11 can be derived using the foreground methodology presented earlier in this chapter. Note that while this multi-LUT approach calibrates the DAC in the background (i.e., it is adaptive to temperature variation), it still has a foreground component since the LUTs must be pre-computed.

Figure 4.11: Multi-LUT background calibration.

It would also be interesting to see if the time-delay neural network (TDNN) in Fig. 4.12 could be leveraged to adaptively update a pre-distortion LUT, e.g., as illustrated in Fig. 4.13. Note that, in contrast to the previously described approaches, this background calibration scheme employs online training using DAC data driven by the application. Another difference is that the TDNN utilizes the current and previous $L$ samples at the input, which can be used to capture memory effects that are present during normal DAC operation (e.g., when the DAC is excited by a modulated waveform); this is shown in Fig. 4.12, but omitted in Fig. 4.13 for clarity.

Figure 4.12: Single layer TDNN with $L + 1$ input nodes, $N_H$ hidden nodes, and 1 output node.
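The tapped-delay input of the TDNN in Fig. 4.12 can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, samples before the start of the record are taken as zero, and ReLU is assumed for the hidden layer because the source does not state which activation the TDNN uses.

```python
import numpy as np

def tapped_delay_inputs(x, mem_len):
    """Return an (N, L+1) array whose row n is [x_n, x_{n-1}, ..., x_{n-L}].

    Samples before the start of the record are taken as zero (an assumption).
    """
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(mem_len), x])
    cols = [padded[mem_len - d : mem_len - d + len(x)] for d in range(mem_len + 1)]
    return np.stack(cols, axis=1)

def tdnn_forward(x, w0, b0, w1, b1):
    """Single-hidden-layer TDNN; w0 is (L+1, N_H), b0 and w1 are (N_H,), b1 is a scalar."""
    taps = tapped_delay_inputs(x, w0.shape[0] - 1)
    hidden = np.maximum(0.0, taps @ w0 + b0)  # assumed ReLU activation
    return hidden @ w1 + b1
```

With a memory length of zero the model reduces to the memoryless MLP of Section 4.1, which is consistent with treating the TDNN as an extension of the foreground model rather than a different architecture.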
Figure 4.13: Adaptive DPD via online training of a TDNN.

Figure 4.14: DAC system identification using two different training signals: random codes (red) and low-frequency sine waves (green).

A natural step toward evaluating the feasibility of this approach is to ensure that TDNNs can be used for accurate system identification under the framework in Fig. 4.13. To this end, we trained a TDNN (on a PC) using a training dataset designed from a DAC-to-ADC behavioral simulation model where the DAC was excited by random codes. The results are illustrated by the red curve in Fig. 4.14, which shows the MSE training loss versus memory length, $L$. Note that the MSE monotonically decreases with increasing memory length, which suggests that the system identification accuracy improves as the TDNN utilizes more samples from the past. We would expect the MSE loss to eventually saturate at a value of $L$ that depends on the characteristics of the training signal and, in this case, this occurs at $L = 8$. For comparison, we overlay the MSE loss for a TDNN that is trained with a low-frequency sine wave (green), which is the training signal used for the single-LUT foreground approach that we developed earlier in this chapter. In this case, the system identification accuracy does not depend on the memory length, which is what we would expect since the training signal in this case was extracted from a memoryless system. It is encouraging to see that, for $L \geq 8$, the MSE loss associated with the random codes stimulus is comparable to that of low-frequency sine waves. However, it would be helpful to confirm that this MSE loss can be maintained when the TDNN is trained online with environmental variations, i.e., where the TDNN weights are updated in real time.
Finally, leveraging the TDNN to derive a pre-distortion LUT (also in real time) would be an interesting research problem and help further develop this study.

4.5 Concluding Remarks

In this chapter, we explored a novel linearization scheme for current steering DACs using NNs. We showed that simple NNs are effective for system identification if low-frequency sine waves are used for training. The NN architecture was selected using DnC and the NN parameters were found using SGD. The inverse of the transfer characteristic was then mapped onto the input codes using a LUT. The final implementation was a simple pre-distortion LUT with no NNs required. Our approach demonstrated an improvement of at least 6dB over conventional DEM and polynomial-based DPD methods for frequencies up to 9GHz. Finally, we concluded the chapter by providing a direction for extending this approach to handle environmental variations (e.g., temperature drifts).

Chapter 5
Wideband Analysis of Timing Errors in CS-DACs

CS-DACs generate analog signals by combining weighted current sources. Ideally, the current sources are combined at each switching instant simultaneously. However, this is not true in practice due to timing mismatch, resulting in nonlinear distortion, as discussed in Section 2.3.4. This chapter uses the equivalent timing error model, introduced by previous work, to analyze the signal-to-distortion ratio (SDR) resulting from these timing errors. Using a behavioral simulation model, we demonstrate that our analysis is significantly more accurate than the previous methods. We also use our simulation model to investigate the effect of timing mismatch in partially-segmented CS-DACs, i.e., those comprised of both equally-weighted and binary-weighted current sources, as described in Section 2.4.2.
5.1 Motivation

As mentioned, CS-DACs are considered to be the de facto solution for transmitters in modern high-speed applications [13], including cellular communication, electronic warfare, and automotive radar. The CS-DAC generates an analog signal from a digital input sequence by combining current sources, as shown in Fig. 2.12. We refer to this as a fully-segmented CS-DAC, since each current source is equally weighted. In contrast, partially-segmented CS-DACs are hybrid architectures that are comprised of both equally-weighted and binary-weighted current sources, as shown in Fig. 2.13. Regardless of the architecture, the ideal output is a perfect zero-order-hold (or staircase-like) representation of the digital input sequence. However, this is not true in practice due to nonlinear distortion caused by various errors.

Recall from Section 2.3 that errors in CS-DACs are broadly classified as either static or dynamic [24]. Static errors, which are time-invariant and memoryless, are mainly caused by current source mismatch and are treatable by various calibration techniques [54], [55], [56]. Dynamic errors, on the other hand, are more difficult to calibrate because they only appear during switching instants and last for a small fraction of the sample period. Hence, this necessitates calibration circuitry with fine resolution in both amplitude and time [33], [47]. Timing-related mismatch, for example, causes the current cells to fire at different times [15]; nominally, they all fire simultaneously at each switching instant. Dynamic errors, such as timing-related mismatch, limit the high-frequency performance of the DAC [33], which makes their analysis critical.

Previous research on this topic is presented in [15] and [10], where it was proposed that timing errors for each current cell can be lumped into an equivalent timing error for each switching instant. This is a key contribution that simplifies the analysis considerably.
Under this framework, the DAC error is comprised of narrow pulses with amplitudes that are proportional to the difference between consecutive input codes. The signal-to-distortion ratio (SDR) is then derived as a function of the timing error spread, which is assumed to be the same for each current cell.

In this chapter, we utilize the equivalent timing error introduced in [15] and provide a more accurate analysis of the resulting model. The key difference in the analyses is that the approach in [15] implicitly assumed that the timing errors are present during the entire sample period. While this assumption leads to an accurate SDR for the Nyquist band, it is not as accurate for the wideband SDR, i.e., where all frequency components of the error are considered. In this chapter we make no such assumption, resulting in a significantly more accurate expression for the wideband SDR (as we confirm with behavioral simulations). The limitations of the equivalent timing error model are stated after characterizing the SDR over frequency. In addition, we use the behavioral model to explore the SDR for partially-segmented architectures.

The rest of the chapter is organized as follows. In Section 5.2, we provide background information on the equivalent timing error model and the previous analysis. In Section 5.3, we carry out the SDR analysis using the equivalent timing error model and compare it to that in [15]. In Section 5.4, we validate our analysis using a behavioral model, state its limitations, and simulate the SDR for partially-segmented architectures. Finally, we conclude the chapter in Section 5.5.

5.2 Background

In the absence of nonidealities, the output of the fully-segmented CS-DAC in Fig. 2.12 is

\[
y_{\text{ideal}}(t) = u(t) * \sum_{n=-\infty}^{\infty} I_u\, \Delta x_n\, \delta(t - nT_s) \tag{5.1}
\]

for a digital input sequence $x_n \in \{0, \ldots, 2^M - 1\}$, where $\Delta x_n = x_n - x_{n-1}$, $u(t)$ is the unit step function, and $*$ denotes convolution.
5.2.1 Equivalent Timing Error Model

In (5.1), it is assumed that the current cells fire simultaneously at each switching instant, i.e., the DAC output changes abruptly at time $nT_s$ when $\Delta x_n \neq 0$. In practice, the $m$th current cell fires at $nT_s + \tau_m$, where $\tau_m$ is the timing error for that cell, $m \in \{0, \ldots, C-1\}$. As in [15], we model $\tau_m$ as independently drawn from a mean-zero, normal distribution with variance $\sigma^2$ (i.e., $\tau_m \sim \mathcal{N}(0, \sigma^2)$). The authors in [15] begin the analysis by considering the net charge error introduced by the code transition $x_{n-1} \to x_n$. Next, they formulate an equivalent timing error for this transition, $T_\epsilon(n)$, which allows the DAC output to be written as

\[
y(t) = u(t) * \sum_{n=-\infty}^{\infty} I_u\, \Delta x_n\, \delta\big(t - nT_s - T_\epsilon(n)\big) \tag{5.2}
\]

where

\[
T_\epsilon(n) =
\begin{cases}
\dfrac{1}{|\Delta x_n|} \displaystyle\sum_{m=\min(x_n,\, x_{n-1})}^{\max(x_n,\, x_{n-1}) - 1} \tau_m, & \Delta x_n \neq 0 \\[2ex]
0, & \Delta x_n = 0
\end{cases} \tag{5.3}
\]

Note from (5.3) that timing errors only occur when the input code changes, i.e., $\Delta x_n \neq 0$. In this case, the equivalent timing error is the average of the timing errors for each current cell that switches for the $x_{n-1} \to x_n$ code transition (see footnote 1). The error based on the equivalent timing error model is $e(t) = y(t) - y_{\text{ideal}}(t)$, and an example is illustrated in Fig. 5.1. Note that $e(t)$ is comprised of narrow pulses with magnitudes $I_u |\Delta x_n|$ and durations $|T_\epsilon(n)|$. The magnitude of the net charge error introduced by the $x_{n-1} \to x_n$ transition is $Q_\epsilon(n) = I_u |\Delta x_n T_\epsilon(n)|$, i.e., the area under the error pulses.

Footnote 1: A derivation of the equivalent timing error, $T_\epsilon(n)$, is presented in [15].

Figure 5.1: Error $e(t)$ based on equivalent timing errors $T_\epsilon(n)$. The charge error $Q_\epsilon(n)$ is the area under the $n$th error pulse.

5.2.2 Previous SDR Analysis

The average expected error power is computed in [15] as

\[
\begin{aligned}
\big\langle E[P_{e,\text{previous}}(n)] \big\rangle
&= \left\langle E\!\left[\left(\frac{Q_\epsilon(n)}{T_s}\right)^{2}\right] \right\rangle && \text{(5.4a)} \\
&= \left\langle E\!\left[\left(\frac{T_\epsilon(n)}{T_s}\right)^{2} \Delta x_n^2\, I_u^2\right] \right\rangle && \text{(5.4b)} \\
&= \frac{\sigma^2 I_u^2}{T_s^2}\, \big\langle |\Delta x_n| \big\rangle && \text{(5.4c)}
\end{aligned}
\]

where $\langle \cdot \rangle$ denotes discrete-time averaging.
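The step from (5.4b) to (5.4c) relies on the equivalent timing error in (5.3) having variance $\sigma^2 / |\Delta x_n|$, since it is the average of $|\Delta x_n|$ independent per-cell errors. A small Monte Carlo sketch can check this numerically. The function name, the 1 ps spread, the 63-cell array, and the chosen code transition are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def equivalent_timing_error(x_prev, x_curr, tau):
    """Average the per-cell timing errors tau[m] of the cells that switch
    on the code transition x_prev -> x_curr, following (5.3)."""
    lo, hi = min(x_prev, x_curr), max(x_prev, x_curr)
    if hi == lo:
        return 0.0  # no code change, so no timing error
    return float(np.mean(tau[lo:hi]))  # cells lo, ..., hi - 1

rng = np.random.default_rng(0)
sigma = 1e-12            # assumed 1 ps timing error spread
n_cells = 63             # assumed 6-bit fully-segmented DAC
x_prev, x_curr = 10, 26  # code transition with |delta x| = 16

# Each trial draws a fresh set of per-cell timing errors (one mismatch realization).
draws = np.array([
    equivalent_timing_error(x_prev, x_curr, rng.normal(0.0, sigma, n_cells))
    for _ in range(20000)
])

# Ratio of the sample variance to sigma^2 / |delta x|; expected to be close to 1.
var_ratio = draws.var() * abs(x_curr - x_prev) / sigma**2
```

A ratio near one is consistent with $E[T_\epsilon^2(n)\, \Delta x_n^2] = \sigma^2 |\Delta x_n|$, which is the factor that appears in (5.4c).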
We use (5.4c) to define the wideband SDR as

$$\text{SDR} = \frac{P_{\text{sig}}}{\dfrac{\sigma^2 I_u^2}{T_s^2}\, \big\langle |\Delta x_n| \big\rangle} \tag{5.5}$$

since the denominator is the total error power. Note that in (5.4a) it is implicitly assumed that the charge error occurs over the entire sample period, $T_s$. However, the charge error actually occurs over a small fraction of $T_s$, as illustrated in Fig. 5.1. In Section 5.3, we carry out the analysis accordingly, resulting in a more accurate expression for the wideband SDR. Lastly, the authors in [15] specialize (5.5) for low-frequency sinusoidal inputs with the error power limited to the first Nyquist band, resulting in an SDR of

$$\text{SDR}_{\text{Nyquist}} = \frac{A_1}{8 f_1 f_s \sigma^2} \tag{5.6}$$

where $f_1 \ll f_s$ is the input frequency and $A_1 = \tfrac{1}{2}(2^M - 1)$ is the input amplitude.^2

^2 A more detailed derivation of (5.6) is presented in [10].

5.3 Wideband SDR Analysis

The error shown in Fig. 5.1 may be written as the sum of non-overlapping pulses $e_n(t)$, i.e., $e(t) = \sum_{n=-\infty}^{\infty} e_n(t)$ where

$$e_n(t) = -\operatorname{sgn}\!\big(T_\varepsilon(n)\big)\, I_u\, \Delta x_n\, \operatorname{rect}\!\left(\frac{t - \big(nT_s + T_\varepsilon(n)/2\big)}{|T_\varepsilon(n)|}\right) \tag{5.7}$$

and $\operatorname{rect}(t) = 1$ if $|t| \leq 1/2$ and $0$ if $|t| > 1/2$. Note that $e(t)$ is a random process, where the randomness comes from the equivalent timing errors, $T_\varepsilon(n)$. The expected error power is

$$\begin{aligned} E[P_e] &= E\!\left[\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} e^2(t)\, dt\right] \\ &= \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} E\!\left[\frac{1}{T_s} \int_{-(2N+1)T_s/2}^{(2N+1)T_s/2} e_n^2(t)\, dt\right] \\ &= \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} E[A_n] = \big\langle E[A_n] \big\rangle \end{aligned} \tag{5.8}$$

where $A_n$ is a random variable defined by

$$A_n = \frac{1}{T_s} \int_{-(2N+1)T_s/2}^{(2N+1)T_s/2} e_n^2(t)\, dt, \quad |n| \leq N \tag{5.9}$$

Taking the expected value of (5.9) and then averaging yields

$$E[P_e] = \left\langle E\!\left[\frac{|T_\varepsilon(n)|}{T_s}\, \Delta x_n^2\, I_u^2\right] \right\rangle \tag{5.10}$$

Note from (5.3) that $T_\varepsilon(n) \sim \mathcal{N}\!\big(0, \sigma^2/|\Delta x_n|\big)$ for $\Delta x_n \neq 0$. Therefore, $|T_\varepsilon(n)|$ has a folded normal distribution [57] with mean $E[|T_\varepsilon(n)|] = \sigma\, |\Delta x_n|^{-1/2} \sqrt{2/\pi}$, which we substitute into (5.10), yielding

$$E[P_e] = \frac{\sigma}{T_s} \sqrt{\frac{2}{\pi}}\; I_u^2\, \big\langle |\Delta x_n|^{3/2} \big\rangle \tag{5.11}$$

At this point, we can compare our analysis to that in [15].
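Before making that comparison, the folded-normal step behind (5.11) can be sanity-checked numerically. The sketch below is a small Monte Carlo experiment under the stated model (independent zero-mean Gaussian cell errors averaged over $|\Delta x_n|$ cells); the function names, trial count, and seed are arbitrary illustrative choices.

```python
import math
import random


def folded_mean_theory(sigma, delta_x):
    """E[|T_eps(n)|] = sigma * |delta_x|^(-1/2) * sqrt(2/pi), as used in (5.11)."""
    return sigma * math.sqrt(2.0 / (math.pi * abs(delta_x)))


def folded_mean_monte_carlo(sigma, delta_x, trials=20000, seed=0):
    """Empirical mean of |T_eps(n)|, averaging |delta_x| cell errors per trial."""
    rng = random.Random(seed)
    n = abs(delta_x)
    total = 0.0
    for _ in range(trials):
        t_eps = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        total += abs(t_eps)
    return total / trials
```

For example, comparing `folded_mean_monte_carlo(1.0, 4)` with `folded_mean_theory(1.0, 4)` should show agreement to within the Monte Carlo noise, which shrinks as `trials` grows.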
Specifically, we observe that (5.10) and (5.4b) differ by a factor of $|T_\varepsilon(n)|/T_s$ inside the expectation, i.e., the duty factor of the error pulses. In practice, $0 < |T_\varepsilon(n)|/T_s \ll 1$, which means that this difference is nontrivial. The difference arises because the analysis in [15] implicitly assumes that the charge error is distributed over the entire sample period. In our analysis, we do not make this assumption. Using our analysis, the SDR based on a sinusoidal input at full scale is

$$\text{SDR} = 10 \log_{10} \frac{P_{\text{sig}}}{E[P_e]} = 10 \log_{10}\!\left(\frac{(2^M - 1)^2}{8 f_s \sqrt{2/\pi}\; \big\langle |\Delta x_n|^{3/2} \big\rangle}\right) - 10 \log_{10} \sigma \tag{5.12}$$

where $P_{\text{sig}} = \big(\tfrac{I_u}{2}(2^M - 1)\big)^2 / 2$ and $E[P_e]$ comes from (5.11).

5.4 Simulations

In this section, we simulate CS-DACs using a behavioral model with an oversampling ratio (OSR) of 4096, i.e., one sample period, $T_s$, is represented by 4096 samples. The large OSR allows us to capture timing errors that are a very small fraction of the sample period, e.g., we are interested in $\sigma/T_s$ down to $10^{-3}$. Note that using an OSR of 4096 is memory intensive, so we ran the simulations on a modern workstation with 128GB of RAM.

In Fig. 5.2, we plot simulated and analytical results of the SDR versus $\sigma/T_s$ for $M$-bit DACs ($M = 3, 5, 7, 8$). Note that the analysis in this chapter accurately captures the wideband SDR, i.e., with no filter at the DAC output. In contrast, the analysis in [15] accurately captures the Nyquist band SDR, i.e., where there is a brick wall filter at the DAC output with a cutoff frequency of $f_s/2$. Hence, the Nyquist band SDR is substantially higher since it ignores the spectral content of the error beyond $f_s/2$.

Figure 5.2: SDR analysis vs. simulation for a single tone input, $f_0/f_s = 0.01$; the simulated data points are comprised of 50 independent runs of the behavioral model, where 95% confidence intervals are shown.

It is worth mentioning how the SDR is extracted from the simulations. First, the behavioral model is run with zero timing errors to generate the ideal output, $y_{\text{ideal}}(n)$.
Then, it is run with timing errors to generate the nonideal output, $y(n)$, resulting in an error of $e(n) = y(n) - y_{\text{ideal}}(n)$. For the wideband SDR, the power of the sequence $e(n)$ is used. For the Nyquist band SDR, the power spectral density of $e(n)$ is computed, and only the frequency components from DC to $f_s/2$ are included in the error power.

Figure 5.3: SDR analysis vs. simulation over frequency, $\sigma/T_s \approx 3 \times 10^{-3}$; the simulated data points are comprised of 50 independent runs of the behavioral model, where 95% confidence intervals are shown.

In Fig. 5.3, we plot the wideband SDR from simulation (in blue) and compare it with two different analyses over frequency. The red markers are from (5.12) (this chapter), and the purple markers are derived from the analysis in [15], i.e., using (5.5) as the SDR. Note that our analysis, in contrast to that in [15], is in closer agreement with the simulations. This is because for practical values of $\sigma/T_s$, i.e., $0 < \sigma/T_s \ll 1$, the assumption in (5.4a) that the charge error is distributed over the entire sample period becomes less accurate. Since we do not make this assumption, there is a considerable difference between the two analyses for the small $\sigma/T_s$ considered in Fig. 5.3.

Figure 5.4: Simulation of the wideband SDR versus thermometer bits for $M$-bit CS-DACs ($M = 8, 10, 12$); the simulated data points are comprised of 50 independent runs of the behavioral model, where 95% confidence intervals are shown.

Referring again to Fig. 5.3, note that our analysis diverges from the simulation as the number of bits, $M$, is increased (by up to 6.3dB for $M = 8$ and $f_0/f_s \approx 0.5$). This divergence is caused by the breakdown of the equivalent timing error model for larger values of $M$ at high frequencies. This is qualitatively shown in Fig. 5.5(a) and Fig. 5.5(b), where we illustrate the squared error for various code transitions for 3-bit and 6-bit DACs, respectively.
Note that the squared errors in the 3-bit cases closely resemble rectangular pulses, i.e., they are well-suited for approximation via equivalent timing errors. In contrast, the squared errors for the 6-bit cases change gradually in the vicinity of the switching instant.

Figure 5.5: Squared error (vertical axis) versus $t/T_s$ (horizontal axis) for various code transitions $x_{n-1} \to x_n$ in an $M$-bit CS-DAC, $f_0/f_s \approx 0.11$, $\sigma/T_s \approx 3 \times 10^{-3}$. For each case, the nominal switching instant $nT_s$ is aligned with $t = 0$. (a) $M = 3$ and (b) $M = 6$.

We also used the behavioral model to investigate partially-segmented DACs. We considered $M$-bit DACs with the first $T$ bits (MSBs) thermometer-decoded into $2^T - 1$ unit elements and binary weights for the remaining $M - T$ bits (LSBs). In Fig. 5.4, we plot simulations of the wideband SDR versus $T$ for $M$-bit partially-segmented DACs ($M = 8, 10, 12$). For $T < 6$, the SDR increases by approximately 3dB/bit. However, the SDR eventually saturates, i.e., the improvement gets smaller for each bit added when $T \geq 6$. For example, going from $T = 8$ to $T = 9$ yields only a 1dB increase in SDR. Lastly, it should be noted that if $M$ is sufficiently large, e.g., $M \geq 8$, then increasing it further does not improve the SDR, i.e., the impact of quantization on the timing errors becomes negligible.

5.5 Conclusion

In this chapter, we presented an analysis of the wideband SDR due to timing errors in fully-segmented CS-DACs, which was validated using a behavioral model and shown to be significantly more accurate than the previous analysis. In addition, we used the model to characterize the SDR for partially-segmented architectures. Thus, this chapter provides a method for accurately specifying error tolerance in the circuit design to achieve a given SDR. A useful extension would be to improve the model accuracy for high-resolution DACs at high-frequency operation.
This may be done by replacing the equivalent timing error model with one that more accurately captures the error pulse characteristics.

Chapter 6
Calibration of Analog Dot Products

6.1 Background

Dot products are employed in a wide variety of applications, including modern radar, communications, signal processing, and machine learning [58-62], e.g.,

$$y = \mathbf{x}^\top \mathbf{w} \tag{6.1}$$

$$= \sum_{i=0}^{M-1} x_i w_i \tag{6.2}$$

where $\mathbf{x}^\top = [x_0\ x_1\ \cdots\ x_{M-1}]$ and $\mathbf{w}^\top = [w_0\ w_1\ \cdots\ w_{M-1}]$. Moreover, dot products are regarded as the most frequent and compute-intensive function required for applications like machine learning [63, 64]. For example, consider a single hidden layer NN with $K$ input nodes, $H$ hidden nodes, and a single output node; this requires $H$ dot products of length $K$ for the hidden layer and one dot product of length $H$ for the output layer.

A single dot product is comprised of multiplier-accumulator (MAC) operations, as shown in (6.2). Recently, there has been substantial interest in carrying out MAC operations in the analog domain (e.g., for low-power hardware implementations of NNs) [63, 65, 66]. For example, Fig. 6.1 illustrates one possible implementation of an "analog dot product engine", as presented in [63]; for ease of illustration, we show how a simple dot product of length 2 is computed via analog MAC operations (i.e., $y = x_0 w_0 + x_1 w_1$), where the weights are stored as 3-bit words (i.e., $w_0 \leftrightarrow b_2^{(0)} b_1^{(0)} b_0^{(0)}$ and $w_1 \leftrightarrow b_2^{(1)} b_1^{(1)} b_0^{(1)}$).

Figure 6.1: Analog dot product engine from [63]. (Schematic: two rows of M1/M2 transistor pairs driven by $x_0$ and $x_1$, with binary-weighted columns $(W/L)_{M1,M2} = 4, 2, 1$ gated by the weight bits; the branch currents $4g_{ON} x_i b_2^{(i)}$, $2g_{ON} x_i b_1^{(i)}$, and $g_{ON} x_i b_0^{(i)}$ are injected into a summing node, giving $y = g_{ON}(x_0 w_0 + x_1 w_1)$.)
The $x_0 w_0$ term is computed from the top row of the array via currents that are injected into a summing node. For example, referring to the top-right of the array (the LSB for $w_0$), if transistor M1 is on ($b_0^{(0)} = 1$)^1 then the injected current is $g_{ON} x_0$, where $g_{ON}$ is the series ON-conductance of M1 and M2 when $(W/L)_{M1,M2} = 1$. If M1 is off, then no current is injected into the summing node. The color-coded columns represent different $W/L$ ratios, which are binary weighted. The bottom row of transistors computes the $x_1 w_1$ term using the same methodology, where its currents are also injected into the summing node. The final result is a term that is proportional to the desired dot product, i.e., $y = g_{ON}(x_0 w_0 + x_1 w_1)$. Although Fig. 6.1 illustrates a hardware implementation of a length 2 dot product, it is worth noting that this technique can be generalized by stacking more rows of transistors. In fact, the authors in [62] utilize a conceptually similar technique to implement a single layer NN in hardware with hundreds of input and hidden nodes.

^1 In [62], bits $b_i^{(j)}$ are stored in standard 8T-SRAM bit cells. A 6T-SRAM bit cell implementation can be found in [67].

6.1.1 Analog Multiplier Nonlinearity

While performing MAC operations in the analog domain as in Fig. 6.1 is desirable for low-power applications, it introduces some nonidealities that require calibration. One significant nonideality is the nonlinearity of the analog multipliers. This is because the basic operation of an analog multiplier relies on passing the multiplying factors through a nonlinear device^2, as shown in Fig. 6.2 [68]. The nonlinear device produces a term that is proportional to the product of the multiplying factors (i.e., the desired term, $Kxw$) along with some undesired terms (e.g., $x^2$, $w^2$, $x^3$, $x^2 w$, and so on). This is true for all practical multipliers, including the well-known Gilbert cell [69].
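As a hedged illustration of how such terms arise (a generic second-order device with hypothetical coefficients $a_1$ and $a_2$, not the specific multiplier studied here), suppose the device is driven by the sum $v = x + w$:

$$a_1 v + a_2 v^2 = \underbrace{2a_2\, xw}_{\text{desired},\ K = 2a_2} + \underbrace{a_1 x + a_1 w + a_2 x^2 + a_2 w^2}_{\text{undesired}}$$

Higher-order device terms would contribute further products such as $x^3$ and $x^2 w$.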
Generally, some form of nonlinear correction is applied to ensure that the multiplier output contains only the desired term, as shown in Fig. 6.2.

^2 Such nonlinear devices can be characterized by a high-order polynomial [68].

Figure 6.2: Basic operation of an analog multiplier.

Fig. 6.3 illustrates a simulated transfer characteristic (I-V response) of an analog multiplier in GF 12LP. Note that the 600-900mV range appears to be a suitable, linear operating range for this multiplier. Although it is not apparent in Fig. 6.3, there is in fact some nonlinearity within this operating range. This is clear in Fig. 6.4, which illustrates the error relative to the ideal transfer characteristic^3, where the normalized x-axis represents 600-900mV. Note that a 4th order polynomial provides a good fit to this error. Finally, it is worth mentioning that, for the time being, we model the error as a 1D function of $x$ (e.g., the $V_{GS}$ values in Fig. 6.3). In Section 6.2, we develop a calibration technique under this 1D framework. In Section 6.4, we model the error as a function of both $x$ and the weight, $w$, and discuss a direction for calibration under this 2D framework.

^3 The error is normalized to the nominal range of the multiplier output and reported as a percent.

Figure 6.3: Simulated I-V response ($I_{DS}$ versus $V_{GS}$, for different values of $w$) of an analog multiplier in GF 12LP, with the linear operating range marked.

Figure 6.4: Analog multiplier error in GF 12LP, extracted from a simulation and fit to a 4th order polynomial.

6.2 Analog Dot Product Calibration

In this section, we propose a calibration algorithm for analog dot products, where the targeted impairment is the nonlinearity of the multipliers (as discussed in Section 6.1.1). For simplicity, we model the multiplier nonlinearity as a 1D function of one multiplying factor; however, we conclude the chapter with a direction for extending the results to the 2D case (i.e., where the error is a function of both multiplying factors).
Fig. 6.5(a) illustrates a standard, "brute force" calibration approach for a length-$M$ analog dot product, which is to have a nonlinear correction circuit for each of the $M$ multipliers in the MAC array. However, when $M$ is large, this approach is not conducive to low cost, area, and power consumption due to the requirement of $M$ nonlinear correction circuits. Instead, we propose a new approach, as illustrated in Fig. 6.5(b), where we apply the following additive correction factor to the output of the MAC array

$$E_{\text{corr}}(x_0, \ldots, x_{M-1}) = \sum_{m=0}^{M-1} \hat{e}_{\text{ref}}(x_m) \tag{6.3}$$

where $\hat{e}_{\text{ref}}(x)$ is a ZOH approximation of the error of a reference multiplier (e.g., a staircase approximation of the error in Fig. 6.4). To obtain $\hat{e}_{\text{ref}}(x)$, we non-uniformly quantize the operating range of the reference multiplier and assign approximation values to each of the quantization bins. For example, this is illustrated in Fig. 6.6 for a 4-bit quantization, where the vertical lines are the bin thresholds and the approximation values are determined by evaluating a polynomial fit to the error, $e_{\text{ref}}(x)$, at the midpoint of each bin^4. The resulting approximation error, $e_{\text{ref}}(x) - \hat{e}_{\text{ref}}(x)$, is illustrated by the green curve.

^4 We explore another method of selecting the approximation values later on in this section.

Figure 6.5: Analog dot product calibration. (a) Traditional approach: each nonideal multiplier $f_i(x, w)$, with output $z = Kxw + e_i(x)$, is followed by its own nonlinear correction circuit $g_i(z)$, and the corrected terms $K x_i w_i$ are summed to form the desired dot product. (b) Proposed approach: the nonideal multiplier outputs $K x_i w_i + e_i(x_i)$ are summed, and a single additive correction factor $E_{\text{corr}}(x_0, \ldots, x_{M-1}) = \sum_i \hat{e}_{\text{ref}}(x_i)$ is applied to the sum.

Later we describe how to compute (6.3) in practice.
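The midpoint-based staircase and the correction factor in (6.3) can be sketched as follows. This is a minimal illustration on the normalized range $[0, 1]$; the function names, the toy error curve in the usage note, and the convention that an input equal to a threshold falls in the upper bin are assumptions made for the example.

```python
import bisect


def zoh_midpoint(e_ref, thresholds):
    """Build e_hat(x), a staircase (ZOH) approximation of e_ref on [0, 1].

    Each bin's value is e_ref evaluated at the bin midpoint, as in Fig. 6.6.
    """
    s = sorted(thresholds)
    edges = [0.0] + s + [1.0]
    values = [e_ref(0.5 * (lo + hi)) for lo, hi in zip(edges, edges[1:])]

    def e_hat(x):
        k = bisect.bisect_right(s, x)   # index of the bin containing x
        return values[k]

    return e_hat


def correction_factor(e_hat, xs):
    """Additive correction factor E_corr from (6.3)."""
    return sum(e_hat(x) for x in xs)
```

For instance, with the toy error `e_ref = lambda x: x * x` and a single threshold at 0.5, `zoh_midpoint` produces a two-level staircase, and `correction_factor` sums the staircase values at the multiplier inputs.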
In general, the calibration for the entire MAC array will improve as the magnitude of the approximation error $e_{\text{ref}}(x) - \hat{e}_{\text{ref}}(x)$ decreases. However, even if this approximation error is zero, this only implies that the nonlinearity associated with the reference multiplier will be zero after calibration. There will still be some residual error from the remaining $M - 1$ multipliers, but this should be low since all of the multipliers have qualitatively similar nonlinearity with only minor differences caused by process variation. We explore the effect of nonlinearity mismatch between the multipliers in Section 6.3. Also note that we use the max-absolute-value of the error as a multiplier linearity metric, e.g., referring to Fig. 6.6, prior to calibration this metric is 0.52%, and after calibration it is 0.12%; the calibration objective is to keep this metric as low as possible.

Figure 6.6: ZOH approximation of a reference multiplier error with NUQ; approximation values are selected by evaluating the polynomial fit at the midpoint of each bin.

6.2.1 Calibration Considerations

Two critical design choices for the described calibration approach are: 1) where to place the quantization thresholds and 2) how to select the approximation values once the thresholds are in place. For the example in Fig. 6.6, the thresholds were manually placed in order to ensure that wider bins map to regions where the error derivative is low and narrow bins map to regions where it is high. While this is a conceptually reasonable approach, manual placement becomes time consuming and infeasible when there are a large number of thresholds. Moreover, it is subject to human error and may result in a sub-optimal solution. For these reasons, an algorithmic approach to the threshold placement is desired. To this end, we propose to use simulated annealing, as outlined in Algorithm 1 from Chapter 3.

As mentioned, another key design choice is how the approximation values are selected once the thresholds are in place.
To better align with the calibration objective, we propose to select the approximation values as described in Appendix C.1, where we prove this is the optimal approach for a fixed set of thresholds (i.e., it minimizes the maximum absolute error). Note that this method of selecting the approximation values is different from that shown in Fig. 6.6. Specifically, the approach described in Appendix C.1 averages the maximum and minimum values of the error in each bin, whereas the method in Fig. 6.6 uses the error evaluated at the midpoint of each bin. Our proposed method of selecting the approximation values (i.e., from Appendix C.1) is demonstrated in Fig. 6.7; note that it outperforms the method in Fig. 6.6 by a factor of 2.

Figure 6.7: ZOH approximation of a reference multiplier error with NUQ; approximation values are selected using the methodology in Appendix C.1.

6.2.2 Calibration Algorithm

Following the framework described in Section 6.2.1 (i.e., using simulated annealing to find the thresholds and the method in Appendix C.1 to select the approximation values), we begin by defining a state vector that is comprised of the thresholds

$$\mathbf{s} = [s_0\ \cdots\ s_{N-1}]^\top \tag{6.4}$$

where $s_i \in (0, 1)$ is the $i$-th threshold and $N$ is the number of thresholds. Fig. 6.8 illustrates an example of how the thresholds in (6.4) can partition the unit interval, where the reference error $e_{\text{ref}}(x)$ is shown in red. We assume $s_i \neq s_j$ (for $i \neq j$), so such a partition creates $N + 1$ quantization bins (i.e., $b_0, \ldots, b_N$).

Figure 6.8: Partition of the unit interval via the state vector in (6.4), with $s_{-1} = 0$ and $s_N = 1$.

Again, our approach to placing the thresholds is to use simulated annealing (Algorithm 1). Specifically, we want to solve the following integer programming problem

$$\mathbf{s}^* = \arg\min_{\mathbf{s}}\, C(\mathbf{s}) \tag{6.5}$$

using simulated annealing for some suitable cost function, $C(\mathbf{s})$.
To this end, we propose to use the following cost function

$$C(\mathbf{s}) = 20 \log_{10}\!\left(\max_x \big| e_{\text{ref}}(x) - \hat{e}_{\text{ref}}(x; \mathbf{s}) \big|\right) \tag{6.6}$$

where

$$\hat{e}_{\text{ref}}(x; \mathbf{s}) = \sum_{k=0}^{N} c_k \operatorname{rect}\!\left(\frac{x - \dfrac{s_k + s_{k-1}}{2}}{s_k - s_{k-1}}\right) \tag{6.7}$$

is the reference error approximation (e.g., the blue curve in Fig. 6.7) and $c_k$ is the approximation value for the $k$-th bin; note that, by convention, we assign $s_{-1} = 0$ and $s_N = 1$, neither of which are part of the state vector and hence both are fixed during simulated annealing. To initialize simulated annealing, we start with a uniform quantization, i.e., $s_k = \frac{k+1}{N+1}$, $k = 0, \ldots, N-1$. The approximation values can then be computed as follows

$$c_k = \frac{\displaystyle\max_{x \in [s_{k-1},\, s_k]} e_{\text{ref}}(x) + \min_{x \in [s_{k-1},\, s_k]} e_{\text{ref}}(x)}{2}, \quad k = 0, \ldots, N \tag{6.8}$$

which is consistent with the methodology outlined in Appendix C.1, and an algorithm for computing (6.8) is outlined in Appendix C.2.

Note that simulated annealing (Algorithm 1) requires the selection of a neighbor state $n(\mathbf{s})$, i.e., a state that is "close" to $\mathbf{s}$ in the state space. We construct $n(\mathbf{s})$ in a two-step process by: 1) randomly selecting one of the thresholds and 2) placing this threshold at a random location in between its adjacent thresholds. Both of these steps are outlined in Algorithm 2. Fig. 6.9 depicts a scenario where $s_0$ is the randomly selected threshold, and its new location is denoted by $s_0'$.

Algorithm 2: Generate neighbor state.
  Input: State vector, $\mathbf{s}$
  Output: Neighbor state, $n(\mathbf{s})$
  Initialize: $n(\mathbf{s}) \leftarrow \mathbf{s}$
  1: $k \leftarrow$ uniform random integer in $[0, N-1]$
  2: $[n(\mathbf{s})]_k \leftarrow$ uniform random number in $(s_{k-1}, s_{k+1})$   /* Update $k$-th element of $n(\mathbf{s})$ */

Figure 6.9: Neighbor state selection.

After a neighbor state $n(\mathbf{s})$ is selected, we compute its cost using (6.6) so that we can compare it with the cost of the current state, $\mathbf{s}$, as shown in Algorithm 1. Note that computing the cost of $n(\mathbf{s})$ first requires computation of (6.8); however, it is not necessary to compute (6.8) for all values of $k$.
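The pieces above, i.e., the uniform initial state, the approximation values in (6.8), the cost in (6.6), and the neighbor move of Algorithm 2, can be sketched as follows. This is an illustrative sketch only: the maxima and minima are estimated on a dense grid (the `grid` parameter is an assumption, whereas the thesis refers to Appendix C.2 for the actual procedure), all function names are hypothetical, and the annealing loop of Algorithm 1 is not reproduced.

```python
import math
import random


def uniform_state(n_thresholds):
    """Initial state: s_k = (k + 1) / (N + 1), k = 0, ..., N - 1."""
    return [(k + 1) / (n_thresholds + 1) for k in range(n_thresholds)]


def approximation_values(e_ref, s, grid=2000):
    """Values c_k from (6.8) and the resulting max-abs error, via a dense grid."""
    edges = [0.0] + list(s) + [1.0]            # s_{-1} = 0 and s_N = 1
    values, worst = [], 0.0
    for lo, hi in zip(edges, edges[1:]):
        samples = [e_ref(lo + (hi - lo) * i / grid) for i in range(grid + 1)]
        e_max, e_min = max(samples), min(samples)
        values.append(0.5 * (e_max + e_min))          # midrange of the bin
        worst = max(worst, 0.5 * (e_max - e_min))     # max |e_ref - c_k| in bin
    return values, worst


def cost_db(e_ref, s):
    """Cost C(s) from (6.6) in dB (undefined if the error is exactly zero)."""
    _, worst = approximation_values(e_ref, s)
    return 20.0 * math.log10(worst)


def neighbor(s, rng):
    """Algorithm 2: move one randomly chosen threshold between its neighbors."""
    n = list(s)
    k = rng.randrange(len(n))                     # integer in [0, N - 1]
    lo = n[k - 1] if k > 0 else 0.0
    hi = n[k + 1] if k < len(n) - 1 else 1.0
    n[k] = rng.uniform(lo, hi)                    # endpoints have negligible probability
    return n
```

A neighbor can be drawn with `neighbor(uniform_state(8), random.Random(0))` and scored with `cost_db`. For simplicity, this sketch recomputes every approximation value on each cost evaluation.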
Specifically, if threshold $k'$ is selected for adjustment in Algorithm 2, then only the approximation values associated with bins $k'$ and $k'+1$ need to be recomputed. For example, Fig. 6.9 illustrates the case where $k' = 0$, so only approximation values $c_0$ and $c_1$ need to be recomputed. The only time we compute (6.8) for all values of $k$ is at the first iteration of the algorithm, i.e., to compute the cost of the initial state.

6.2.3 Counter-Based Correction Factor Computation

It is worth describing how the correction factor in (6.3) can be computed in practice. First, note that (6.3) depends on the sequence $x_0, \ldots, x_{M-1}$ which, for example, may be obtained from a non-uniform quantizer that samples an analog input signal, $x(t)$. Such a quantizer can be designed with thresholds found via the simulated annealing approach described in Section 6.2.2. Again, if $N$ thresholds are used for quantization, then this creates $N + 1$ bins, e.g., $b_0, \ldots, b_N$, as shown in Fig. 6.8. Moreover, we may assign a counter to each of these bins that counts the number of bin "hits"; specifically, the output of the counter for bin $b_k$ is $\sum_{m=0}^{M-1} \mathbb{1}_{(s_{k-1},\, s_k)}(x_m)$, where $\mathbb{1}_{(s_{k-1},\, s_k)}(x) = 1$ if $x \in (s_{k-1}, s_k)$ and zero otherwise. Correction factor (6.3) can then be computed as a weighted sum of the counter outputs, i.e.,

$$E_{\text{corr}}(x_0, \ldots, x_{M-1}) = \sum_{k=0}^{N} c_k \underbrace{\sum_{m=0}^{M-1} \mathbb{1}_{(s_{k-1},\, s_k)}(x_m)}_{b_k \text{ counter output}} \tag{6.9}$$

where $c_k$ are defined in (6.8).

6.2.4 Calibration Results

Fig. 6.10 illustrates the calibrated error (in green) after applying the calibration approach described in Section 6.2.2 for different values of $N$ (number of thresholds). Note that for each value of $N$, the error is reduced significantly relative to the uncalibrated error shown in red. Fig. 6.11 illustrates (6.6) (in dB) over 137k iterations of simulated annealing; the longest runtime ($N = 64$) was 220 seconds on a modern workstation. Note that the linearity metric in the plot titles of Fig.
6.10 corresponds to the cost values at the end of simulated annealing (e.g., for $N = 8$, 0.083% corresponds to $20 \log_{10}(0.083/100) \approx -62$dB). Also note that these trajectories have a high variance at the beginning of the algorithm and a low variance near the end. This is a typical characteristic of simulated annealing, i.e., it is highly exploratory at the beginning and approaches the global optimum in a greedy fashion near the end.

Figure 6.10: ZOH approximation of a reference multiplier error using the methodology outlined in Section 6.2.2 for different threshold counts ($N = 8, 16, 32, 64$). Panel titles: $N = 8$, max-abs-error = 0.083%; $N = 16$, max-abs-error = 0.042%; $N = 32$, max-abs-error = 0.025%; $N = 64$, max-abs-error = 0.015%.

Figure 6.11: Cost function (6.6) vs. number of simulated annealing iterations for different threshold counts ($N = 8, 16, 32, 64$).

Fig. 6.12 compares cost (6.6) (in dB) for uniform quantization and NUQ versus $N$, where the NUQ-based thresholds were found using simulated annealing. Note that these two curves approach each other as $N$ increases, which implies that there are diminishing returns when using NUQ over uniform quantization for larger values of $N$. For example, with $N = 256$ thresholds, there is essentially no benefit from using NUQ over uniform quantization since their corresponding cost values are roughly equal.

Figure 6.12: NUQ vs. uniform quantization cost comparison, with the no-calibration level shown for reference.

6.3 Calibration Simulation

In practical analog dot products, the multipliers will have slightly mismatched nonlinearity due to process variation. For example, Fig. 6.13 overlays the error curves for $M = 1024$ multipliers, where the error of the reference multiplier is shown in red^5. Even if we apply the calibration scheme in Fig. 6.5(b) with zero approximation error (i.e., $\hat{e}_{\text{ref}}(x) = e_{\text{ref}}(x)$), the resulting dot product will still deviate from its ideal value due to the bias of the correction factor toward the reference multiplier.
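To make this bias explicit, the calibrated output can be written out under the multiplier model of Fig. 6.5(b), $z = Kxw + e_i(x)$, with zero approximation error and assuming the correction factor is subtracted from the MAC array output; this is only a restatement of the definitions above:

$$\sum_{m=0}^{M-1} \big[ K x_m w_m + e_m(x_m) \big] - \sum_{m=0}^{M-1} e_{\text{ref}}(x_m) = K \sum_{m=0}^{M-1} x_m w_m + \sum_{m=0}^{M-1} \big[ e_m(x_m) - e_{\text{ref}}(x_m) \big]$$

The last sum is the residual mismatch term.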
Specifically, this is by construction of (6.3) since it utilizes error information from only the reference multiplier; if all $M$ multipliers shared identical error curves then such a bias would not exist. In practice, however, there will be some mismatch between them, as shown in Fig. 6.13. Although this bias causes dot product errors in practice, it is worth mentioning that one of the benefits of this biased approach is that only one error curve (the reference) needs to be measured in order to compute the correction factor.

^5 To model this, we added mean-zero Gaussian noise to the error of the reference multiplier.

Figure 6.13: Multiplier nonlinearity mismatch due to process variation ($M = 1024$ multipliers).

In this section, we simulate the calibration approach in Fig. 6.5(b), and consider nonlinearity mismatch between the multipliers. Specifically, to evaluate the calibration, we consider hundreds of realizations of the following ratio

$$R = \frac{\underbrace{\left[\sum_{m=0}^{M-1} X_m W_m + e_m(X_m)\right]}_{\text{nonideal dot product}} - \underbrace{E_{\text{corr}}(x_0, \ldots, x_{M-1})}_{\text{correction factor}}}{\underbrace{\sum_{m=0}^{M-1} X_m W_m}_{\text{ideal dot product}}} \tag{6.10}$$

where $X_m \sim \mathcal{U}(0, 1)$, $W_m \sim \mathcal{N}(0, 1)$, $E_{\text{corr}}(x_0, \ldots, x_{M-1})$ is defined in (6.3)^6, and $e_m(x)$ is the error curve for the $m$-th multiplier. Note that the numerator in (6.10) is the calibrated dot product and the denominator is the desired, ideal dot product; ideally, this ratio is unity. Fig. 6.14 illustrates histograms of 512 random realizations of (6.10) (in dB), with and without calibration ($[R]_{\text{dB}} = 10 \log_{10} R$). For the case without calibration, we set $E_{\text{corr}}(x_0, \ldots, x_{M-1}) = 0$. The results in Fig. 6.14 indicate that the calibration brings (6.10) close to unity (0 dB) with a low spread, and the opposite is true when no calibration is applied.

^6 Computing this factor requires an approximation of the error of a reference multiplier, $\hat{e}_{\text{ref}}(x)$. We used the simulated annealing approach described in Section 6.2.2 with $N = 34$ thresholds to obtain this approximation.

Figure 6.14: Calibrated vs. uncalibrated MAC array.

Figure 6.15: Simulation of calibrated vs. uncalibrated analog dot products.

Fig. 6.15 illustrates the calibrated and uncalibrated results for each of the 512 analog dot products, where

$$\mathcal{D}_{\text{calibrated}} = \frac{1}{D}\left[\sum_{m=0}^{M-1} X_m W_m + e_m(X_m)\right] - \frac{1}{D} E_{\text{corr}}(x_0, \ldots, x_{M-1})$$

$$\mathcal{D}_{\text{uncalibrated}} = \frac{1}{D}\left[\sum_{m=0}^{M-1} X_m W_m + e_m(X_m)\right] \tag{6.11}$$

$$\mathcal{D}_{\text{ideal}} = \frac{1}{D} \sum_{m=0}^{M-1} X_m W_m \tag{6.12}$$

and $D$ is a normalization factor, chosen for convenience so that $\mathcal{D}_{\text{ideal}}$ has unit variance. Note that in the calibrated case the dot products closely overlap with their ideal values. In contrast, such an overlap does not occur for the uncalibrated case, where there is instead a negative bias relative to the ideal values. Moreover, it can be shown that this bias is a result of the fact that the average of the majority of the error curves in Fig. 6.13 is negative.

6.4 Extension to 2D Calibration

Recall that the calibration algorithm proposed in Section 6.2 assumes that the multiplier nonlinearity is a 1D function of $x$. However, in some cases, this nonlinearity may depend on both $x$ and $w$. The purpose of this section is to provide some insight on how the 1D results in Section 6.2 can be extended to this 2D framework. We consider three different approaches for 2D calibration:

1. $(x, w)$ Calibration
2. Additively Separable Calibration
3. Hybrid Calibration

The first approach is the most accurate of the three and the most demanding from a complexity standpoint. The second approach is the least accurate; however, it is also the least complex. Finally, the third approach balances accuracy and complexity by combining the first two approaches.

6.4.1 $(x, w)$ Calibration

Fig. 6.16 depicts a MAC array where the multiplier nonlinearity is a function of both $x$ and $w$, along with a calibration approach similar to that in Fig.
6.5(b), i.e., a correction factor, constructed from the error of a reference multiplier, is applied to the output of the MAC array in order to cancel out the undesired term. At a high level, this correction factor can be generated as follows:

1. Extract the error of a reference multiplier using a circuit simulation, which will result in data points that comprise a surface over the $(x, w)$-plane.
2. Fit a 2D function $e_{\text{ref}}(x, w)$ to the circuit simulation data via regression techniques (polynomials, NNs, etc.), which is analogous to the polynomial fit for the 1D case in Fig. 6.4.
3. Derive a coarse approximation of the 2D surface, $\hat{e}_{\text{ref}}(x, w)$, by quantizing the $(x, w)$-plane via simulated annealing and assigning approximation values to each quantization "patch". This is analogous to the ZOH approximations we derived in Section 6.2 for the 1D case.
4. Compute the correction factor as $E_{\text{corr}} = \sum_{i=0}^{M-1} \hat{e}_{\text{ref}}(x_i, w_i)$.

We refer to this approach as $(x, w)$ calibration to distinguish it from the other approaches discussed in this section.

Figure 6.16: $(x, w)$-calibration applied to a MAC array. (Each nonideal multiplier $f_i(x, w)$ produces $z = Kxw + e_i(x, w)$; the additive correction factor $E_{\text{corr}}(x_0, \ldots, x_{M-1}, w_0, \ldots, w_{M-1}) = \sum_i \hat{e}_{\text{ref}}(x_i, w_i)$ is applied to the summed output to recover the desired dot product.)

6.4.2 Additively Separable Calibration

While the $(x, w)$-calibration approach in Fig. 6.16 seems like a viable approach, drastic simplifications can be realized when the multiplier nonlinearity, $e_i(x, w)$, is an additively separable function [70], i.e., $e_i(x, w) = f_i(x) + g_i(w)$. From a calibration perspective, this allows a 2D problem to be split into two 1D problems. For example, Fig.
6.17 depicts a calibration approach for the additively separable case; note that there are now two correction factors, i.e., $F_{\text{corr}}(x_0, \ldots, x_{M-1})$ and $G_{\text{corr}}(w_0, \ldots, w_{M-1})$, to cancel out the $f_i(x)$ and $g_i(w)$ error components, respectively. The idea is to once again compute the correction factors by utilizing only the error of a reference multiplier, $e_{\text{ref}}(x, w) = f_{\text{ref}}(x) + g_{\text{ref}}(w)$. Specifically, the correction factors can be computed by applying the 1D simulated annealing approach in Section 6.2 to $f_{\text{ref}}(x)$ and $g_{\text{ref}}(w)$ in order to derive $\hat{f}_{\text{ref}}(x)$ and $\hat{g}_{\text{ref}}(w)$, respectively. Accordingly, we have

$$F_{\text{corr}}(x_0, \ldots, x_{M-1}) = \sum_{i=0}^{M-1} \hat{f}_{\text{ref}}(x_i) \tag{6.13a}$$

$$G_{\text{corr}}(w_0, \ldots, w_{M-1}) = \sum_{i=0}^{M-1} \hat{g}_{\text{ref}}(w_i) \tag{6.13b}$$

There are two significant benefits of additively separable calibration. First, since the weights are fixed in typical applications, correction factor $G_{\text{corr}}$ only needs to be computed once; this is in contrast to $F_{\text{corr}}$, which needs to be updated in real time as the $x$ values change. Second, it reduces the number of counters required to compute the correction if we apply the counter-based approach, as described in Section 6.2.3. For example, suppose we apply $(x, w)$ calibration (as in Fig. 6.16) by quantizing the $x$ and $w$ dimensions into 16 and 4 bins, respectively; this results in a total of 64 counters (i.e., 16 for the $x$-axis $\times$ 4 for the $w$-axis). In contrast, the additively separable approach only requires 20 counters under the same conditions (i.e., 16 for the $x$-axis + 4 for the $w$-axis).

Additively Separable Approximations

In cases where $e_{\text{ref}}(x, w)$ is not additively separable, it may be tempting to default to the $(x, w)$ calibration approach in Fig. 6.16. However, here we explore additively separable approximations (ASAs) to functions that are not additively separable, which allows the approach in Fig. 6.17 (i.e., additively separable calibration) to be applied with some approximation error. Our approach to deriving an ASA of $e_{\text{ref}}(x, w)$ is to concurrently train two
Our approach to deriving an ASA of $e_{\text{ref}}(x, w)$ is to concurrently train two NNs with a concatenation and combination layer, as shown in Fig. 6.18. Note that $x$ and $w$ follow independent paths and are passed through their respective NNs (i.e., NN-x and NN-w). If this NN system is trained with the following dataset

\[ D_{\text{train}} = \Big\{ \underbrace{(x_k, w_k)}_{\text{NN input}},\ \underbrace{e_{\text{ref}}(x_k, w_k)}_{\text{NN target}} \Big\}_{k=1}^{K} \tag{6.14} \]

then its output is the desired ASA, i.e., $e_{\text{ref,ASA}}(x, w) = f_{\text{ref,ASA}}(x) + g_{\text{ref,ASA}}(w)$. Once the NN system is trained, the components $f_{\text{ref,ASA}}(x)$ and $g_{\text{ref,ASA}}(w)$ can be independently extracted via NN-x and NN-w, respectively. For example, to extract $f_{\text{ref,ASA}}(x)$, we can take the dot product of the NN-x output with the first half of the combination layer, and $g_{\text{ref,ASA}}(w)$ can be extracted in a similar fashion (i.e., with the NN-w output and the second half of the combination layer). Once these components are obtained, the 1D simulated annealing approach in Section 6.2 can be applied to obtain the ZOH approximations $\hat{f}_{\text{ref,ASA}}(x)$ and $\hat{g}_{\text{ref,ASA}}(w)$, which allows the correction factors to be computed as in Fig. 6.17 by replacing $\hat{f}_{\text{ref}}(x)$ and $\hat{g}_{\text{ref}}(w)$ with $\hat{f}_{\text{ref,ASA}}(x)$ and $\hat{g}_{\text{ref,ASA}}(w)$, respectively.

Figure 6.17: Additively separable calibration approach applied to a MAC array.

Figure 6.18: NN system for deriving additively separable approximations.

Fig. 6.19 illustrates NN-based ASAs of various target functions $e_{\text{ref}}(x, w)$, which were heuristically designed for demonstration purposes. In Fig.
6.19(a), the target was designed to be an additively separable function, and hence the approximation error is low. Specifically, note that the target and $e_{\text{ref,ASA}}(x, w)$ are qualitatively similar, and the approximation error, which we refer to as the residue, $r_{\text{ref}}(x, w)$, is close to zero. Moreover, these observations are consistent with the NN learning curves (MSE loss), which drop nearly 40 dB over 600 epochs. In Fig. 6.19(b), we illustrate the case where the target $e_{\text{ref}}(x, w)$ was designed to be only weakly dependent on $w$. Again, from a qualitative standpoint, the ASA appears to be a good fit to the target. However, the residue is higher in this case compared to Fig. 6.19(a) since the target is not additively separable. Lastly, in Fig. 6.19(c), the ASA is clearly a poor fit to the target, which shows that it is not always possible to find an adequate ASA. This qualitative observation is again reflected in the residue and the learning curves (which drop by only 1-2 dB).

Figure 6.19: ASA approximations of various targets $e_{\text{ref}}(x, w)$: (a) additively separable target, (b) target with weak dependence on $w$, (c) multiplicatively separable target.

6.4.3 Hybrid Calibration

As mentioned, we also consider an approach that combines $(x, w)$ calibration with additively separable calibration. To begin the exposition, suppose we apply an ASA to the $i$th multiplier error, $e_i(x, w)$. This error can then be written as the ASA in summation with the residue (approximation error), i.e., $e_i(x, w) = e_{i,\text{ASA}}(x, w) + r_i(x, w)$, as illustrated in the upper-left portion of Fig. 6.20. The idea of hybrid calibration is to take additively separable calibration one step further by cancelling out the residue via $(x, w)$ calibration, which results in an additional correction factor, $R_{\text{corr}}$, as shown in Fig. 6.20. Note that we again utilize only a reference multiplier for this calibration scheme. Finally, it is worth mentioning that, in cases where $e_i(x, w)$ is already additively separable, we can simply apply the approach in Fig. 6.17. In other words, the hybrid approach may be used in cases where it is desirable to reduce the residue after applying an ASA to an error target that is not additively separable.

Figure 6.20: Hybrid calibration approach applied to a MAC array. The additively separable correction factors $F_{\text{corr}}$ and $G_{\text{corr}}$ are combined with a residue correction factor $R_{\text{corr}}$ obtained via $(x, w)$ calibration.

6.5 Concluding Remarks & Future Work

In this chapter, we developed calibration algorithms for analog dot products (MAC arrays), motivated by the fact that they are starting to gain significant traction, especially for low-power hardware implementations of NNs. We showed that the multipliers in MAC arrays suffer from nonlinearity, which causes the dot products to deviate from their ideal values. We began by developing calibration for this nonlinearity in a 1D framework, i.e., where the multiplier error is a function of only one variable.
Specifically, we showed that calibration is possible by applying a single correction factor to the output of the MAC array, which presents a significant reduction in complexity compared to traditional approaches that require nonlinear correction circuits for each multiplier in the array. We also showed how the 1D results can be extended to the 2D case, i.e., where the multiplier nonlinearity depends on both multiplying factors (the input, $x$, and the weight, $w$). To this end, we explored three different calibration approaches: 1) $(x, w)$ calibration, 2) additively separable calibration, and 3) hybrid calibration. The first approach is the most general of the three since it does not require any constraints on the characteristics of the target nonlinearity. In cases where the target nonlinearity is an additively separable function, we showed that the second approach can be used to realize a significant reduction in complexity. Moreover, we showed that this approach can still be applied even in cases where the target nonlinearity is not additively separable by applying an ASA. However, this incurs an approximation penalty, which we referred to as the residue. The third approach, hybrid calibration, takes an additional step with respect to the second approach by cancelling out this residue via $(x, w)$ calibration.

The work in this chapter can be further developed by leveraging more input from circuit designers. For example, referring to Fig. 6.15, we observed a bias in the dot products which was attributed to the nonzero integral of the multiplier error curves in Fig. 6.13; it would be interesting to see if there are any circuit design parameters that can reduce this integral to remove this bias (perhaps in a trade-off with, for example, the maximum absolute value of the error). In addition, recall that the proposed correction factor approach suffers from nonlinearity mismatch between the multipliers, which is caused by process variation.
Hence, it would be useful to explore multiplier circuit designs that result in better immunity to process variation. Further improvements can be made on the 2D calibration side as well. Specifically, while the 1D calibration scheme was demonstrated on nonlinearity from realistic circuit simulations, the 2D schemes utilized only heuristically designed nonlinearities. It would therefore be useful to perform circuit simulations that extract the 2D nonlinearity, and study its characteristics to see which of the three proposed calibration schemes makes the most sense in a realistic application.

Chapter 7

Conclusion

This chapter concludes the dissertation by summarizing the key results. In addition, we provide recommendations for future work.

7.1 Summary

The famous Moore's Law predicts that the number of transistors on an integrated circuit doubles every two years [71]. Although recent CMOS technologies at 14 nm and 10 nm have taken longer than two years to develop, the better-than-expected transistor integration has kept us on track with Moore's prediction [72]. One of the major benefits of this unprecedented level of integration is the ability to include on-chip digital circuitry. This allows ICs to include a wide array of useful features such as microcontrollers, LUTs, FFT engines, numerically controlled oscillators (NCOs), digital up-converters (DUCs), digital down-converters (DDCs), etc., and this is certainly not an exhaustive list. This rich set of tools permits the design of calibration algorithms that drastically improve IC performance.

The main focus of this dissertation was on DACs in highly-integrated RF transceiver ICs. In Chapter 3, we developed analysis and calibration of the most significant interleaving and data timing errors that are encountered in modern, wideband CS-DACs with a times-2 interleaved architecture.
Key insights were derived by analyzing these errors in isolation from each other, e.g., the narrowband locking effect and the fact that data timing errors can create spurs when finite settling is considered. The analysis ultimately led to the proposal of a two-tone calibration algorithm which, unlike previous approaches, does not suffer from narrowband locking and hence results in good wideband performance. Moreover, we developed a fast, accurate analytical model that was leveraged to run extensive simulations of the proposed algorithm that would have been impossible to conduct via behavioral simulations. While the model was utilized for a calibration-based study, it could also be used to explore the circuit design space by mapping design tolerances to spectral impairments and making design trade-offs across circuit components. We concluded Chapter 3 by demonstrating the proposed calibration algorithm on a modern, highly-integrated RF transceiver in 14 nm CMOS, where the on-chip ADC was leveraged to support the calibration of the DAC.

In Chapter 4, we developed, simulated, and experimentally demonstrated a NN-based calibration scheme for static nonlinearity in CS-DACs, where we again utilized the on-chip ADC as calibration support. First, we learned the DAC transfer characteristic by training a NN with input-output pairs captured from a DAC-to-ADC measurement path. The inverse of the transfer characteristic was then mapped onto the input codes using a LUT that was used to linearize the DAC via DPD. The experimental results on a modern DAC suggested that this scheme can outperform two conventional approaches (i.e., polynomial-based DPD and DEM), especially for the cancellation of high-order IMD products (IM5, IM7, etc.). We concluded Chapter 4 by providing recommendations for extending the NN-based approach to operate in background mode, i.e., to account for environmental variations such as temperature.
In Chapter 5, we studied the effect of timing errors (an important dynamic effect) on the wideband performance of CS-DACs. Specifically, we leveraged the equivalent timing error model from the previous analysis [10] to study the impact of timing errors on the wideband SDR, i.e., where distortion terms beyond the first Nyquist zone are included in the analysis. To the best of our knowledge, such a study has not been carried out previously. Moreover, the simulation results in Chapter 5 showed that our analysis is significantly more accurate than the previous analysis under this wideband framework.

In Chapter 6, we showed how some of the calibration methodologies in this dissertation (i.e., simulated annealing) can be applied to other circuits as well, where we focused on analog dot products (or MAC arrays). Specifically, the focus was on the calibration of multiplier nonlinearity within MAC arrays. To this end, we proposed a calibration algorithm that presents a significant reduction in complexity relative to the standard, "brute-force" approach of requiring a nonlinear correction circuit for each multiplier in the array. Preliminary simulations of the algorithm were conducted on a MAC array whose multiplier nonlinearity was taken from circuit simulations in GF 12LP, and the results were encouraging. Specifically, we showed that it is possible to apply a single additive correction factor to the output of a MAC array, which brings the dot product significantly closer to its ideal value, as we demonstrated on hundreds of randomized dot products. While we developed the algorithm in a 1D framework (i.e., where the multiplier nonlinearity is a function of only one multiplying factor), we provided some guidance on how to extend it to the 2D case (i.e., where both multiplying factors are considered). We concluded Chapter 6 by proposing three different 2D calibration approaches, each tailored to the characteristics of the multiplier nonlinearity.
7.2 Recommendations for Future Work

In this section, we provide recommendations for future work that would improve the results in this dissertation. Referring to Chapter 3, we highly recommend quantifying the sensitivity of the DAC parameters to environmental fluctuations, e.g., temperature, as this would provide guidance for designing background calibration algorithms. Specifically, it would be interesting to know how the spectral impairments change as a function of temperature for each DAC parameter; perhaps some of the parameters require more accurate background tracking than others. Developing background calibration algorithms for the most sensitive parameters would be another useful endeavour. It would be interesting to see if such algorithms could track variations in supply voltage as well. We also recommend further developing the background calibration study presented in Chapter 4, i.e., as outlined in Section 4.4.

Referring to Chapter 5, it would be interesting to see if the wideband timing error analysis can be applied to DACs that operate in higher Nyquist zones, e.g., as in [73–75]. In addition, carrying out a spectral analysis of the timing errors for higher Nyquist zones may also be of interest, e.g., mapping timing errors to harmonic levels in the $n$th Nyquist zone, as this would assist in frequency planning for system design. Note that such an analysis would be more general than considering only the SDR due to timing errors.

Chapter 6 presented a preliminary study of calibration algorithms for multiplier nonlinearity in analog dot products, which are starting to receive a lot of attention for low-power hardware implementations of NNs. This work may be further developed by incorporating input from circuit design experts, as described in Section 6.5. In addition, the proposed techniques would be further validated by hardware demonstrations on real MAC arrays.
As a final note, it is difficult to overstate the level of innovation and complexity of the highly-integrated RF transceiver in Fig. 1.2. At a high level, it is comprised of three main components: the DAC, PLL, and ADC. Each of these components alone poses enough open research problems to fill several interesting Ph.D. dissertations. Hence, an analysis and calibration study of the DAC is only the tip of the iceberg for these highly-complex systems. With that said, it would be interesting to apply some of the calibration techniques in this dissertation to, for example, the ADC, as it is well known that ADCs also suffer from nonlinearity and interleaving effects. For example, it may be possible to leverage the on-chip DAC to provide a calibration stimulus for the ADC and minimize the spectral impairments via simulated annealing. Developing calibration techniques along this line and comparing them with those in current literature would be another fruitful research endeavour.

Appendix A

Two-Tone Calibration Analysis

In this appendix, we motivate the use of two tones in calibration mode where only gain and C2 clock duty cycle errors are considered ($\epsilon_g$ and $\alpha$). Suppose the TIDAC is excited by a two-tone calibration signal of the form $x(t) = \frac{1}{2}\cos(2\pi f_{\text{cal},0} t) + \frac{1}{2}\cos(2\pi f_{\text{cal},1} t)$, $f_{\text{cal},0} \neq f_{\text{cal},1}$. This creates one interleaving spur at $f_s/2 - f_{\text{cal},0}$ and another at $f_s/2 - f_{\text{cal},1}$. Both of these spurs are zero if and only if $(\alpha, \epsilon_g) = (0, 0)$, and we prove this as follows. First, recall that the interleaving spur at $f_s/2 - f_{\text{cal},0}$ vanishes when $\epsilon_g = \epsilon_g(\alpha, \nu_{\text{cal},0})$, i.e., for gain errors as in (3.8). Similarly, the interleaving spur at $f_s/2 - f_{\text{cal},1}$ vanishes when $\epsilon_g = \epsilon_g(\alpha, \nu_{\text{cal},1})$. Hence, both interleaving spurs vanish simultaneously when $\epsilon_g(\alpha, \nu_{\text{cal},0}) = \epsilon_g(\alpha, \nu_{\text{cal},1})$, i.e.,

\[ \frac{\sin\big(\pi (1/2 - \nu_{\text{cal},0})(1 + 2\alpha)\big)}{\sin\big(\pi (1/2 - \nu_{\text{cal},0})(1 - 2\alpha)\big)} = \frac{\sin\big(\pi (1/2 - \nu_{\text{cal},1})(1 + 2\alpha)\big)}{\sin\big(\pi (1/2 - \nu_{\text{cal},1})(1 - 2\alpha)\big)} \tag{A.1} \]

Note that $\alpha = 0$ is a solution to (A.1).
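A quick numerical sanity check of (A.1) can be sketched as follows. The notation is an assumption made for this sketch: `alpha` stands for the C2 duty cycle error, `nu0` and `nu1` for the two calibration tone frequencies normalized to the sample rate, and the specific tone frequencies and test values are hypothetical. The code evaluates the difference between the two sides of (A.1), which is zero at `alpha = 0` and nonzero at the few nonzero duty cycle errors tried here.

```python
import numpy as np

def gain_ratio(alpha, nu):
    """One side of (A.1): sin(pi*w*(1 + 2*alpha)) / sin(pi*w*(1 - 2*alpha)), w = 1/2 - nu."""
    w = 0.5 - nu
    return np.sin(np.pi * w * (1 + 2 * alpha)) / np.sin(np.pi * w * (1 - 2 * alpha))

def a1_mismatch(alpha, nu0, nu1):
    """Left-hand side minus right-hand side of (A.1)."""
    return gain_ratio(alpha, nu0) - gain_ratio(alpha, nu1)

# Hypothetical normalized calibration tone frequencies (distinct, below 1/2).
nu0, nu1 = 0.10, 0.30

# With no duty cycle error, both sides of (A.1) equal one.
at_zero = a1_mismatch(0.0, nu0, nu1)

# A few nonzero duty cycle errors on either side of zero.
alphas = np.array([-0.2, -0.05, 0.05, 0.2])
off_zero = np.array([a1_mismatch(a, nu0, nu1) for a in alphas])
```

This is only a spot check at a handful of points, not a substitute for the analytical argument.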
Moreover, it is in fact the unique solution in the interval of interest $\alpha \in [-0.5, 0.5]$. To prove this, we show that (A.1) cannot hold for $\alpha \in A_- \cup A_+$, where $A_- = [-0.5, 0)$ and $A_+ = (0, 0.5]$. To this end, we show that the following function

\[ f(w) = \frac{\sin\big(\pi w (1 + 2\alpha)\big)}{\sin\big(\pi w (1 - 2\alpha)\big)} \tag{A.2} \]

is one-to-one over the domain $w \in I = [0, 0.5]$ for $\alpha \neq 0$. A property of such a function is that $f(w_0) = f(w_1)$ implies $w_0 = w_1$. Since (A.1) requires $f(w_0) = f(w_1)$ with $w_0 = 1/2 - \nu_{\text{cal},0}$ and $w_1 = 1/2 - \nu_{\text{cal},1}$, it cannot hold because $\nu_{\text{cal},0} \neq \nu_{\text{cal},1}$ is assumed. It can be shown that the sign of $df/dw$ is determined by the following quantity

\[ G(w) = 2\alpha \sin(2\pi w) - \sin(4\pi \alpha w) \tag{A.3} \]

Note that (A.3) is a continuous function over $I$ for any $\alpha \in A_- \cup A_+$. It is straightforward to verify that there are no critical points in the open interval $I_0 = (0, 0.5)$ by setting $dG/dw$ equal to zero and solving for $w$. By the extreme value theorem, this means that the absolute extrema must lie at the endpoints of $I$, i.e., $w = 0$ and $w = 0.5$. Accordingly, we have $G(0) = 0$ and $G(0.5) = -\sin(2\pi\alpha)$. Therefore, if $\alpha \in A_+$ then $G(w) \leq 0 \ \forall w \in I$. Since the sign of $df/dw$ is determined by $G(w)$, this also implies $df/dw \leq 0 \ \forall w \in I$. Thus, we conclude that (A.2) is one-to-one over $I$ for any $\alpha \in A_+$. Analogously, we also conclude that $f(w)$ is one-to-one over $I$ for any $\alpha \in A_-$. Hence, (A.2) is one-to-one over $I$ for any $\alpha \in A_- \cup A_+$. Consequently, $\alpha = 0$ is the unique solution to (A.1) in the interval of interest $\alpha \in [-0.5, 0.5]$. Furthermore, substituting $\alpha = 0$ into (3.8) yields $\epsilon_g = 0$. Thus, calibration based on two distinct tones promotes convergence to the wideband solution $(\alpha, \epsilon_g) = (0, 0)$, i.e., it avoids narrowband locking at just a single frequency.

Appendix B

Analysis of Data Timing Errors

In this appendix, we derive the output spectrum coefficients in (3.3) where only data timing errors are considered. Specifically, referring to Table 3.1, we only consider $\alpha_I, \alpha_Q$ (C4 clock duty cycle errors) and $\phi_I, \phi_Q$ (C4 clock phase errors). Fig. 3.9 illustrates how these errors affect the data timing.
First, recall that the analysis with ideal data timing resulted in (3.10) for the even sub-DAC. If we include C4 clock errors in the analysis (i.e., $\alpha_I$ and $\phi_I$), then this becomes

\[ e_{\text{even}}(t) = \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - 4kT_s) \right) * p_0(t) + \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+2)T_s) \right) * p_2(t) \tag{B.1} \]

where

\[ p_0(t) = e^{-\frac{1}{\tau}(t + T_s + 2\alpha_I T_s - 2\phi_I T_s)}\, h(t) \tag{B.2a} \]

\[ p_2(t) = e^{-\frac{1}{\tau}(t + T_s - 2\alpha_I T_s - 2\phi_I T_s)}\, h(t) \tag{B.2b} \]

$x(t)$ is defined in (3.11), and $h(t) = \text{rect}(t/T_s)$. We observe that (B.1) has two separate terms, unlike in (3.10). This is a consequence of duty cycle error $\alpha_I$, since samples $x[4k]$ fire early by $2\alpha_I T_s$ and samples $x[4k+2]$ fire late by the same quantity, as shown in Fig. 3.9(a). For convenience, we rewrite (B.1) as

\[ e_{\text{even}}(t) = c_0 \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - 4kT_s) \right) * p(t) + c_2 \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+2)T_s) \right) * p(t) \tag{B.3} \]

where

\[ c_0 = e^{-\frac{1}{\tau}(2\alpha_I T_s - 2\phi_I T_s)} \tag{B.4a} \]

\[ c_2 = e^{-\frac{1}{\tau}(-2\alpha_I T_s - 2\phi_I T_s)} \tag{B.4b} \]

and $p(t)$ is defined in (3.12), which is the settling pulse with ideal data timing. Note that $\alpha_I$ and $\phi_I$ are now embedded in the constants in (B.4). Analogously, for the odd sub-DAC we have

\[ e_{\text{odd}}(t) = c_1 \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+1)T_s) \right) * p(t) + c_3 \left( x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+3)T_s) \right) * p(t) \tag{B.5} \]

where

\[ c_1 = e^{-\frac{1}{\tau}(2\alpha_Q T_s - 2\phi_Q T_s)} \tag{B.6a} \]

\[ c_3 = e^{-\frac{1}{\tau}(-2\alpha_Q T_s - 2\phi_Q T_s)} \tag{B.6b} \]

Proceeding with the Fourier transform of (B.3) yields

\[ E_{\text{even}}(f) = \frac{f_s}{4} P(f) \sum_{k=-\infty}^{\infty} X(f - k f_s/4) \left( c_0 + (-1)^k c_2 \right) \tag{B.7} \]

where $X(f)$ and $P(f)$ are the Fourier transforms of (3.11) and (3.12), respectively. Further simplifying by separating the even and odd terms yields

\[ E_{\text{even}}(f) = \frac{f_s}{4} c_+(\alpha_I, \phi_I) P(f) \sum_{k=-\infty}^{\infty} X(f - k f_s/2) - \frac{f_s}{4} c_-(\alpha_I, \phi_I) P(f) \sum_{k=-\infty}^{\infty} X(f - (2k+1) f_s/4) \tag{B.8} \]

where $c_+(u, v)$ and $c_-(u, v)$ are defined in (3.14). Furthermore, note that the Fourier transform of (3.11) is $X(f) = M(f)P(f)$, where $M(f)$ and $P(f)$ are defined in (3.15) and (3.16), respectively.
Substituting this into (B.8) yields

\[ E_{\text{even}}(f) = \frac{f_s}{4} c_+(\alpha_I, \phi_I) P(f) M(f) \sum_{k=-\infty}^{\infty} X(f - k f_s/2) - \frac{f_s}{4} c_-(\alpha_I, \phi_I) P(f) M(f + f_s/4) \sum_{k=-\infty}^{\infty} X(f - (2k+1) f_s/4) \tag{B.9} \]

where we have used the fact that $M(f - k f_s/2) = M(f)$ and $M(f - (2k+1) f_s/4) = M(f + f_s/4)$. Similarly, the settling error spectrum for the odd sub-DAC may be derived by taking the Fourier transform of (B.5), which is

\[ E_{\text{odd}}(f) = \frac{f_s}{4} c_+(\alpha_Q, \phi_Q) P(f) M(f) \sum_{k=-\infty}^{\infty} X(f - k f_s/2) (-1)^k + j \frac{f_s}{4} c_-(\alpha_Q, \phi_Q) P(f) M(f + f_s/4) \sum_{k=-\infty}^{\infty} X(f - (2k+1) f_s/4) (-1)^k \tag{B.10} \]

where $j = \sqrt{-1}$ is the imaginary unit. Finally, the output spectrum, $Y(f)$, is the sum of (B.9), (B.10), and (3.2b), which is of the form (3.3) with coefficients as in (3.13).

Appendix C

Approximation Value Analysis

C.1 Approximation Value Selection

Suppose we have some continuous function $e(x)$ over a quantized unit interval $[0, 1]$. The objective is to find an approximation value $c^*$ such that

\[ c^* = \arg\min_{c} \max_{x \in B} |e(x) - c| \tag{C.1} \]

over the closed interval $B = [a, b]$, where $B$ is a particular quantization bin. Since $B$ is a closed interval, $e(x)$ attains both a maximum and minimum value on $B$, which we denote by $e_{\max}$ and $e_{\min}$, respectively. Hence,

\[ c^* = \arg\min_{c} \max\{ e_{\max} - c,\ c - e_{\min} \} \tag{C.2} \]

Now let $c = e_{\text{mid}} + \epsilon$, where $e_{\text{mid}} = \frac{1}{2}(e_{\max} + e_{\min})$. We may then write (C.2) in terms of $\epsilon$ as

\[ \epsilon^* = \arg\min_{\epsilon} \max\{ \Delta + \epsilon,\ \Delta - \epsilon \} \tag{C.3} \]

where $\Delta = \frac{e_{\max} - e_{\min}}{2}$. It follows from (C.3) that $\epsilon^* = 0$, which implies $c^* = e_{\text{mid}}$. Therefore, the approximation value that satisfies objective (C.1) is the average of the maximum and minimum values of $e(x)$ over the quantization bin, $B$.

C.2 Approximation Value Computation

In this appendix, we provide an algorithm that computes the approximation values in (6.8), which amounts to computing the terms in the numerator, i.e., the maximum and minimum values of the error $e(x)$ in each bin. Before formally stating the algorithm, we describe how to compute the maximum of the error in the $k$th bin, i.e., in the closed interval $[s_{k-1}, s_k]$.
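The midpoint result of Section C.1 can also be checked numerically with a minimal sketch. The assumptions are loud ones: the error curve `e` and the bin edges are hypothetical, and the maximum and minimum over the bin are found by dense sampling rather than by the endpoint and critical-point analysis used in this appendix. The code compares the worst-case error of the midpoint value against a brute-force scan over candidate values of $c$.

```python
import numpy as np

def bin_midpoint_value(e, lo, hi, grid=10001):
    """Approximate (e_max + e_min) / 2 over [lo, hi] by dense sampling."""
    samples = e(np.linspace(lo, hi, grid))
    return 0.5 * (samples.max() + samples.min())

def worst_case_error(e, c, lo, hi, grid=10001):
    """Approximate the objective in (C.1): max over [lo, hi] of |e(x) - c|."""
    return np.max(np.abs(e(np.linspace(lo, hi, grid)) - c))

# Hypothetical error curve with an interior critical point inside the bin.
def e(x):
    return 0.1 * np.sin(2 * np.pi * x) + 0.05 * x

lo, hi = 0.1, 0.4  # hypothetical quantization bin

c_mid = bin_midpoint_value(e, lo, hi)
best = worst_case_error(e, c_mid, lo, hi)

# Brute-force scan over candidate approximation values for comparison.
candidates = np.linspace(-0.2, 0.2, 401)
scan = np.array([worst_case_error(e, c, lo, hi) for c in candidates])
```

On the sampled grid, no candidate in the scan achieves a smaller worst-case error than the midpoint value, which is consistent with (C.3).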
Since we are working with a continuous function over a closed interval, this maximum is guaranteed to exist; moreover, it occurs at either an endpoint (i.e., $s_{k-1}$ or $s_k$) or a critical point in the open interval $(s_{k-1}, s_k)$ for which $e''(x) < 0$. Using the $k$th bin in Fig. 6.8 as an example, there are 3 points to check: the two endpoints and the point in the middle of the bin where $e'(x) = 0$. In this case, it is clear by visual inspection that the critical point results in the maximum. Since the minimum of the error in the $k$th bin can be computed analogously, we can develop an algorithm to compute the approximation values, as outlined in Algorithm 3.

Algorithm 3: Approximation Value Computation.
Input: $e(x)$, $e''(x)$, $R = \{ r \mid e'(r) = 0 \}$, $s_0, \ldots, s_{N-1}$
Output: $c_0, \ldots, c_N$
Initialize: $s_{-1} \leftarrow 0$, $s_N \leftarrow 1$
for k ← 0 to N do
    max_candidates ← [ ]
    min_candidates ← [ ]
    max_candidates.append(max(e(s_{k-1}), e(s_k)))
    min_candidates.append(min(e(s_{k-1}), e(s_k)))
    for r in R do
        if s_{k-1} < r < s_k then
            if e''(r) > 0 then
                min_candidates.append(e(r))
            else if e''(r) < 0 then
                max_candidates.append(e(r))
    c_k ← (1/2) (max(max_candidates) + min(min_candidates))

References

[1] Q. Gu, RF System Design of Transceivers for Wireless Communications, 1st ed. Springer Publishing Company, Incorporated, 2010.
[2] G. Hueber and R. Staszewski, Multi-Mode / Multi-Band RF Transceivers for Wireless Communications: Advanced Techniques, Architectures, and Trends. Wiley, 2011.
[3] R. Levinson, C. Hornbuckle, and K. Dyer, "A monolithic analog to digital converter in 32nm CMOS for broadband phased array applications," in 2013 IEEE International Conference on Microwaves, Communications, Antennas and Electronic Systems (COMCAS 2013), Oct 2013, pp. 1–13.
[4] C. G. Tsinos, S. Maleki, S. Chatzinotas, and B. E. Ottersten, "Hybrid analog-digital transceiver designs for cognitive large-scale antenna array systems," CoRR, vol. abs/1612.02957, 2016. [Online]. Available: http://arxiv.org/abs/1612.02957
[5] W. Hong, Z. H.
Jiang, C. Yu, J. Zhou, P. Chen, Z. Yu, H. Zhang, B. Yang, X. Pang, M. Jiang, Y. Cheng, M. K. T. Al-Nuaimi, Y. Zhang, J. Chen, and S. He, "Multibeam antenna technologies for 5G wireless communications," IEEE Transactions on Antennas and Propagation, vol. 65, no. 12, pp. 6231–6249, 2017.
[6] B. Ku, P. Schmalenberg, O. Inac, O. D. Gurbuz, J. S. Lee, K. Shiozaki, and G. M. Rebeiz, "A 77–81-GHz 16-element phased-array receiver with ±50° beam scanning for advanced automotive radars," IEEE Transactions on Microwave Theory and Techniques, vol. 62, no. 11, pp. 2823–2832, 2014.
[7] T. Jeffrey, Phased-Array Radar Design: Application of Radar Fundamentals, ser. Electromagnetics and Radar. Institution of Engineering and Technology, 2009. [Online]. Available: https://books.google.com/books?id=UUMhCFWl5hEC
[8] D. Beauchamp and K. M. Chugg, "Machine learning based image calibration for a twofold time-interleaved high speed DAC," in 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), 2019, pp. 908–912.
[9] P. Caragiulo, O. E. Mattia, A. Arbabian, and B. Murmann, "A 2x time-interleaved 28-GS/s 8-bit 0.03-mm² switched-capacitor DAC in 16-nm FinFET CMOS," IEEE Journal of Solid-State Circuits, vol. 56, no. 8, pp. 2335–2346, 2021.
[10] K. Doris, A. Roermund, and D. Leenaerts, Wide-Bandwidth High-Dynamic Range D/A Converters. Springer, Boston, MA, Jan. 2006.
[11] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals & Systems (2nd Ed.). USA: Prentice-Hall, Inc., 1996.
[12] Myung-Jun Choe, Kwang-Hyun Baek, and M. Teshome, "A 1.6GS/s 12b return-to-zero GaAs RF DAC for multiple Nyquist operation," in ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005., 2005, pp. 112–587 Vol. 1.
[13] B. Razavi, "The current-steering DAC [a circuit for all seasons]," IEEE Solid-State Circuits Magazine, vol. 10, no. 1, pp. 11–15, 2018.
[14] K. O.
Andersson, "Modeling and implementation of current-steering digital-to-analog converters," Ph.D. dissertation, The Institute of Technology at Linköping University, 2005.
[15] K. Doris, A. van Roermund, and D. Leenaerts, "Mismatch-based timing errors in current steering DACs," in Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03., vol. 1, 2003, pp. I–I.
[16] P. Carbone, S. Kiaei, and F. Xu, Design, Modeling and Testing of Data Converters, ser. Signals and Communication Technology. Springer Berlin Heidelberg, 2013. [Online]. Available: https://books.google.com/books?id=8vi3BAAAQBAJ
[17] S. M. McDonnell, V. J. Patel, L. Duncan, B. Dupaix, and W. Khalil, "Compensation and calibration techniques for current-steering DACs," IEEE Circuits and Systems Magazine, vol. 17, no. 2, pp. 4–26, 2017.
[18] A. K. Baranwal, Anurag, and B. Singh, "Design and analysis of 8 bit fully segmented digital to analog converter," in 2015 2nd International Conference on Recent Advances in Engineering Computational Sciences (RAECS), 2015, pp. 1–4.
[19] I. Galton, "Why dynamic-element-matching DACs work," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 2, pp. 69–74, 2010.
[20] M. Shen, J. Tsai, and P. Huang, "Random swapping dynamic element matching technique for glitch energy minimization in current-steering DAC," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 5, pp. 369–373, 2010.
[21] D. Lee, T. Kuo, and K. Wen, "Low-cost 14-bit current-steering DAC with a randomized thermometer-coding method," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, no. 2, pp. 137–141, 2009.
[22] I. Galton, "Why dynamic-element-matching DACs work," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 2, pp. 69–74, 2010.
[23] W.-C. Kim, D.-s. Jo, Y.-J. Roh, Y.-D. Kim, and S.-T.
Ryu, "A 6b 28GS/s four-channel time-interleaved current-steering DAC with background clock phase calibration," in 2019 Symposium on VLSI Circuits, 2019, pp. C138–C139.
[24] E. Olieman, A. Annema, and B. Nauta, "An interleaved full Nyquist high-speed DAC technique," IEEE Journal of Solid-State Circuits, vol. 50, no. 3, pp. 704–713, March 2015.
[25] L. Zhou, W. Li, D. Wu, F. Jiang, D. Xue, J. Wu, and X. Liu, "A 30Gsps 6bit DAC in SiGe BiCMOS technology," in 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2016, pp. 37–40.
[26] E. Olieman, "Time-interleaved high-speed D/A converters," Ph.D. dissertation, University of Twente, Netherlands, Mar. 2016.
[27] S. Kim, W. Kim, M. Seo, and S. Ryu, "A 65-nm CMOS 6-bit 20 GS/s time-interleaved DAC with full-binary sub-DACs," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 9, pp. 1154–1158, 2018.
[28] H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, J. Leibrich, and W. Rosenkranz, "An 8-bit 100-GS/s distributed DAC in 28-nm CMOS for optical communications," IEEE Transactions on Microwave Theory and Techniques, vol. 63, no. 4, pp. 1211–1218, 2015.
[29] L. Angrisani and M. D'Arco, "Modeling timing jitter effects in digital-to-analog converters," IEEE Transactions on Instrumentation and Measurement, vol. 58, no. 2, pp. 330–336, 2008.
[30] X. Geng, Y. Tian, Y. Xiao, Z. Ye, Q. Xie, and Z. Wang, "A 25.8GHz integer-N PLL with time-amplifying phase-frequency detector achieving 60fsrms jitter, -252.8dB FoMJ, and robust lock acquisition performance," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, 2022, pp. 388–390.
[31] M. Mansuri, A. Hadiashar, and C.-K. K. Yang, "Methodology for on-chip adaptive jitter minimization in phase-locked loops," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 11, pp. 870–878, 2003.
[32] S. Balasubramanian, G. Creech, J. Wilson, S. M. Yoder, J. J. McCue, M.
Verhelst, and W. Khalil, "Systematic analysis of interleaved digital-to-analog converters," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 12, pp. 882–886, 2011.
[33] S. Su and M. S.-W. Chen, "A 12-bit 2 GS/s dual-rate hybrid DAC with pulse-error pre-distortion and in-band noise cancellation achieving >74 dBc SFDR and <-80 dBc IM3 up to 1 GHz in 65 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 51, no. 12, pp. 2963–2978, 2016.
[34] T. Chen and G. Gielen, "The analysis and improvement of a current-steering DACs dynamic SFDR-I: the cell-dependent delay differences," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 1, pp. 3–15, 2006.
[35] Y. Cong and R. Geiger, "A 1.5-V 14-bit 100-MS/s self-calibrated DAC," IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2051–2060, 2003.
[36] J. Savoj, A. Abbasfar, A. Amirkhany, M. Jeeradit, and B. W. Garlepp, "A 12-GS/s phase-calibrated CMOS digital-to-analog converter for backplane communications," IEEE Journal of Solid-State Circuits, vol. 43, no. 5, pp. 1207–1216, 2008.
[37] J. Kim, A. Balankutty, R. K. Dokania, A. Elshazly, H. S. Kim, S. Kundu, D. Shi, S. Weaver, K. Yu, and F. O'Mahony, "A 112 Gb/s PAM-4 56 Gb/s NRZ reconfigurable transmitter with three-tap FFE in 10-nm FinFET," IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 29–42, 2019.
[38] L. M. Rios and N. V. Sahinidis, "Derivative-free optimization: a review of algorithms and comparison of software implementations," Journal of Global Optimization, vol. 56, no. 3, pp. 1247–1293, Jul. 2013.
[39] E. Jacobsen and R. Lyons, "The sliding DFT," IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 74–80, 2003.
[40] C. Su and R. L. Geiger, "Dynamic calibration of current-steering DAC," in 2006 IEEE International Symposium on Circuits and Systems. IEEE, 2006, pp. 4–pp.
[41] Y. Tang, J. Briaire, K. Doris, R. van Veldhoven, P. C. van Beek, H. J. A. Hegt, and A. H.
van Roermund, \A 14 bit 200 MS/s DAC with SFDR > 78 dBc, IM3 <-83 dBc and NSD<-163 dBm/Hz across the whole Nyquist band enabled by dynamic-mismatch mapping," IEEE Journal of Solid-State Circuits, vol. 46, no. 6, pp. 1371{1381, 2011. [42] W. Kester, \Taking the mystery out of the infamous formula," SNR= 6.02 N+ 1.76 dB," and why you should care," Analog Devices Tutorial, MT-001 Rev. A, vol. 10, no. 08, 2009. [43] M. El Chammas and B. Murmann, Background Calibration of Time-Interleaved Data Converters. Springer, New York, NY, 2011. [44] M. Bagheri, F. Schembari, H. Zare-Hoseini, R. B. Staszewski, and A. Nathan, \Inter- channel mismatch calibration techniques for time-interleaved SAR ADCs," IEEE Open Journal of Circuits and Systems, vol. 2, pp. 420{433, 2021. [45] Q. Tang, H. Zhou, A. Tiwari, J. Stewart, Q. Qu, D. Zhang, and H. Hemmati, \Experi- mental demonstration of digital pre-distortion for millimeter wave power ampliers with GHz bandwidth," in 2018 IEEE Topical Conference on RF/Microwave Power Ampli- ers for Radio and Wireless Applications (PAWR), Jan 2018, pp. 58{61. [46] J. Wood, \Digital pre-distortion of lrf power ampliers," in 2017 IEEE Topical Confer- ence on RF/Microwave Power Ampliers for Radio and Wireless Applications (PAWR), Jan 2017, pp. 1{3. 144 [47] C. Daigle, A. Dastgheib, and B. Murmann, \A 12-bit 800-MS/s switched-capacitor DAC with open-loop output driver and digital predistortion," in 2010 IEEE Asian Solid-State Circuits Conference, 2010, pp. 1{4. [48] A. Dastgheib, \Calibration ADC and algorithm for adaptive predistortion of high-speed DACs," Ph.D. dissertation, Stanford University, 2013. [49] A. Janczak, Identication of Nonlinear Systems Using Neural Networks and Polyno- mial Models: A Block-Oriented Approach (Lecture Notes in Control and Information Sciences). Berlin, Heidelberg: Springer-Verlag, 2004. [50] S. Dey, S. C. Kanala, K. M. Chugg, and P. A. 
Beerel, \Deep-n-Cheap: An automated search framework for low complexity deep learning," arXiv e-print arXiv:2004.00974, 2020. [51] A. F. Agarap, \Deep learning using rectied linear units (relu)," arXiv preprint arXiv:1803.08375, 2018. [52] D. Kingma and J. Ba, \Adam: A method for stochastic optimization," International Conference on Learning Representations, 12 2014. [53] H. Zhu, W. Yang, G. Engel, and Y.-B. Kim, \A two-parameter calibration technique tracking temperature variations for current source mismatch in DACs," IEEE Transac- tions on Circuits and Systems II: Express Briefs, vol. 64, pp. 1{1, 01 2016. [54] D. J. Stoops, J. Kuo, P. J. Hurst, B. C. Levy, and S. H. Lewis, \Digital background calibration of a split current-steering DAC," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 8, pp. 2854{2864, 2019. [55] A. Bugeja and B.-S. Song, \A self-trimming 14-b 100-MS/s CMOS DAC," IEEE Journal of Solid-State Circuits, vol. 35, no. 12, pp. 1841{1852, 2000. [56] I. Galton, \Why dynamic-element-matching DACs work," IEEE Transactions on Cir- cuits and Systems II: Express Briefs, vol. 57, no. 2, pp. 69{74, Feb 2010. [57] M. Tsagris, C. Beneki, and H. Hassani, \On the folded normal distribution," Mathematics, vol. 2, no. 1, p. 12{28, Feb 2014. [Online]. Available: http: //dx.doi.org/10.3390/math2010012 [58] M. Leib, W. Menzel, B. Schleicher, and H. Schumacher, \Vital signs monitoring with a UWB radar based on a correlation receiver," in Proceedings of the Fourth European Conference on Antennas and Propagation, 2010, pp. 1{5. [59] B. Aazhang and H. Poor, \Performance of DS/SSMA communications in impulsive channels. ii. hard-limiting correlation receivers," IEEE Transactions on Communica- tions, vol. 36, no. 1, pp. 88{97, 1988. [60] L. Weiss, \Wavelets and wideband correlation processing," IEEE Signal Processing Mag- azine, vol. 11, no. 1, pp. 13{32, 1994. 145 [61] J. F. Henriques, R. Caseiro, P. Martins, and J. 
Batista, \High-speed tracking with kernelized correlation lters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583{596, 2015. [62] D. Bankman and B. Murmann, \An 8-bit, 16 input, 3.2 pj/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS," in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2016, pp. 21{24. [63] A. Jaiswal, I. Chakraborty, A. Agrawal, and K. Roy, \8T SRAM cell as a multibit dot-product engine for beyond von neumann computing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 11, pp. 2556{2567, 2019. [64] L. Gao, F. Alibart, and D. B. Strukov, \Analog-input analog-weight dot-product oper- ation with ag/a-si/pt memristive devices," in 2012 IEEE/IFIP 20th International Con- ference on VLSI and System-on-Chip (VLSI-SoC). IEEE, 2012, pp. 88{93. [65] Y. Wang, H. Tang, Y. Xie, X. Chen, S. Ma, Z. Sun, Q. Sun, L. Chen, H. Zhu, J. Wan et al., \An in-memory computing architecture based on two-dimensional semiconductors for multiply-accumulate operations," Nature communications, vol. 12, no. 1, pp. 1{8, 2021. [66] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. D avila, C. E. Graves et al., \Analogue signal and image processing with large memristor cross- bars," Nature electronics, vol. 1, no. 1, pp. 52{59, 2018. [67] M. Kang, S. K. Gonugondla, A. Patil, and N. R. Shanbhag, \A multi-functional in- memory inference processor using a standard 6T SRAM array," IEEE Journal of Solid- State Circuits, vol. 53, no. 2, pp. 642{655, 2018. [68] G. Han and E. Sanchez-Sinencio, \CMOS transconductance multipliers: a tutorial," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 12, pp. 1550{1563, 1998. [69] B. Gilbert, \A precise four-quadrant multiplier with subnanosecond response," IEEE Journal of Solid-State Circuits, vol. 3, no. 4, pp. 365{373, 1968. [70] S. F. 
Bellenot, \Additively separable functions of the form F (x;y) = f(x) +g(y)," https://www.math.fsu.edu/ bellenot/class/s05/cal3/proj/project.pdf. [71] G. E. Moore, \Cramming more components onto integrated circuits, reprinted from electronics, volume 38, number 8, april 19, 1965, pp.114 ." IEEE Solid-State Circuits Society Newsletter, vol. 11, no. 3, pp. 33{35, 2006. [72] M. T. Bohr and I. A. Young, \CMOS scaling trends and beyond," IEEE Micro, vol. 37, no. 6, pp. 20{29, 2017. [73] M. Kazuno, M. Motoyoshi, S. Kameda, and N. Suematsu, \A study on the SNR in higher Nyquist zone of 1-bit low-pass delta-sigma RZ-DAC," in 2017 IEEE Asia Pacic Microwave Conference (APMC). IEEE, 2017, pp. 918{921. 146 [74] S. Y.-S. Chen, N.-S. Kim, and J. Rabaey, \A 10b 600MS/s multi-mode CMOS DAC for multiple Nyquist zone operation," in 2011 Symposium on VLSI Circuits - Digest of Technical Papers, 2011, pp. 66{67. [75] M. Kazuno, M. Motoyoshi, S. Kameda, and N. Suematsu, \26 GHz-band direct digital signal generation by a manchester coding 1-bit band-pass delta-sigma modulator using it's 7th Nyquist zone," in 2018 11th Global Symposium on Millimeter Waves (GSMM), 2018, pp. 1{3. 147
Abstract
RF communication systems have undergone a paradigm shift, driven in part by advances in data converter technology: sample rates now reach several gigasamples per second (GS/s) at high resolution in compact, deep-submicron processes. This allows the converters to be placed closer to the antenna in a “direct RF” sampling configuration. Such a configuration is the preferred implementation of future communication systems, as it results in the highest system performance, lowest power consumption, and lowest cost. In direct RF sampling, operations such as filtering, up-conversion, and down-conversion are moved into the digital domain, which eliminates the need for costly, power-hungry, and bulky analog components. Moreover, transmitters and receivers following this simplified architecture may coexist on a single RF integrated circuit (IC).
The primary focus of this dissertation is on modern digital-to-analog converters (DACs) in highly-integrated RF transceiver ICs. First, we provide a detailed analysis of the significant spectral impairments associated with wideband, times-2 interleaved current-steering DACs (CS-DACs). The analysis leads to the proposal of a calibration algorithm based on machine learning, which, unlike single-tone approaches, does not suffer from narrowband locking (i.e., where the calibration algorithm is only effective near the calibration frequency). We run extensive simulations of this algorithm by developing and leveraging a fast, accurate analytical model of the spectral impairments. In addition, we demonstrate the algorithm experimentally on a commercial RF transceiver IC in 14 nm CMOS. Consistent with the trend toward high integration, the transceiver also contains a high-speed analog-to-digital converter (ADC), an embedded MCU, and programmable control over the spectral impairments; we show how these can be used to support the calibration of the DAC. Beyond DAC analysis and calibration, this dissertation also shows how a similar machine learning approach can be applied to analog dot products, which are starting to receive significant attention for low-power hardware implementations of neural networks (NNs).
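As a rough illustration of why interleaving creates spectral impairments, the sketch below uses a generic textbook discrete-time model, not the analytical model developed in the dissertation. It assumes that the only impairment is a static gain mismatch between the two sub-DACs and that the output can be treated as a sample sequence. The names `image_spur_dbc` and `dft_bin_magnitude` and all parameter values are hypothetical. Under these assumptions, a tone at bin `k0` produces an image at bin `n/2 - k0` whose relative amplitude is half the gain mismatch.

```python
import cmath
import math


def dft_bin_magnitude(samples, k):
    """Magnitude of a single DFT bin, computed directly from the definition."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return abs(acc)


def image_spur_dbc(gain_mismatch, n=1024, k0=101):
    """Image-spur level (dBc) of a 2-way interleaved stream with gain mismatch.

    Assumption: even samples come from a sub-DAC with gain 1 + g/2 and odd
    samples from a sub-DAC with gain 1 - g/2, where g = gain_mismatch.
    """
    # Coherently sampled test tone at bin k0 (no spectral leakage).
    tone = [math.cos(2 * math.pi * k0 * i / n) for i in range(n)]
    # Alternate the two sub-DAC gains sample by sample.
    out = [(1 + 0.5 * gain_mismatch * (1 if i % 2 == 0 else -1)) * s
           for i, s in enumerate(tone)]
    signal = dft_bin_magnitude(out, k0)
    image = dft_bin_magnitude(out, n // 2 - k0)  # image at fs/2 - f0
    return 20 * math.log10(image / signal)


if __name__ == "__main__":
    # A 2% gain mismatch gives an image about 40 dB below the tone.
    print(round(image_spur_dbc(0.02), 2))
```

In this simplified model the image level depends only on the mismatch, not on the tone frequency. Real interleaved CS-DACs also exhibit timing and pulse-shape errors that make the impairments frequency dependent, which is one reason a calibration tuned at a single tone may not hold across a wide band.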
Asset Metadata
Creator
Beauchamp, Daniel (author)
Core Title
Calibration of digital-to-analog converters in highly-integrated RF transceivers using machine learning
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Degree Conferral Date
2022-12
Publication Date
09/06/2022
Defense Date
08/25/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
analog dot products, calibration, CS-DAC, current-steering, DAC, OAI-PMH Harvest, SFDR, TIDAC, time-interleaved DAC, wideband
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Chugg, Keith (committee chair), Chen, Mike (committee member), Golubchik, Leana (committee member)
Creator Email
dan.beauchamp@gmail.com,dbeaucha@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC111764645
Unique identifier
UC111764645
Legacy Identifier
etd-BeauchampD-11179
Document Type
Dissertation
Rights
Beauchamp, Daniel
Type
texts
Source
20220908-usctheses-batch-978 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu