

# SiGe based ROM-less 18.5 GHz Clock Direct Digital Synthesizer Design and Characterization

Dissertation for the attainment of the doctoral degree in natural sciences

submitted to the Department of Physics of the Johann Wolfgang Goethe-Universität in Frankfurt am Main

by **Amit Shrestha** from Jhapa, Nepal

Frankfurt am Main 2020

(D30)

Accepted as a dissertation by the Department of Physics of the J. W. Goethe-Universität.

Dean: Prof. Dr. Harald Appelshäuser

Reviewers: Prof. Dr.-Ing. habil. Viktor Krozer, Prof. Dr.-Ing. Lars Hedrich

Date of the disputation:

# Summary

Versatile signal generators are an essential requirement in RF and communication systems. The conventional technique for generating such signals is the voltage-controlled oscillator (VCO). However, it has poor phase noise and a narrow bandwidth. Its phase noise is influenced by various parameters of the oscillator circuit, e.g. transistor size and noise, bias current, and noise coupled in from the bias supply. The bandwidth is limited because the relationship between the input voltage and the output frequency of the VCO is not linear over the tuning range. The phase noise and SFDR of the VCO output are improved by using the phase-lock technique. The phase-locked loop (PLL) uses a feedback system that locks the VCO output to a reference frequency [1]. However, the settling time of the PLL is longer because of the feedback control loop. A longer settling time means a longer frequency switching time between PLL output frequencies.

YIG oscillators are suitable for wide-bandwidth applications despite their lower operating frequency compared with the VCO. Signals can also be generated by free-electron radiation, optical lasers, and Gunn diodes, which can operate even in the THz range. All of these signal generators suffer from slow frequency switching and lack digital controllability and advanced modulation capability, even though their operating frequency can reach the THz range. Alternatively, the arbitrary waveform generator (AWG) can cover a wide frequency range with low phase noise and digital controllability. One of the main components of the AWG is the direct digital synthesizer (DDS). In general, it consists of a phase accumulator, a digital-to-analog converter (DAC), sine-mapping circuits, and a low-pass filter. It requires a reference clock, which sets the sampling rate of the DDS output. Its output frequency can be varied by applying an appropriate digital input code. Although its maximum operating frequency is lower than that of all the other alternatives (VCO, PLL, YIG oscillators, laser diodes, etc.), the DDS has the following important features:

- **Wide bandwidth:** Ideally, the DDS bandwidth extends from near DC up to half the reference clock frequency.
- **Discrete output frequencies:** The DDS output is sampled by the clock. The output frequency is a function of the digital input code and the clock, so there is no frequency offset.
- **Number of frequencies:** It offers a large number of output frequencies, determined by the size of the accumulator.
- **Fast frequency switching:** Because there is no feedback loop as in a PLL, the DDS switches frequency quickly.
- **Modulation capability:** It offers fast frequency switching and digital control via digital input codes, so it can be used in various modulation schemes.
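
The dependence of the output frequency on the input code and the clock can be sketched as follows. This is a minimal illustration that assumes the standard DDS tuning relation f_out = FCW · f_clk / 2^N (not spelled out in this summary) together with the 12-bit accumulator and 18.5 GHz clock of this work; the function and variable names are hypothetical.

```python
def dds_output_frequency(fcw, f_clk_hz, acc_bits):
    """Standard DDS tuning relation (assumed): f_out = FCW * f_clk / 2**N."""
    if not 0 < fcw <= 2 ** (acc_bits - 1):
        raise ValueError("FCW must lie in 1 .. 2**(N-1) to stay at or below f_clk/2")
    return fcw * f_clk_hz / 2 ** acc_bits


def dds_frequency_points(acc_bits):
    """Number of distinct output frequencies up to half the clock frequency."""
    return 2 ** (acc_bits - 1)


F_CLK_HZ = 18.5e9   # clock frequency of this work
ACC_BITS = 12       # accumulator width of this work

resolution_hz = dds_output_frequency(1, F_CLK_HZ, ACC_BITS)             # smallest step
f_max_hz = dds_output_frequency(2 ** (ACC_BITS - 1), F_CLK_HZ, ACC_BITS)  # f_clk / 2
points = dds_frequency_points(ACC_BITS)

print(f"resolution: {resolution_hz / 1e6:.3f} MHz")
print(f"maximum output: {f_max_hz / 1e9:.2f} GHz")
print(f"frequency points: {points}")
```

Under these assumptions a 12-bit accumulator yields 2048 frequency points and an 8-bit accumulator yields 128, which matches the counts quoted later in this summary.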

#### DDS design

A DDS with a high clock frequency, a large number of output frequency points, high SFDR, modulation capability, low phase noise, and low power consumption requires intensive research into selecting the appropriate technology, designing the DDS and its individual block architectures, designing high-speed circuits, and layout. In addition, final assembly is crucial for use in real RF systems.

#### DDS architecture

The conventional ROM-based DDS is not feasible here, because storing the sine-wave amplitudes requires a large number of memory blocks. Such techniques are used for DDSs at lower operating frequencies (< 5 GHz). In a ROM-based DDS, $2^{M} \times N$ registers or ROM cells are needed for M-bit phase resolution and N-bit amplitude resolution. For example, 12-bit phase resolution and 6-bit amplitude resolution require 24576 memory cells. Storing all amplitude information in the look-up table takes considerable space, which means a larger area and higher power consumption. To keep the ROM small, various compression techniques with different compression architectures are used. Despite the effort invested in compression techniques, the maximum clock frequency is still limited by the growth of ROM power consumption and area.
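
The memory requirement quoted above follows directly from the $2^{M} \times N$ expression. A short sketch (the function name is illustrative only) reproduces the 12-bit/6-bit example and shows how quickly the cell count grows with phase resolution:

```python
def rom_cells(phase_bits, amplitude_bits):
    """Uncompressed look-up table size: 2**M phase entries of N bits each."""
    return 2 ** phase_bits * amplitude_bits


example_cells = rom_cells(12, 6)  # 4096 entries * 6 bits
print(example_cells)

# Each additional phase bit doubles the table size.
for m in (8, 10, 12, 14):
    print(m, rom_cells(m, 6))
```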

The CORDIC (coordinate rotation digital computer) technique is available for DDSs at higher frequencies. However, because of the digital blocks it needs, implementing it entails high power consumption [2]. Alternatively, the nonlinear-DAC-based (ROM-less) approach can also be used for high-speed DDSs. However, it too has high power consumption and high circuit complexity, because of the large number (several hundred) of unit current sources required to weight the sine-wave amplitudes. Therefore, the TSC-based ROM-less architecture, in which sine mapping is performed using a linear DAC and a triangle-to-sine converter (TSC), is used for a DDS that can operate in the multi-GHz range. The three main building blocks (accumulator, DAC, and TSC) are optimized to operate beyond 20 GHz while offering similar SFDR performance.

#### High-speed circuit design

The high-speed circuits used in the DDS are the most important factor for its overall speed. All circuits are implemented in 0.25 $\mu$m SiGe technology (SG25H4), which offers an HBT f<sub>T</sub>/f<sub>MAX</sub> of 180/220 GHz. The main block of the DDS is the accumulator, which combines a large number of digital logic gates. To operate it at the highest frequency, the digital logic must be as fast as possible. For this purpose, current-mode logic (CML) with inductive peaking is used in all logic gates of the DDS. The propagation delay and switching time of the cascoded CML inverter with inductive peaking are 3.5 ps and 5.5 ps, respectively. All digital gates are designed and optimized to operate at the highest possible speed.

#### Phase accumulator design

A partially pipelined architecture is used for the 12-bit accumulator. It consists of eight 2-bit adders, two 1-bit adders, six registers, seven XOR gates, and twelve drivers. The H-tree for clock synchronization in the accumulator consists of sixteen registers and five buffers. The accumulator output is truncated to 7 bits: the MSB is used for digital triangle generation, and the remaining 6 bits drive the subsequent 6-bit DAC. The accumulator can synthesize up to 2048 frequency points and has an SFDR of 38.15 dBc. Its area and power consumption are 0.88 mm<sup>2</sup> and 1.23 W, respectively.
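
The truncation and triangle generation described above can be illustrated with a purely behavioral model. The sketch below assumes that the MSB of the truncated 7-bit word inverts the remaining 6 bits bitwise (an XOR-style flip); this is an assumption made for illustration, not a description of the actual gate-level implementation, and all names are hypothetical.

```python
ACC_BITS = 12             # accumulator width
OUT_BITS = 7              # bits kept after phase truncation
DAC_BITS = OUT_BITS - 1   # bits that drive the DAC


def triangle_codes(fcw, n_samples, start=0):
    """Behavioral model: accumulate, truncate to 7 bits, fold the ramp into a triangle."""
    acc_mask = (1 << ACC_BITS) - 1
    dac_mask = (1 << DAC_BITS) - 1
    acc = start & acc_mask
    codes = []
    for _ in range(n_samples):
        top = acc >> (ACC_BITS - OUT_BITS)   # keep the 7 most significant bits
        msb = top >> DAC_BITS                # 0 in the first half period, 1 in the second
        ramp = top & dac_mask                # 6-bit sawtooth ramp
        # Assumed flip: invert the 6 ramp bits whenever the MSB is set.
        codes.append(ramp ^ dac_mask if msb else ramp)
        acc = (acc + fcw) & acc_mask         # modulo-2**12 accumulation
    return codes


# FCW = 32 advances the truncated 7-bit word by one step per clock cycle.
codes = triangle_codes(fcw=32, n_samples=128)
print(codes[:4], codes[62:66], codes[-4:])
```

With this folding, the DAC code rises from 0 to 63 during the first half period and falls back to 0 during the second, so the linear 6-bit DAC produces a triangle wave rather than a sawtooth.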

#### Triangle-to-sine converter (TSC)

In this DDS, sine mapping is performed by the combination of the DAC and the TSC. Differential circuits exhibit a hyperbolic-tangent transfer characteristic under large-signal excitation. The TSC therefore converts the triangle wave generated by the DAC into a sine wave using the translinear properties of differential-pair circuits. The TSC is compact, energy efficient, and offers a wide bandwidth. The signal quality (e.g. SFDR) of the TSC output depends on the optimum excitation by the input signal and on precise biasing for the appropriate partial saturation of the differential-pair circuits. This TSC has an active area of only $0.01 \text{ mm}^2$ and a total power consumption of 50 mW. It has an operating bandwidth of up to 25 GHz and provides an SFDR of 30 to 42 dBc.
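
The effect of a hyperbolic-tangent characteristic on a triangle wave can be checked numerically. The following sketch applies an ideal `tanh` to an ideal triangle and compares the worst harmonic (2nd to 9th) before and after shaping. The drive level `DRIVE` is an assumed illustrative value, not a design value from this work, and the model ignores bandwidth limits, DAC quantization, and glitches.

```python
import math


def triangle(n_samples):
    """One period of a unit-amplitude triangle wave."""
    return [4.0 * abs(n / n_samples - 0.5) - 1.0 for n in range(n_samples)]


def harmonic_magnitude(samples, k):
    """Magnitude of the k-th harmonic from a single-bin DFT over one period."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    return math.hypot(re, im) / n


def sfdr_dbc(samples, max_harmonic=9):
    """Fundamental relative to the largest harmonic from 2 to max_harmonic, in dB."""
    fundamental = harmonic_magnitude(samples, 1)
    worst = max(harmonic_magnitude(samples, k) for k in range(2, max_harmonic + 1))
    return 20.0 * math.log10(fundamental / worst)


DRIVE = 1.65  # assumed peak argument of tanh (illustrative only)

tri = triangle(1024)
shaped = [math.tanh(DRIVE * x) for x in tri]

sfdr_tri = sfdr_dbc(tri)
sfdr_shaped = sfdr_dbc(shaped)
print(f"triangle: {sfdr_tri:.1f} dBc, tanh-shaped: {sfdr_shaped:.1f} dBc")
```

Varying `DRIVE` in this model shows why the excitation level matters: with too little drive the output stays close to a triangle, and with too much it approaches a square wave, so the harmonic content is lowest only in an intermediate range.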

#### DDS results

This DDS has the highest clock frequency (18.5 GHz) in SiGe technology and the third-highest clock frequency (after [3], [4]) across all technologies. One of the main reasons that [3] (32 GHz) and [4] (24 GHz) reach a higher clock is their faster technology node, namely $f_T = 350$ GHz compared with 180 GHz in this DDS. This is the first measured phase noise reported so far for the output of a high-speed DDS (>10 GHz clock). This work and Laemmle et al. [5] have several things in common. Both were developed in the same SiGe technology, share the same ROM-less TSC-based sine-mapping architecture, and operate at multi-GHz clock rates. They therefore call for a direct comparison. This work achieves an $F_{clk}$ of 18.5 GHz, 1.7 GHz higher than that of Laemmle et al. Both have similar SFDR characteristics and the same number of truncated accumulator output bits and DAC bits. On the other hand, the DDS of Laemmle et al. shows a remarkably low power consumption of 486 mW, with a record figure of merit ($F_{clk} : P$). This work has a total power consumption of 1500 mW. The additional power consumption compared with Laemmle et al. results from the 4 additional accumulator bits and the additional phase control unit. These increase the number of full adders, buffers, and clock drivers, and thus the power consumption. This DDS can deliver 2048 frequency points, whereas that of Laemmle et al. can generate only 128.

For a fair comparison of the power consumption of these two DDSs, a DDS with an 8-bit accumulator is implemented using the cells designed for this DDS, and its power consumption is calculated. The number of full adders is reduced from 12 to 8, the clock lines are reduced from 16 to 8, and the phase control word (PCW) unit is eliminated. The resulting power consumption is: accumulator = 450 mW, DAC = 214 mW, and TSC = 50 mW. This means that a DDS in this design adapted to the 8-bit accumulator would have consumed only 790 mW. The power consumption can be optimized further, especially in the over-designed drivers and emitter followers. For example, the currents of the emitter followers feeding each switch of the DAC are reduced. This does not affect DAC performance, but it reduces the DAC power consumption by at least 60 mW. The total power consumption of the DDS is then only 720 mW. A similar optimization can be carried out in the accumulator block, giving a reduction of 50-60 mW. Finally, the DDS adapted to the 8-bit accumulator shows a power consumption of only 654 mW, comparable with [5]. Some difference in power consumption may stem from differences in the technology itself, even though it is the same SiGe technology; the collector current versus $f_T$ characteristics may differ. Laemmle et al. have a slight advantage because they use a technology that is 20 GHz faster. To reach an $F_{clk}$ of 16.8 GHz, their CML logic can be relaxed to lower tail currents, whereas this work has to push the tail current even higher to reach an $F_{clk}$ of 18.5 GHz despite a slower node. Nevertheless, $F_{clk} : P$ for the DDS adapted to the 8-bit accumulator is 28.3 GHz/W, comparable with 34.5 GHz/W in [5].
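
The figure of merit $F_{clk} : P$ used in this comparison is simply the clock frequency divided by the power consumption. A brief sketch with the values quoted above (the function name is illustrative) gives results close to the 28.3 GHz/W and 34.5 GHz/W stated in the text:

```python
def clock_per_power_ghz_per_w(f_clk_ghz, power_mw):
    """Figure of merit F_clk : P in GHz/W."""
    return f_clk_ghz / (power_mw / 1000.0)


fom_this_12bit = clock_per_power_ghz_per_w(18.5, 1500.0)  # this work as built
fom_this_8bit = clock_per_power_ghz_per_w(18.5, 654.0)    # 8-bit-adapted estimate
fom_laemmle = clock_per_power_ghz_per_w(16.8, 486.0)      # Laemmle et al. [5]

print(f"this work, 12-bit: {fom_this_12bit:.1f} GHz/W")
print(f"this work, 8-bit estimate: {fom_this_8bit:.1f} GHz/W")
print(f"Laemmle et al.: {fom_laemmle:.1f} GHz/W")
```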

#### Application of the DDS in Doppler radar

One application of the DDS is also demonstrated. The DDS is used as the frequency source in a 5-10 GHz Doppler radar. This radar is used to detect small vibrations of the target (here, a loudspeaker membrane). The Doppler radar is built from commercial RF components (such as mixers, amplifiers, a multiplier, filters, and antennas), with the DDS as the key component. First, the DDS clocked at 12 GHz is used to generate a clean 3 GHz signal, which is doubled by a multiplier to produce the radar's 6 GHz carrier. The vibration frequency of the device under test is varied from 1 Hz to 100 Hz. The radar correctly detects the vibration frequency of the loudspeaker. In the next measurement, the DDS output frequency is changed to 4 GHz and doubled by a multiplier to 8 GHz. Again, the vibration of the device under test is varied from 1 Hz to 100 Hz and the frequency detected by the radar is recorded. The vibration frequencies are detected correctly. The final measurement shows that the radar can correctly detect multi-tone vibrations spaced as closely as 0.3 Hz in frequency, which is attributed to the excellent phase-noise characteristics of the DDS output signal. This implies that two objects vibrating at very closely spaced frequencies can be detected reliably, which is useful in biomedical applications such as heartbeat detection and monitoring.
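
The quantities involved in such a measurement can be estimated with textbook CW Doppler relations. The sketch below is not taken from the measurement setup: it assumes the usual round-trip phase relation Δφ = 4πx/λ and the rule of thumb that separating two tones spaced Δf apart needs an observation time of at least about 1/Δf. The displacement value and all names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def carrier_from_dds(f_dds_hz, multiplier=2):
    """Radar carrier obtained by frequency-multiplying the DDS output."""
    return multiplier * f_dds_hz


def peak_phase_deviation_rad(displacement_m, f_carrier_hz):
    """Round-trip phase change for a target displacement x: 4*pi*x / wavelength."""
    wavelength_m = C_M_PER_S / f_carrier_hz
    return 4.0 * math.pi * displacement_m / wavelength_m


def min_observation_time_s(tone_spacing_hz):
    """Rule-of-thumb record length needed to separate two closely spaced tones."""
    return 1.0 / tone_spacing_hz


f_carrier = carrier_from_dds(3e9)   # 3 GHz DDS output doubled to 6 GHz
assumed_displacement_m = 0.1e-3     # hypothetical 0.1 mm membrane excursion
dphi = peak_phase_deviation_rad(assumed_displacement_m, f_carrier)
t_obs = min_observation_time_s(0.3)

print(f"carrier: {f_carrier / 1e9:.0f} GHz")
print(f"phase deviation: {math.degrees(dphi):.2f} degrees")
print(f"observation time for 0.3 Hz spacing: {t_obs:.2f} s")
```

Because sub-millimetre displacements produce only small phase deviations at these carrier frequencies, and the record must span several seconds to resolve a 0.3 Hz spacing, the phase stability of the source over that interval directly limits what can be resolved.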

# List of publications

(1) A. Shrestha, J. Moll, V. Krozer, "Wideband Doppler Radar Using a 18.5 GHz Microwave Direct Digital Synthesizer," International Radar Symposium (IRS) 2020, Warsaw, Poland

(2) M. Hrobak, K. Thurn, J. Moll, M. Hossain, A. Shrestha, T. Al-Sawaf, et al., "Components for a Modular MIMO Millimeter-Wave Imaging Radar for Space Applications," International Journal of Infrared, Millimeter, and Terahertz Waves 2020

(3) A. Shrestha, J. Moll, A. Raemer, M. Hrobak and V. Krozer, "20 GHz Clock Frequency ROM-Less Direct Digital Synthesizer Comprising Unique Phase Control Unit in 0.25μm SiGe Technology," 13<sup>th</sup> European Microwave Integrated Circuits Conference (Eu-MIC) 2018, Madrid, Spain

(4) A. Shrestha, R. Kumar, F. Dornuf, J. Moll, V. Krozer, M. Smidth, "Remote Mechanical Vibration Sensing: A Comparison Between CW-Doppler Radar and Laser-Doppler Vibrometer Measurements," Structural Health Monitoring (SHM) 2017, California, USA

(5) A. Shrestha, V. Krozer, "Design of direct digital microwave signal synthesizer using SiGe technology," International Conference On Microwaves, Communications, Antennas & Electronic Systems (COMCAS) 2017, Tel Aviv, Israel

(6) K. Neuschwander, A. Shrestha, J. Moll, V. Krozer and M. Bücker, "Multichannel Device for Integrated Pitch Catch and EMI Measurements in Guided Wave Structural Health Monitoring Applications," Structural Health Monitoring (SHM) 2017, California, USA

(7) J. Moll, B. Hils, A. Shrestha, A. Ehlert, V. Krozer, K. Thurn, M. Vossiek, M. Hrobak, M. Hossain, W. Heinrich, M. Resch and J. Bosse, "Panel design of a MIMO imaging radar at W-band for space applications," European Radar Conference (EuRAD) 2017, Nuremberg, Germany

# Contents

| 1        | Intr | oduction                                                                                                                                                       | 1  |
|----------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|          | 1.1  | DDS challenges                                                                                                                                                 | 3  |
|          | 1.2  | Technology selection criteria                                                                                                                                  | 5  |
|          |      | 1.2.1 Speed                                                                                                                                                    | 5  |
|          |      | 1.2.2 Matching                                                                                                                                                 | 6  |
|          |      | 1.2.3 Phase noise $\ldots$                                                                                                                                     | 7  |
|          |      | 1.2.4 Power consumption, integration capability, frequency resolution                                                                                          | 7  |
|          | 1.3  | Technology selection and goals                                                                                                                                 | 8  |
|          | 1.4  | Thesis organization                                                                                                                                            | 10 |
|          | 1.5  | Conclusions from chapter 1                                                                                                                                     | 10 |
| <b>2</b> | DDS  | S basics                                                                                                                                                       | 1  |
|          | 2.1  | DDS parameters                                                                                                                                                 | 1  |
|          |      | 2.1.1 Clock frequency                                                                                                                                          | 1  |
|          |      | 2.1.2 Frequency resolution                                                                                                                                     | 1  |
|          |      | 2.1.3 DDS output frequency range                                                                                                                               | 1  |
|          |      | 2.1.4 DDS output frequency points                                                                                                                              | 12 |
|          |      | 2.1.5 Spurious free dynamic range                                                                                                                              | 12 |
|          |      | 2.1.6 Frequency switching time                                                                                                                                 | 13 |
|          | 2.2  | DDS architecture overview                                                                                                                                      | 13 |
|          |      | 2.2.1 CORDIC approach DDS architecture                                                                                                                         | 15 |
|          |      | 2.2.2 ROM based DDS architecture                                                                                                                               | 15 |
|          |      | 2.2.3 ROM-less DDS architecture                                                                                                                                | 16 |
|          | 2.3  | DDS architecture used in this thesis                                                                                                                           | 18 |
|          | 2.4  | DDS hierarchy                                                                                                                                                  | 19 |
|          | 2.5  | Conclusions from chapter 2                                                                                                                                     | 19 |
| 3        | Higl | h-speed circuit design                                                                                                                                         | 20 |
|          | 3.1  | SG25H4 technology                                                                                                                                              | 20 |
|          | 3.2  | Transistor I-V characteristics                                                                                                                                 | 21 |
|          |      | 3.2.1 Active region $\ldots \ldots \ldots$     | 22 |
|          |      | 3.2.2 Cut-off region                                                                                                                                           | 23 |
|          |      | 3.2.3 Saturation region $\ldots \ldots \ldots$ | 23 |
|          | 3.3  | $f_T/f_{MAX}$ of SG25H4 transistor                                                                                                                             | 23 |
|          | 3.4  | Biasing circuit                                                                                                                                                | 25 |
|          | 3.5  | Emitter follower                                                                                                                                               | 28 |

|          | 3.6  | Digital logic circuits                                                                                       | 29        |
|----------|------|--------------------------------------------------------------------------------------------------------------|-----------|
|          |      | 3.6.1 Inverter gate                                                                                          | 30        |
|          |      | 3.6.2 Comparison between inverter gates                                                                      | 32        |
|          |      | 3.6.3 Optimum tail current selection                                                                         | 33        |
|          |      | 3.6.4 Cascoded inverter gate                                                                                 | 34        |
|          |      | 3.6.5 Inductive peaking                                                                                      | 36        |
|          |      | 3.6.6 XOR gate                                                                                               | 37        |
|          |      | 3.6.7 AND gate and Majority gate                                                                             | 38        |
|          |      | 3.6.8 Register/memory block                                                                                  | 39        |
|          |      | 3.6.9 OR gate                                                                                                | 42        |
|          | 3.7  | Conclusions from chapter 3                                                                                   | 43        |
|          |      |                                                                                                              |           |
| 4        | Phas | se accumulator                                                                                               | <b>45</b> |
|          | 4.1  | Adder                                                                                                        | 45        |
|          |      | 4.1.1 Half adder                                                                                             | 46        |
|          |      | 4.1.2 Full adder                                                                                             | 46        |
|          | 4.2  | Multi-bit adder                                                                                              | 48        |
|          |      | 4.2.1 2-bit adder design                                                                                     | 49        |
|          | 4.3  | Accumulator architecture                                                                                     | 50        |
|          | 4.4  | Pipeline accumulator without pre-skewing register                                                            | 52        |
|          | 4.5  | Accumulator size selection                                                                                   | 53        |
|          | 4.6  | Phase accumulator phase truncation                                                                           | 53        |
|          |      | 4.6.1 Linear increasing ramp accumulation                                                                    | 55        |
|          |      | 4.6.2 Accumulation process with flipping                                                                     | 56        |
|          | 4.7  | Phase control word unit in accumulator                                                                       | 58        |
|          | 4.8  | Clock tree for synchronization                                                                               | 59        |
|          | 4.9  | Accumulator simulation                                                                                       | 60        |
|          | 4.10 | Conclusions from chapter 4                                                                                   | 62        |
|          |      | -                                                                                                            |           |
| <b>5</b> | Digi | tal to analog converter                                                                                      | 63        |
|          | 5.1  | DAC performance parameters                                                                                   | 64        |
|          |      | 5.1.1 Integral nonlinearity (INL) $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$             | 64        |
|          |      | 5.1.2 Differential nonlinearity (DNL)                                                                        | 64        |
|          |      | 5.1.3 Monotonicity $\ldots$ | 65        |
|          |      | 5.1.4 SFDR and glitch                                                                                        | 65        |
|          | 5.2  | DAC architecture                                                                                             | 65        |
|          |      | 5.2.1 DAC architecture based on implementation mode $\ldots \ldots \ldots$                                   | 65        |
|          |      | 5.2.2 DAC architecture based on coding                                                                       | 68        |
|          | 5.3  | Investigation for optimum partially segmented DAC                                                            | 70        |
|          |      | 5.3.1 DAC switches for partially segmented DAC                                                               | 71        |
|          |      | 5.3.2 R-2R and segmented combinations for 6-bit DAC                                                          | 74        |
|          |      | 5.3.3 Voltage level for switches                                                                             | 75        |
|          | 5.4  | 6-bit partially segmented DAC design                                                                         | 76        |
|          |      | 5.4.1 4-bit R-2R and 2-bit segmented DAC (4R-2S DAC)                                                         | 76        |
|          |      | 5.4.2 3-bit R-2R and 3-bit segmented DAC (3R-3S DAC)                                                         | 78        |
|          |      | 5.4.3 2-bit R-2R and 4-bit segmented DAC (2R-4S DAC)                                                         | 79        |

| 8 | <ul><li>7.5</li><li>7.6</li><li>7.7</li><li>Cone</li></ul> | DDS characterization                                   | 126<br>126<br>126<br>128<br>134<br>134<br>137<br>141<br><b>143</b> |
|---|------------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------------------|
|   | <ul><li>7.5</li><li>7.6</li><li>7.7</li></ul>              | DDS characterization                                   | 126<br>126<br>128<br>134<br>134<br>137<br>141                      |
|   | 7.5<br>7.6                                                 | DDS characterization                                   | 126<br>126<br>128<br>134<br>134<br>137                             |
|   | 7.5<br>7.6                                                 | DDS characterization                                   | 126<br>126<br>126<br>128<br>134<br>134                             |
|   | 7.5<br>7.6                                                 | DDS characterization                                   | 126<br>126<br>126<br>128<br>134                                    |
|   | 7.5                                                        | DDS characterization                                   | 126<br>126<br>126                                                  |
|   | 7.5                                                        | DDS characterization                                   | 126<br>126                                                         |
|   | 75                                                         | DDS characterization                                   | 120                                                                |
|   | 1.4                                                        | Duase control word simulation                          | - i ZO                                                             |
|   | 1.3<br>71                                                  | DDS phase control word simulation                      | 123<br>19¤                                                         |
|   | 7.2                                                        | DDS static simulation                                  | 111                                                                |
|   | 7.1                                                        | DDS simulation                                         | 109                                                                |
| 7 | DDS                                                        | simulation and characterization                        | <b>109</b>                                                         |
|   | 0.9                                                        |                                                        | 100                                                                |
|   | 0.8<br>6.0                                                 | Conclusions from aborter 6                             | 100                                                                |
|   | 0.1<br>6.9                                                 | Nonideal TSC input and the DDS                         | 103                                                                |
|   | 6.6<br>6.7                                                 | TSU simulation results                                 | 102                                                                |
|   | 6.5                                                        | Complete TSC block design                              | 101                                                                |
|   | 6.4                                                        | Harmonic suppression in TSC                            | 100                                                                |
|   | 6.3 | Under, medium, and over excitation to the TSC | 99 |
|   | 6.2                                                        | TSC design for DDS                                     | 97                                                                 |
|   | 6.1                                                        | Differential pair based TSC                            | 94                                                                 |
| 6 |  | Triangle to sine wave converter | 94 |
|   | 5.11                                                       | Conclusions from chapter 5                             | 93                                                                 |
|   | 5.10                                                       | DAC for future DDS                                     | 92                                                                 |
|   | 5.9                                                        | SiGe high speed DAC comparisons                        | 91                                                                 |
|   |                                                            | 5.8.2 Glitch suppression of 3R-3S DAC                  | 87                                                                 |
|   |                                                            | 5.8.1 Glitch analysis of 3R-3S DAC                     | 86                                                                 |
|   | 5.8                                                        | Glitches in DAC                                        | 86                                                                 |
|   | 5.7                                                        | Layout for 3R-3S DAC                                   | 85                                                                 |
|   |                                                            | 5.6.4 Final selection of a DAC                         | 85                                                                 |
|   |  | 5.6.2 Glitch comparison | 84 |
|   |  | 5.6.1 Gain offset, DNL, and INL comparison | 83 |
|   | 5.6                                                        | Simulated results of DACs comparison                   | 83                                                                 |
|   |  | 5.5.3 Simulation for 2R-4S DAC | 82 |
|   |                                                            | 5.5.2 Simulation for 3R-3S DAC                         | 81                                                                 |
|   |                                                            | 5.5.1 Simulation for 4R-2S DAC                         | 80                                                                 |
|   | 5.5                                                        | Simulations of various DACs                            | 79                                                                 |

# List of Figures

| 2.1  | SFDR of DDS output signal                                                               | 12 |  |  |  |
|------|-----------------------------------------------------------------------------------------|----|--|--|--|
| 2.2  | Basic DDS block diagram [6]                                                             |    |  |  |  |
| 2.3  | An accumulator operation representing the digital phase wheel (left) and linear accumulation ramp (right) [6] | 14 |  |  |  |
| 2.4  | Different DDS architectures in simplified form [7].                                     | 15 |  |  |  |
| 2.5  | Conventional ROM-less TSC based DDS architecture [8]                                    | 17 |  |  |  |
| 2.6  | Modified ROM-less TSC based DDS architecture used in this design                        | 18 |  |  |  |
| 2.7  | Hierarchy of the DDS cells                                                              | 19 |  |  |  |
| 3.1  | BEOL of the IHP's 0.25 $\mu$ m BiCMOS technology [9]                                    | 21 |  |  |  |
| 3.2  | Simplified I-V characteristics set up for HBT npn200_1.                                 | 21 |  |  |  |
| 3.3  | Simulated I-V characteristics of HBT npn200_1.                                          | 22 |  |  |  |
| 3.4  | $f_T/f_{MAX}$ as a function of $I_C$ and $V_{CE}$ for npn200_1 (emitter number = 1). | 24 |  |  |  |
| 3.5  | $f_T/f_{MAX}$ as a function of $I_C$ and $V_{CE}$ for npn200_1 (emitter number = 2). | 24 |  |  |  |
| 3.6  | Resistor divider biasing [10].                                                          | 25 |  |  |  |
| 3.7  | Current source using a simple current mirror [10]                                       | 26 |  |  |  |
| 3.8  | Current source using multiple current mirrors [11] | 26 |  |  |  |
| 3.9  | Current source using an emitter degenerated current mirror (left) and its symbol [12] | 27 |  |  |  |
| 3.10 | Current source using a beta helper current mirror [11].                                 | 27 |  |  |  |
| 3.11 | Simplified emitter follower circuit.                                                    | 28 |  |  |  |
| 3.12 | Inverter gate using CML logic [12].                                                     | 30 |  |  |  |
| 3.13 | Propagation delay $(t_{delay})$ and rise time $(t_{rise})$ calculation scheme [13]      | 31 |  |  |  |
| 3.14 | Rise time $(t_{rise})$ and fall time $(t_{fall})$ of CML inverter cell.                 | 32 |  |  |  |
| 3.15 | Inverter gate using CML and ECL logic.                                                  | 32 |  |  |  |
| 3.16 | $t_{delay}$ and $t_{rise}$ comparison between CML and ECL gates. | 33 |  |  |  |
| 3.17 | CML and ECL gate $t_{delay}$ over different $I_C$ .                                     | 33 |  |  |  |
| 3.18 | Inverter gate using cascoded CML logic.                                                 | 34 |  |  |  |
| 3.19 | Inverter gate using cascoded CML logic.                                                 | 35 |  |  |  |
| 3.20 | Gain versus frequency comparison between CML gates with and without cascoding. | 35 |  |  |  |
| 3.21 | Shunt peaking cascoded CML inverter cell.                                               | 36 |  |  |  |
| 3.22 | Shunt peaking cascoded CML inverter cell.                                               | 36 |  |  |  |
| 3.23 | Shunt peaking cascoded CML inverter cell.                                               | 37 |  |  |  |
| 3.24 | XOR gate using a CML.                                                                   | 38 |  |  |  |
| 3.25 | XOR gate simulated result for its functionality test (left) and rise time (right) | 38 |  |  |  |

| 3.26         | AND gate using a CML.                                                             | 39         |
|--------------|-----------------------------------------------------------------------------------|------------|
| 3.27 | AND gate simulated result for its functionality test (left) and rise time (right) | 39 |
| 3.28 | Majority gate simulated result for its functionality test (left) and rise time (right). | 40 |
| 3.29 | Majority gate simulated result for its functionality test (left) and rise time (right) | 40 |
| 3.30         | D-latch block diagram.                                                            | 41         |
| 3.31         | D-latch using a CML (left)                                                        | 41         |
| 3.32         | Register using a master-slave D-latch configuration.                              | 42         |
| 3.33 | Register using CML D-latches in a master-slave configuration | 42 |
| 3.34         | (Simulated) Register working at 20 GHz clock                                      | 43         |
| 3.35         | OR gate using a CML                                                               | 43         |
| 3.36         | OR gate simulated result for its functionality test (left) and rise time (right). | 44         |
| 4.1          | Half adder truth table (left) and logic block diagram (right) [14]                | 46         |
| 4.2          | Full adder truth table (left) and logic block diagram (right)[14].                | 46         |
| 4.3          | 1-bit adder (left) and 1-bit adder in accumulator block diagram (right).          | 47         |
| 4.4          | 1-bit adder accumulator circuit using CML gates.                                  | 47         |
| 4.5 | 2-bit adder block diagram | 49 |
| 4.6          | 2-bit adder accumulator block diagram                                             | 49         |
| 4.7          | 2-bit adder accumulator using CML gates                                           | 50         |
| 4.8 | 12-bit accumulator using a) 1-bit adders b) 2-bit adders c) 4-bit adders | 51 |
| 4.9 | 12-bit accumulator using 2-bit adders without pre-skewing register | 52 |
| 4.10 | 6-bit phase truncated 12-bit accumulator using 2-bit adders | 54 |
| 4.11 | 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders | 55 |
| 4.12 | 12-bit accumulator digital ramp (sum output) generation | 56 |
| 4.13 | 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders including XORs for digital triangle generation. | 57 |
| 4.14 | 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders including XORs for digital triangle generation. | 58 |
| 4.15 | Final 12-bit accumulator including PCW | 59 |
| 4.16         | H-tree block diagram (left) and its layout realisation (right).                   | 60         |
| 4.17 | Simulation result of the six most significant bits from the accumulator for a clock of 20 GHz and FCW = 0000 0010 0000 | 61 |
| 5.1 | Static transfer characteristics of a 3-bit DAC without INL error (left) and with INL error (right) [15] | 64 |
| 5.2 | Static transfer characteristics of a 3-bit DAC without DNL error (left) and with DNL error (right) [15]. | 65 |
| 5.3          | 3-bit resistor string DAC (voltage mode) [16].                                    | 66         |
| 5.4 | 3-bit binary weighted resistor string DAC (voltage mode) [17] | 67 |
| 5.5 | 3-bit R-2R ladder DAC (voltage mode) [17] | 68 |
| 5.6 | 3-bit charge distribution binary weighted DAC (simplified) [16] | 69 |
| 5.7 | 3-bit binary weighted current steering DAC (current mode) [16] | 69 |
| 5.8 | DAC LSB0 switch using a CML cascoded inverter | 71 |

| 5.9  | Fall time of DAC LSB switch and MSB switch.                                                                                                       | 72  |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 5.10 | Eye-diagram of LSB switch (left) and MSB switch (right) | 73 |
| 5.11 | DAC LSB0 switch using a CML cascoded inverter.                                                                                                    | 73  |
| 5.12 | Schematic of 4-bit R-2R and 2-bit segmented DAC (4R-2S DAC)                                                                                       | 74  |
| 5.13 | Schematic of 3-bit R-2R and 3-bit segmented DAC (3R-3S DAC)                                                                                       | 74  |
| 5.14 | Schematic of 2-bit R-2R and 4-bit segmented DAC (2R-4S DAC)                                                                                       | 75  |
| 5.15 | Voltage level for switch in DAC.                                                                                                                  | 75  |
| 5.16 | 4R-2S DAC complete block diagram.                                                                                                                 | 77  |
| 5.17 | Layout of R-2R section of 4R-2S DAC.                                                                                                              | 77  |
| 5.18 | 3R-3S DAC complete block diagram.                                                                                                                 | 78  |
| 5.19 | Layout of R-2R section of 3R-3S DAC.                                                                                                              | 79  |
| 5.20 | 2R-4S DAC complete block diagram.                                                                                                                 | 80  |
| 5.21 | DAC output voltage vs linear input code for 4R-2S DAC. | 80 |
| 5.22 | INL and DNL static result (simulated) for 4R-2S DAC                                                                                               | 81  |
| 5.23 | DAC output voltage vs linear input code for 3R-3S DAC. | 81 |
| 5.24 | DAC output voltage vs linear input code for 3R-3S DAC. | 82 |
| 5.25 | DAC output voltage vs linear input code for 2R-4S DAC. | 82 |
| 5.26 | DAC output voltage vs linear input code for 2R-4S DAC. | 83 |
| 5.27 | Glitch, area, and power consumption comparison between various DACs.                                                                              | 85  |
| 5.28 | Layout of final DAC (3R-3S DAC).                                                                                                                  | 85  |
| 5.29 | Glitches in 3R-3S output voltage with linearly increasing input code                                                                              | 87  |
| 5.30 | Switching cell used for LSB (left) and MSB (right) with glitch suppression capacitor $C_{SUP}$. | 88 |
| 5.31 | Glitch amplitude level of the MSB2 switch output for various capacitor ($C_{SUP}$) values (0 fF, 50 fF, 150 fF and 100 fF, clockwise starting from the top-left). | 89 |
| 5.32 | The glitch amplitude and gate delay during the input code 100 000 to 011 111 switching as a function of the glitch suppression capacitor ($C_{SUP}$) | 89 |
| 5.33 | Eye-diagram of LSB switch (left) and MSB switch (right) with $C_{SUP} = 100 \text{fF}$ .                                                          | 90  |
| 5.34 | DAC output voltage with linearly increasing input code (with glitch suppression $C_{SUP} = 100$ fF) | 90 |
| 5.35 | DAC output voltage (without and with glitch suppression $C_{SUP}$) for digital triangle input codes. | 90 |
| 5.36 | 5R-3S DAC block diagram                                                                                                                           | 92  |
| 5.37 | Result of 8 bit DAC (5R-3S DAC) with $C_{SUP} = 50$ fF                                                                                            | 93  |
| 6.1 | Differential pair based TSC core (left) and its input-output wave characteristics [18] | 95 |
| 6.2 | TSC bandwidth (gain versus input/output frequency) | 98 |
| 6.3 | Under excitation ($V_{PP} = 110$ mV, left), appropriate excitation ($V_{PP} = 240$ mV, middle), and over excitation ($V_{PP} = 850$ mV, right) of 5 MHz triangle input to the TSC. | 99 |
| 6.4 | Input peak to peak voltage ($V_{PP}$) versus harmonics suppression of TSC at the input triangle wave frequency of 5 MHz, 2.5 GHz, 5 GHz, and 10 GHz (clockwise from top-left). | 100 |

| 6.5        | TSC SFDR over frequency for 225 mV input excitation. The SFDR is considered up to the fifth harmonic of the fundamental signal. If the SFDR bandwidth were considered up to 10 GHz, the SFDR would be much better, especially for output frequencies greater than 5 GHz | 101 |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------|---|
| 6.6 | Voltage shifter for the TSC input and output (at 1 GHz) | 101 |
| 6.7 | Complete schematics of the TSC core circuit including input EF, potential divider, and output EF | 102 |
| 6.8 | TSC output sine wave (left) and its spectrum (right) with input triangle of 5 MHz ($V_{PP} = 450$ mV) | 103 |
| 6.9 | TSC output sine wave (left) and its spectrum (right) with input triangle of 2.5 GHz ($V_{PP} = 450$ mV) | 103 |
| 6.10 | TSC output sine wave (left) and its spectrum (right) with input triangle of 5 GHz ($V_{PP} = 450$ mV) | 104 |
| 6.11 | TSC output sine wave (left) and its spectrum (right) with input triangle of 10 GHz ($V_{PP} = 450$ mV). | 104 |
| 6.12 | TSC output sine wave (left) and its spectrum (right) with input triangle of 15 GHz ($V_{PP} = 450$ mV). | 105 |
| 6.13 | DAC output (TSC input triangle) at 5 GHz (left), 2.5 GHz (middle), and 1.25 GHz (right) | 105 |
| 6.14 | Conventional bandgap reference in current source with first-order temperature compensation | 106 |
| 6.15 | SFDR of the TSC output at various temperatures at 5 MHz (left) and 5 GHz (right). | 107 |
| 7.1 | DDS block diagram and simulation set up | 109 |
| 7.2 | DDS output frequency range and sampling points | 110 |
| 7.3 | Accumulator, DAC, and 2.5 GHz DDS output with $F_{clk} = 10$ GHz, FCW = 0100 0000 0000 | 113 |
| 7.4 | 2.5 GHz DDS output spectrum where DDS is operated with $F_{clk} = 10$ GHz and FCW = 0100 0000 0000. The SFDR of 2.5 GHz DDS output is 24 dBc. | 114 |
| 7.5 | 1.25 GHz DDS output spectrum where DDS is operated with $F_{clk} = 10$ GHz and FCW = 0010 0000 0000. The SFDR of 1.25 GHz DDS output is 30 dBc. | 114 |
| 7.6 | Accumulator, DAC, and 1.25 GHz DDS output with $F_{clk} = 10$ GHz, FCW = 0010 0000 0000. | 115 |
| 7.7 | DAC and DDS results for $F_{clk} = 10$ GHz, FCW = 0101 0101 0101 | 116 |
| 7.8 | 3.33 GHz DDS output spectrum where DDS is operated with $F_{clk} = 10$ GHz and FCW = 0101 0101 0101. The SFDR of 31 dBc is obtained. | 116 |
| 7.9 | DAC and DDS results for $F_{clk} = 10$ GHz, FCW = 0001 1000 0000 | 117 |
| 7.11 | DAC and DDS results for $F_{clk} = 10$ GHz, FCW = 0000 0010 0000, $F_{out} = 78.125$ MHz | 117 |
| 7.10 | 0.935 GHz DDS output spectrum where DDS is operated with $F_{clk} = 10$ GHz and FCW = 0001 1000 0000. The SFDR of 31 dBc is obtained. | 118 |
| 7.12 | 78.12 MHz DDS output spectrum with $F_{clk} = 10$ GHz and FCW = 0000 0010 0000 | 118 |

| 7.13 | Simulated 3.5 GHz DDS output with the clock frequency of 14 GHz and FCW = 0100 0000 0000: sine wave (left) and its spectrum (right). The SFDR of 37 dBc is obtained. | 119 |
|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| 7.14 | Simulated 8.5 GHz DDS output with the clock frequency of 14 GHz and FCW = 0111 1111 1111: sine wave (left) and its spectrum (right). The SFDR of 37 dBc is obtained. | 119 |
| 7.15 |  | 120 |
| 7.16 | Spectrum of 0.78 GHz DDS output with $F_{clk} = 20$ GHz, FCW = 0000 1010 0000. The SFDR of 34.4 dBc is obtained.                                                                                                           | 120        |
| 7.17 | 10 GHz DDS output with $F_{clk} = 20$ GHz, FCW = 0111 1111 1111 (left) and 2.5 GHz DDS output with $F_{clk} = 5$ GHz, FCW = 0111 1111 1111 (right). In both cases all the accumulator outputs and DAC outputs are also shown. | 121 |
| 7.18 | Spectrum of 10 GHz and 2.5 GHz DDS outputs for 20 GHz and 5 GHz clocks. The former shows 39 dBc of SFDR while the latter has a poor SFDR of 15 dBc. | 121 |
| 7.19 | Simulated SFDR and output power versus clock frequency. The result shows that the DDS can give at least 25 dBc of SFDR over the entire 10 GHz bandwidth. | 122 |
| 7.20 | Simulated SFDR and output power versus DDS output frequency at 20 GHz clock. The result shows that the DDS can give at least 28 dBc of SFDR and at least -14 dBm of output power for output frequencies from 5 MHz to 5 GHz. | 123 |
| 7.21 | Simulated results from the DDS: linear ramp (left), chirp signal time domain view (right). | 124 |
| 7.22 | Simulated results from the DDS: nonlinear ramp (left), time domain view (right).                                                                                                                                           | 125        |
| 7.23 | PCW changing the phase of the DDS output at 239 MHz by 180°. The PCW value is changed from 000 000 to 111 111 at time instance 8 ns. | 125 |
| 7.24 | Changing the phase of DDS output at 625 MHz by 90°.                                                                                                                                                                        | 126        |
| 7.25 | MMIC of the fabricated DDS (size $2.2 \text{ mm} \times 1.8 \text{ mm}$ )                                                                                                                                                  | 127        |
| 7.26 | DDS printed circuit test board on Rogers4300B substrate (including a wire-bonded DDS MMIC) | 128 |
| 7.27 | DDS measurement setup in the lab                                                                                                                                                                                           | 128        |
| 7.28 | Measured result: 2.5 GHz DDS output with $F_{clk} = 10$ GHz and FCW = 0100 0000 0000. SFDR and output power are 26 dBc and -15 dBm respectively. | 129 |
| 7.29 | Measured result: DDS outputs at 10 GHz clock for various FCW inputs | 130 |
| 7.30 | Measured result: 3.5 GHz DDS output with $F_{clk} = 14$ GHz and FCW = 0100 0000 0000. SFDR and output power are 31 dBc and -18 dBm respectively. | 130 |
| 7.31 | Measured result: 8.5 GHz DDS output with $F_{clk} = 17$ GHz and FCW = 0111 1111 1111. SFDR and output power are 31 dBc and -25 dBm respectively.                                                                           | 130        |

| 7.32 | Simulated and measured result comparison: SFDR and output power of Nyquist DDS output corresponding to the clock frequency | 131 |
|------|-------------------------------------------------------------------------------|-----|
| 7.33 | Measured result: SFDR and output power of Nyquist DDS output corresponding to the clock frequency | 132 |
| 7.34 | Measured result: phase noise of 3 GHz and 4 GHz DDS outputs with a clock of 12 GHz | 133 |
| 7.35 | Measured result: changing two least significant bits to generate four different frequency points for 12 GHz clock. | 133 |
| 7.36 | Exemplary Doppler radar concept [19] | 134 |
| 7.37 | Radar system block diagram used in ADS simulation | 136 |
| 7.38 | ADS simulation result at different stages of the radar | 137 |
| 7.39 | Radar system block diagram used in measurement detecting a vibration of the speaker | 138 |
| 7.40 | DDS integrated Doppler radar system realization. | 138 |
| 7.41 | Measured result: the 6 GHz Doppler radar detecting 10 Hz speaker vibration. The mains frequency of 50 Hz can also be detected. | 139 |
| 7.42 | Measured result: speaker vibration detection using 6 GHz Doppler radar: for 1 to 10 Hz (left), for 10 to 100 Hz (right). The mains frequency of 50 Hz can be clearly identified. | 139 |
| 7.43 | Measured result: speaker vibration detection using 8 GHz Doppler radar: for 1 to 10 Hz (left), for 10 to 100 Hz (right). The mains frequency of 50 Hz can be clearly identified. | 140 |
| 7.44 | Measured result: 6 GHz Doppler radar detecting two vibrating speakers. | 141 |
| 7.45 | Measured result: 6 GHz Doppler radar detecting two vibrating speakers with narrowly spaced frequencies of 0.3 Hz. | 141 |

# List of Tables

| 1.1 | Comparison between various signal sources | 2 |
|-----|-------------------------------------------|---|
| 1.2 | A simplified view of the relationship between major DDS parameters | 4 |
| 1.3 | Performance comparison of high-speed DDS circuits in the order of increasing maximum clock frequency | 6 |
| 3.1, 3.2 | D-latch Truth table | 40, 44 |
| 4.1, 4.2, 4.3, 4.4, 4.5 | Input equivalency between block diagram and CML of 2-bit adder accumulator | 50, 50, 51, 53, 62 |
| 5.1, 5.2, 5.3 | Partially segmented 6-bit DAC combinations | 70, 74, 84 |
| 5.4, 5.5 | Input code transition and corresponding glitch level for decreasing ramp in DAC | 87, 91 |
| 7.1, 7.2 | Various static DDS simulations | 112, 124 |
| 8.1 | Performance comparison of high-speed DDS circuits (including this work) in the order of increasing maximum clock frequency | 145 |

# Chapter 1 Introduction

In modern RF and communication systems, there is an ever-increasing demand for compact, power-efficient, low-noise signal sources that offer versatile modulation schemes, fast frequency switching, and high-speed waveform generation. In its basic form, a signal generator produces a sine wave. The most conventional signal generator is the voltage-controlled oscillator (VCO). VCOs are available for operating frequencies from a few Hz up to the THz spectrum. Terahertz signals are traditionally generated by free-electron radiation, optical lasers, Gunn diodes, or fundamental oscillation using III-V-based HBT/HEMT technology [20]-[21]. These THz signal generators suffer from their size, cost, and sometimes the necessity of cryogenic cooling [22]. Thanks to advances in microwave and millimetre-wave (mm-wave) technology, higher operating frequencies, even in the THz range, are viable [23]. In high-frequency VCOs, the higher frequency is generally achieved by utilizing higher-order harmonics of the fundamental signal [24]. Nevertheless, the VCO has some limitations: a narrow bandwidth and an inferior phase noise. The former arises because the relationship between its input voltage and its output frequency is not strictly linear over the tuning range. The latter, phase noise, is another important parameter of any signal source: it describes the random fluctuation of the phase of the sinusoidal signal with time [25]. It is poor in a VCO because it is affected by several variables, such as the transistor size and noise, the bias current, noise leaking from the bias supply, the tuning control voltage nodes, noise generated by the current source, and the oscillation amplitude in the oscillator circuit.

A phase control loop can be used to improve the phase noise of the signal. The phase-locked loop (PLL) implements a feedback system that locks the VCO output to a reference frequency [1]. However, the settling time of the PLL is longer because the feedback control loop needs time to stabilize. This restricts fast frequency switching between two output frequencies, which is a vital feature of a dynamic signal generator; for example, this property is used for frequency sweep generation in advanced modulation schemes. In addition, due to the nonlinear voltage-to-frequency tuning characteristic of the VCO, its tuning gain (Hz/V) varies [26]. In a PLL, the variation in the tuning gain of the VCO changes the loop bandwidth, which in turn increases the settling time of the PLL or even makes the PLL unstable [27]. Therefore, the PLL can linearize only a small fraction of the tuning range of the VCO. This means that the bandwidth of the PLL is even smaller than that of the VCO. The PLL also requires a phase comparator, and it is challenging to realize a phase comparator working at higher frequencies. Yttrium iron garnet (YIG) is a crystal with a very high quality factor (Q), which provides low phase noise in oscillators [28]. YIG oscillators are suitable for wide-bandwidth applications, although their operating frequency is not as high as that of VCOs.

All these signal generators (VCO, PLL, YIG oscillators, optical lasers, Gunn diodes, etc.) suffer from slow frequency switching and a lack of digital controllability and advanced modulation capability, despite operating at several GHz and even at THz frequencies [29]. Also, their outputs are limited to sine waves. One of the widely known signal sources is the arbitrary waveform generator (AWG). It can produce a wide range of frequencies with low phase noise and is also capable of modulation. It generates various waveforms, such as sine, sawtooth, and triangle waves [30]. The key component of the AWG is the direct digital synthesizer (DDS) [31], which is designed as a monolithic microwave integrated circuit (MMIC). In a DDS, the reference clock frequency is set, and a digital frequency control word (FCW) is applied as an input to generate the signal. The input FCW can be varied to generate output signals from near DC up to half of the clock frequency [32].
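The tuning relation described above can be illustrated with a short numerical sketch. It assumes the standard ideal DDS relation $f_{out} = \mathrm{FCW} \cdot F_{clk} / 2^N$ for an $N$-bit accumulator. The function names are hypothetical, and the 12-bit width is an assumption inferred from the 12-digit FCW strings quoted in the captions of Figures 7.28 to 7.31; neither is taken from the design itself.

```python
def dds_output_frequency(fcw: int, f_clk_hz: float, acc_bits: int) -> float:
    """Ideal DDS output frequency for a given frequency control word (FCW)."""
    if not 0 <= fcw < 2 ** acc_bits:
        raise ValueError("FCW does not fit into the accumulator")
    return fcw * f_clk_hz / 2 ** acc_bits


def frequency_resolution(f_clk_hz: float, acc_bits: int) -> float:
    """Smallest frequency step, obtained for FCW = 1."""
    return f_clk_hz / 2 ** acc_bits


# Assumption: 12-bit accumulator, inferred from the FCW strings in the figure captions.
ACC_BITS = 12

# FCW = 0100 0000 0000 at a 10 GHz clock (the setting quoted for Figure 7.28).
f_out = dds_output_frequency(0b0100_0000_0000, 10e9, ACC_BITS)
print(f"f_out = {f_out / 1e9} GHz")  # prints "f_out = 2.5 GHz"
print(f"resolution = {frequency_resolution(10e9, ACC_BITS) / 1e6:.3f} MHz")
```

With these assumptions, the sketch reproduces the 2.5 GHz output quoted for a 10 GHz clock, and an FCW just below $2^{N-1}$ gives an output just below half of the clock frequency.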

| Signal source | Operating output frequency (GHz) | Bandwidth (%) | Phase noise (dBc/Hz @ 1 MHz offset) | Frequency switching speed | Modulation capability / digital controllability |
|------------------|-------------------------------------------|------------------|--------------------------------------------|---------------------------------|---------------------------------------------------------|
| VCO [33]         | 251                                       | 7                | -86                                        | slow                            | No                                                      |
| VCO [29]         | 324                                       | 1.2              | -78                                        | slow                            | No                                                      |
| PLL [34]         | 93-104                                    | 11               | -86                                        | slow                            | Limited                                                 |
| PLL [35]         | 162-164                                   | 1.2              | -79                                        | slow                            | Limited                                                 |
| YIG-osc. [36]    | 32-48                                     | 33               | -119                                       | slow                            | No                                                      |
| YIG-osc. [37]    | 6-12                                      | 50               | -115                                       | slow                            | No                                                      |
| DDS [38]         | 0.01-4.6                                  | 50               | -100                                       | fast                            | Yes                                                     |
| DDS [39]         | 0.01-4.2                                  | 50               | -118                                       | fast                            | Yes                                                     |

 Table 1.1: Comparison between various signal sources

There are some fundamental limitations in high-speed DDS implementation. Firstly, the maximum clock speed of the DDS is limited by the technology. The speed of the transistor is governed by the properties of the semiconductor material, the electron and hole mobility, the structure of the junctions (contacts), etc. One of the fastest technologies is the III-V semiconductor-based indium phosphide (InP) technology. It offers either high electron mobility transistors (HEMT) or heterojunction bipolar transistors (HBT). The intrinsic and extrinsic parasitic capacitances need to be reduced to obtain the fastest transistors. These parasitics can be reduced, e.g. by using a transferred-substrate approach [40], but cannot be eliminated. This restricts the maximum transistor speed available in the technology. A DDS is composed of a large number of transistors, digital gates, high-speed blocks, and other larger functional circuits. The speed of a high-speed circuit (e.g. an inverter gate) is even lower than the maximum available speed of the transistor, and the speed of the larger blocks built from these fundamental gates is reduced further. The biggest problem with the InP process is the limited transistor count, which results from its lower yield [41]. However, a DDS generally consists of hundreds (or even thousands) of transistors.

To obtain a fine frequency resolution and a large number of frequency points, the functional blocks (e.g. the accumulator) must be large, which requires thousands of transistors. Such a transistor count cannot be fabricated in an InP process with good yield and reliability. Large functional blocks are therefore restricted in InP-based DDS circuits, and consequently their frequency resolution and number of frequency points are limited. The DAC (digital-to-analog converter) is also an essential part of the DDS. The bit-size of the DAC (along with that of the accumulator) is directly related to the spurious quality of the fundamental signal. However, a large bit-size demands a larger number of high-speed switches. A large number of switches leads to complex clock distribution circuits, which reduces the sampling speed of the DAC. Therefore, it is not possible to increase the bit-size of the DAC (or accumulator) arbitrarily, owing to the speed limit and the problems associated with large-scale integration. CMOS (complementary metal-oxide-semiconductor) technology has the smallest transistor size and shows excellent yield and integration compared to any other technology, but its speed and RF performance are inferior to those of InP and silicon-germanium (SiGe) technology.

There are also several physical limitations that are independent of the technology. In a real layout, a large number of blocks means a larger number of connections, which are physically realized using transmission lines and vias. The length and width of the transmission lines and the number of vias in a given process are physically limited. For instance, a transmission line is a metal line on the technology substrate. Generally, longer and wider metal lines are used to connect several blocks and to supply power to them. For RF lines, the transmission line width is fixed by the required matched impedance, so the width cannot be increased to carry more current. Making the transmission lines longer also degrades the RF performance. Even DC bias lines have a maximum current handling capacity. The speed of a circuit generally scales with its power consumption, but the power consumption cannot be increased indefinitely to reach the highest speed: because the thermal conductivity of any substrate is finite, the excess power cannot be dissipated safely. The number of DC pads is also limited, and the RF pads need to be placed at certain positions to match the RF probes used in measurement. For these reasons, an infinitely high clock and infinitely large functional blocks cannot be implemented. The achievable clock speed and the other features depend on each other, and the resulting compromises constitute the challenges of high-speed DDS design.

### **1.1 DDS challenges**

Generally, the DDS consists of a phase accumulator (or simply accumulator), sine mapping circuits, a digital-to-analog converter (DAC), and a filter. The individual performance of these blocks affects the overall DDS performance. The fundamental DDS design challenges are to achieve a high clock frequency ($F_{clk}$), better resolution, a wide frequency range, digital integrability, fewer spurs, and low power consumption. A simplified view of the relations between the various DDS parameters is presented in Table 1.2. It shows the benefits of each DDS parameter and its requirements, and it also points out the consequences for the other DDS parameters.

| Parameters | Demands | Merits | Demerits |
|------------|---------|--------|----------|
| High $F_{clk}$ | • High $f_T$<br>• Compact clock distribution<br>• High-speed circuit techniques | • Wide bandwidth | • Can limit accumulator bit-size |
| High SFDR | • Large accumulator & DAC bit-size<br>• Optimized phase-to-sine conversion | • Suppressed spurs<br>• Clean fundamental signal | • Large area<br>• High power consumption |
| Large number of frequency points | • Large accumulator bit-size | • Fine frequency resolution<br>• Better tunability | • Large area<br>• High power consumption |
| Small area & low power consumption | • Small accumulator & DAC bit-size<br>• Low-value current source | • Thermally reliable<br>• Small size<br>• Low cost | • Compromise on $F_{clk}$, SFDR and frequency points |

Table 1.2: A simplified view of the relationship between major DDS parameters

The high-speed design is limited by the speed of the technology. A higher clock is the most sought-after parameter in a DDS, since it defines the operating bandwidth of the DDS and the maximum DDS output frequency. Note that the DDS clock is always at least twice the maximum DDS output frequency. The maximum allowable clock frequency of the DDS should not be confused with the maximum available clock source (from either a VCO or a PLL) in general. The latter is relatively easy to obtain since it is narrowband or, ideally, a single frequency, whereas the former is limited and mainly depends on the accumulation and synchronization time inside the DDS circuits.

The frequency resolution and the number of frequency points are related to the accumulator size. The accumulator is composed of many small adder circuits, buffers, and other digital logic. The optimization and realization of the accumulator architecture, its building blocks, and the sine mapping circuits require great effort. The spurs depend on the accumulator size, the design limitations of the internal blocks, the phase-to-amplitude mapping circuits inside the DDS, the DAC, and even the digital input codes [42]. Since the DDS output is sampled by the clock, the DDS output suffers from spurious signals depending on the number of samples, especially due to strong harmonics. Therefore, keeping the SFDR (spurious-free dynamic range) of the DDS output as high as possible is vital to produce a clean fundamental signal. On the other hand, the frequency switching speed stems mostly from the architecture of the accumulator and the additional buffers for clock synchronization, and it is independent of the frequency values switched within the DDS bandwidth.
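How the accumulator size sets the number of frequency points can be sketched with a minimal behavioural model. It is an illustration under stated assumptions, not the accumulator architecture of this work: an ideal $N$-bit accumulator adds the FCW once per clock cycle and wraps around on overflow, and each overflow marks one completed output period. The function names are hypothetical.

```python
def count_overflows(fcw: int, acc_bits: int, cycles: int) -> int:
    """Clock an ideal N-bit phase accumulator and count how often it wraps."""
    modulus = 2 ** acc_bits
    if not 0 <= fcw < modulus:
        raise ValueError("FCW does not fit into the accumulator")
    phase = 0
    overflows = 0
    for _ in range(cycles):
        phase += fcw
        if phase >= modulus:  # carry out of the most significant bit
            phase -= modulus
            overflows += 1
    return overflows


def usable_frequency_points(acc_bits: int) -> int:
    """Number of non-zero FCWs whose output stays below half of the clock."""
    return 2 ** (acc_bits - 1) - 1


# Over 2**N clock cycles the accumulator wraps exactly FCW times,
# so the average output frequency is FCW / 2**N times the clock.
print(count_overflows(fcw=3, acc_bits=4, cycles=16))  # prints 3
print(usable_frequency_points(12))                    # prints 2047
```

Each additional accumulator bit halves the frequency step and roughly doubles the number of frequency points, at the cost of one more adder stage that must settle within a clock period.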

Area and power consumption are other important issues in a DDS. Since a DDS uses a large number of digital circuits, the number of active transistors operating at high speed poses significant challenges not only during circuit design and layout but also during production. The number of transistors used in a high-speed DDS is at the edge of the manufacturing capability of the IC foundry [43]. The yield of such a large DDS circuit is poor; thus, special design effort must be made in advance. Some technologies demand more power for high-speed operation, while others consume less power but trade off speed. These aspects will be explained in detail later. Several hundred individual cells work together in one chip, so an efficient power supply and optimized routing are also important.

The characterization of the DDS needs special attention because the DDS has a large number of inputs. A compact, efficient, and low-loss printed circuit board should be designed and realized to accommodate the DDS IC. It should provide several power supply interfaces and be connectable as a plug-in RF module compatible with RF systems. In addition, all important parameters depend on each other: focusing on only one feature can severely degrade other performance figures, while some features help to enhance others; thus, precise design and optimization are required. This leads to the necessity of a novel DDS that fulfils requirements such as high frequency, wide bandwidth, fewer spurs, low phase noise, low power consumption, and digital integrability. Although the DDS is not a panacea for all requirements in the signal source domain, it does cover the area where digital control, system-on-chip integration, and lower power consumption are needed for a versatile signal source. The DDS can also be complementary to and compatible with other signal source systems [44], [45], [46].

Therefore, research on high-frequency DDS with finer frequency resolution and fast frequency hopping that still operates with reasonable power consumption is important in the area of modern RF and communication systems (e.g. in RF base stations and handheld mobile devices). However, such benefits of the DDS can be achieved only after suitable design, optimization, realization, and characterization. To accomplish this, in-depth research on suitable technology, accumulator architecture, plausible design features of various digital and analog blocks, sine mapping architecture, and RF integration is essential.

### **1.2 Technology selection criteria**

The DDS has been realized in various technologies, mainly CMOS, InP, GaAs, and SiGe. The suitable technology should be chosen depending on the performance parameters, e.g. speed, amplitude resolution, frequency resolution, frequency points, chip area, and power consumption. Nevertheless, there is no concrete definition of the best technology for the DDS. Here, various technologies are discussed regarding their suitability for DDS design.

#### 1.2.1 Speed

The speed of a technology is roughly defined by the $f_T/f_{MAX}$ of its transistors. The state-of-the-art DDS circuits in the GHz range are shown in Table 1.3; they are mostly dominated by InP and SiGe technology. CMOS is a field-effect transistor (FET) technology well known for its high-volume production and high integration in the digital circuit domain. It uses both NMOS and PMOS transistors for various logic functions. Its speed is significantly lower than that of its InP counterparts. Various CMOS-based DDS circuits have been reported, but their operating frequency is limited to only a few GHz [47], [48]. This is because, in high-speed CMOS, the NMOS and PMOS speeds should be the same or at least comparable. However, the NMOS is driven by the electron mobility, which is higher than the hole mobility that drives the PMOS. The speed of SiGe-HBT technology is also higher than that of CMOS for a comparable node. Apart from the graded germanium base in the SiGe HBT, which provides faster carrier transport, the speed of the HBT is also determined by the quality of the vertically grown thin semiconductor layers. By comparison, the speed of the CMOS/FET device is driven by the progress in lithography. Historically, the control of the semiconductor layers (e.g. by atomic layer deposition) in HBTs has been well ahead of the lithographic resolution in FETs. There is high-frequency CMOS, often cited as RF CMOS, but its RF performance is inferior to that of the SiGe HBT for a given bandwidth and comparable technology node. In future, a CMOS technology with comparable PMOS and NMOS speed in the high-speed domain could also be an interesting candidate for high-speed DDS realization, especially to benefit from the compactness of the transistors and the low power consumption. One of the fastest technologies is InP, thanks to its high carrier mobility. The fastest DDS reported so far is a 32 GHz DDS based on InP [3], and SiGe DDS circuits have also been realized beyond 10 GHz [49].

| Technology | SiGe HBT [50] | SiGe HBT [51] | SiGe HBT [52] | InP HBT [43] | SiGe HBT [5] | InP HBT [4] | InP HBT [3] |
|------------|------|-------|------|-------|-------|-------|------|
| Transit frequency $f_T$ (GHz) | 200 | 120 | 120 | 250 | 200 | 350 | 350 |
| Maximum clock frequency $F_{clk}$ (GHz) | 5 | 6.3 | 12 | 14 | 16.8 | 24 | 32 |
| Accumulator size $A$ (bit) | 24 | 9 | 9 | 8 | 8 | 12 | 8 |
| DAC resolution $B$ (bit) | 10 | 8 | 8 | 5 | 6 | 7.5 | 5 |
| Worst case SFDR (dBc) | 42 | 26 | 22 | 24.8 | 20 | 30.7 | 21.6 |
| Power consumption $P$ (W) | 4.7 | 2.5 | 1.9 | 2.4 | 0.49 | 19.8 | 9.45 |
| Die area ($\mathrm{mm}^2$) | 11.1 | 5.76 | 9 | 3.52 | 1.15 | 16.5 | 4.05 |
| Transistor numbers | - | 13500 | 9600 | 2122 | 1351 | - | 1891 |
| Sine mapping block | NL-DAC | NL-DAC | NL-DAC | NL-DAC | TSC based | Digi. logic | NL-DAC |
| Phase control unit | Yes | No | No | No | No | No | No |
| FOM1 = $\left(\frac{F_{clk}}{f_T}\right) \times 100$ (%) | 2.5 | 5.3 | 10 | 5.6 | 5.8 | 6.5 | 10.6 |
| FOM2 = $\frac{F_{clk} \times A \times B \times \mathrm{SFDR}}{P \times f_T}$ $\left(\frac{\mathrm{GHz} \cdot \mathrm{bit}^2 \cdot \mathrm{dBc}}{\mathrm{W} \cdot \mathrm{GHz}}\right)$ | 53.3 | 39.3 | 83.3 | 23.15 | 165.2 | 9.6 | 8.4 |
| FOM3 = $\frac{F_{clk}}{P}$ (GHz/W) | 1.1 | 2.5 | 6.3 | 5.8 | 34.5 | 1.2 | 3.4 |

 Table 1.3: Performance comparison of high-speed DDS circuits in the order of increasing maximum clock frequency

#### 1.2.2 Matching

A DDS contains a large number of circuits, and one of the most important issues in these circuits is the matching between transistors. Comparing CMOS and HBT, the latter has the better matching capability. The transistor-to-transistor matching in an HBT is characterized by the matching of $V_{BE}$, which is determined by the doping profile of the p-n junction across the emitter and base regions. The doping level increases with each new HBT technology generation, and thus the matching improves as well. In CMOS, on the other hand, the transistor-to-transistor matching is determined by the $V_T$ (threshold voltage) matching [12]. $V_T$ is the gate-to-source voltage required to establish a conducting channel between the source and drain terminals of a FET. The $V_T$ variance mainly depends on the length and width of the gate and on the doping profile in the active device. These parameters vary from device to device, and therefore the matching in HBT technology is better than in CMOS [53].

#### 1.2.3 Phase noise

Phase noise describes the amount of noise energy contained close to the carrier signal. It is greatly influenced by flicker (1/f) noise and is one of the crucial performance parameters of any signal source, including a DDS. For example, a potential application of a DDS source is a radar system, where frequency mixing is a typical operation. The 1/f noise is problematic there because the mixing, or subsequent signal processing, may shift the low-frequency noise spectrum towards the carrier, so that the low-frequency noise becomes visible at high frequency. This increases jitter and interference in adjacent signal channels and lowers the final signal-to-noise ratio (SNR). A clean output carrier is a vital feature of the DDS, and it is compromised when CMOS is chosen over SiGe technology. For instance, the phase noise performance of the SiGe HBT is better than that of RF CMOS at an operating frequency of 60 GHz [54]. The superior noise performance of the SiGe HBT stems from the very core of how the two technologies are built. The SiGe HBT is a bipolar transistor in which 1/f noise is mostly generated in the emitter-base junction. CMOS, on the other hand, is a surface conduction device in which the current flow is affected by the $Si/SiO_2$ interface; the quality of the silicon oxide layer and of the p-n junction influences the 1/f noise. For a comparable node, the bipolar HBT has a cleaner emitter-base junction than the oxide layer and p-n junction of the FET. Hence SiGe has a significantly better noise performance. The effect of 3D scaling is stronger on the speed of FETs than on the speed of HBTs. To achieve faster FET/CMOS devices, aggressive generation-to-generation scaling is applied, which thins the oxide layer and moves it even closer to the active channel, increasing the tunnelling current and worsening the 1/f noise performance. This also means that choosing high-speed CMOS compromises noise performance, which is not desired. Other technologies such as GaAs HEMT (gallium arsenide HEMT) have a phase noise that is at least 10 dB worse [55] and a limited large-scale integration capability compared to the SiGe HBT.

#### **1.2.4** Power consumption, integration capability, frequency resolution

Power consumption is an important parameter for an efficient and reliable device. The DDS blends high-speed digital and analog circuits, and the number of transistors used for the digital circuits is large compared to the analog circuits. The frequency resolution is determined by the number of phase accumulator bits: a finer resolution requires more bits. Despite its higher operating frequency, InP technology is therefore not well suited, because its power consumption (P) becomes tremendous. An accumulator of more than 9 bits also increases the transistor count dramatically. This limits the size of the accumulator (9 bits), which eventually restricts the number of frequency points and the SFDR of the DDS. This is the reason why all InP DDSs with a clock above 10 GHz reported so far have fewer than 256 frequency points and an SFDR below 30 dBc [43]. Comparing the SiGe based DDS [5] with the InP based DDS [3], the $\frac{F_{clk}}{P}$ ratios are 34.56 GHz/W and 3.38 GHz/W respectively. Thus, SiGe has a significant advantage over InP technology in terms of lower power consumption, better integration, and finer frequency resolution.

CMOS technology has so far typically been limited to DDS applications below 10 GHz. However, due to its transistor compactness, low static power consumption, and high integration capability, it is prominent in low-frequency DDS applications. At low frequency, the CMOS accumulator can be implemented with more than 30 bits, providing a finer frequency resolution and a better SFDR [56]. A CMOS based DDS can contain more than ten thousand transistors, a level of integration that is difficult to achieve in non-silicon technologies. CMOS technology could be considered for DDS applications at several GHz in the future, especially with the improving RF performance of high-speed CMOS and the similar speed of NMOS and PMOS devices.

SiGe technology has a higher thermal conductivity and a better integration capability than InP. The accumulator size in SiGe can exceed 10 bits [8], making it capable of generating more than two thousand frequency points. The number of frequency points is important because it fundamentally allows the user to tune the output frequency in finer steps. It is also crucial when the DDS is used as a frequency-modulated source. The possibility of a larger accumulator in SiGe also means that the SFDR can be better.

Although SiGe is inferior to InP in terms of speed, it is not far behind, and its mixed-signal friendly nature is a great advantage. The fastest SiGe technology has already exceeded an  $f_T$  of 500 GHz [57]. SiGe can also be adapted to integrate CMOS/FET devices in the same process, known as SiGe BiCMOS technology. It can thus combine the excellent RF performance of SiGe HBTs with the high integration and high computing power of CMOS [58]. This lights the path for future high-speed mixed-signal circuits: as SiGe technology beyond 500 GHz matures, more such circuits will be realized. This includes the development of the DDS as an agile signal source even in THz electronics.

## **1.3** Technology selection and goals

High-speed DDS design demands a high-speed technology, high integration, low phase noise, and low power consumption. Considering these demands, SiGe appears to be the most suitable technology and the one with the greater future opportunities. The selection of the technology can also be influenced by non-technical parameters such as the cost of the process and the accessibility of the process. For an MMIC, the time from design to fabrication tape-in (to the foundry) is several months, depending on the design complexity. The time from tape-in to production of the MMIC chip (tape-out from the foundry) is another 3-6 months. Any small fault in design or production therefore costs time and money, since the complete cycle has to be repeated. In addition, a foundry often updates its process while the technology is still evolving, and some small changes in the process might not be included in the current design kit. If there is close contact with the foundry, such changes can be obtained directly from the foundry and considered during circuit design, which improves the reliability of the circuit simulations. It is therefore easier if the foundry is a project partner or cooperates with the project.

In this thesis, a 0.25  $\mu$m technology (SG25H4) from the foundry IHP (Innovations for High-Performance Microelectronics GmbH) in Frankfurt (Oder), Germany, is used for all circuits. It is a well-established foundry and one of the pioneers of the SiGe process. At the start of this project, a technology with a higher  $f_T/f_{MAX}$  (300/500 GHz) was also available from the same foundry (e.g. 130 nm SG13G2), but SG25H4 was selected over SG13G2 for two reasons. First, the required number of transistors would exceed several thousand, which is a challenge for adopting the new technology (SG13G2), as its reliability is not well understood for very large-scale integration of transistors. The SG25H4 is highly stable and proven by a measured chip with a transistor count beyond four thousand [8]. Second, the SG13G2 would cost three times more than SG25H4. Nevertheless, a high-speed DDS design in SG13G2 is a prospect, and this DDS work in SG25H4 will serve as a reference for it.

Various DDS architectures should be compared to obtain the best performance. The vital features of this DDS in comparison with other published high-speed DDSs are:

- **F**<sub>clk</sub> : This DDS aims at the highest DDS clock frequency in SiGe and the third highest DDS clock frequency in any technology [43], [5], [59].
- **Frequency points** : This DDS has a 12-bit phase resolution, which is capable of generating F<sub>P</sub> = 2<sup>12-1</sup> = 2048 frequency points. This is the largest number (jointly with [4]) among DDSs with a clock beyond 15 GHz [59]. Other reported DDSs at such high speed have a maximum phase resolution of 8 bits (128 frequency points) [43].
- **Phase control unit** : A 6-bit phase control word (PCW) unit is implemented, which enables phase changes and can work simultaneously with the FCW unit.
- $\mathbf{F_{clk}}$ :  $\mathbf{f_T}$  ratio : It has the highest ratio of clock frequency to  $\mathbf{f_T}$  for a given technology (1:9) [59]. The maximum achievable DDS clock frequency ( $\mathbf{F_{clk}}$ ) is approximately limited to one-eighth of the  $\mathbf{f_T}$  [59]. The technology for this DDS is IHP 0.25  $\mu$m SiGe with  $\mathbf{f_T} \approx 180\text{-}190$  GHz, so it is challenging to achieve a 20 GHz clock. In this thesis, the goal is also to increase the  $\mathbf{F_{clk}}$ :  $\mathbf{f_T}$  ratio up to 1:10, in turn pushing the state-of-the-art  $\mathbf{F_{clk}}$ :  $\mathbf{f_T}$  value in DDS design. This requires additional effort in high-speed circuit design and optimization.
- **Power consumption** : The 12-bit phase is truncated to 7 bits, which is sufficient to drive the 6-bit DAC. The number of pre-skewing registers is also reduced. Together with other circuit optimisations, this DDS shows an F<sub>clk</sub> to power consumption ratio of 12.9 GHz/W, which is the second best among high-speed DDSs after [5].
- **Phase noise measurement** : For the first time in the category of DDSs with a clock over 10 GHz, the phase noise performance of the DDS outputs is shown here.

## **1.4** Thesis organization

This chapter introduces the background and objectives of the thesis. Chapter 2 (DDS basics) presents the basic understanding and the past and current development of the DDS. The various types of DDS in terms of operating frequency, bandwidth, frequency points, power consumption etc. are discussed, with a focus on prominent architectures for a high-frequency synthesizer. A hierarchical overview of the DDS cells is also presented. Chapter 3 (High-speed circuit design) delves into the circuit design aspects of the high-frequency digital cells in SiGe technology. Chapter 4 (Phase accumulator), Chapter 5 (Digital to analog converter), and Chapter 6 (Triangle to sine wave converter) give an insight into the design of three important blocks, namely the phase accumulator, the digital to analog converter, and the triangle to sine converter. One of the most important aspects of the DDS chip is transferring the blocks into a layout for fabrication, so the most important layout considerations are discussed as well. Chapter 7 (DDS simulation and characterization) investigates the DDS for various input controls and setups. To measure the fabricated DDS chip, a suitable evaluation printed circuit board (PCB) is designed. This board is used not only for measurement purposes but also directly as a discrete RF component. The DDS can be used in many applications, from communication systems and ideal signal sources to radar systems. DDS based radar systems are reported in [44], [45]. One of these applications is characterized here: the DDS is demonstrated in a 5-10 GHz radar system. First, simulations are carried out using the individual device properties of the prospective RF components. Then the radar is realized and tested with real devices. Finally, chapter 8 (Conclusions) draws conclusions about the DDS as a versatile signal source and its applications, and outlines the possibility of further work.

## **1.5** Conclusions from chapter 1

The signal generator is a vital component in the field of RF and communication systems. The conventional VCO suffers from a narrow bandwidth and inferior phase noise. The phase-locked loop (PLL) implements a feedback system that locks the reference frequency set by the VCO and enhances the quality of the output signal. However, the bandwidth of the PLL is limited by the tuning range of the VCO, and it has a large settling time due to the feedback loop. Alternatively, a YIG oscillator, laser, or Gunn diode can be used to generate a signal. However, all of these suffer from a slow frequency switching time and lack digital controllability and phase modulation capability. The DDS is an agile signal generator that has a wide bandwidth, fast frequency switching time, digital controllability, and phase modulation capability. A suitable technology for a high-speed DDS has also been discussed. The DDS basics and architecture are explained in the next chapter.

# Chapter 2 DDS basics

In this chapter, the most important performance parameters and the operation of the DDS are explained. The DDS types in terms of architecture and operating frequency are also discussed. The intention is to find the best possible DDS architecture that eases the design challenges of a 20 GHz DDS.

## 2.1 DDS parameters

The most important DDS performance parameters are as described below.

#### 2.1.1 Clock frequency

Clock frequency  $F_{clk}$  is the maximum allowable reference frequency of the DDS; the reference acts as the sampling frequency for all DDS output frequencies. A reference frequency of up to  $F_{clk}$  can be applied to the DDS. Its unit is hertz (Hz). In practice, the reference frequency is fixed and the frequency/phase control units are used to yield different DDS outputs.

#### 2.1.2 Frequency resolution

Frequency resolution is the lowest synthesizable frequency of the DDS output. It is also the finest frequency step between two possible DDS outputs within the DDS output bandwidth for a given clock. Its unit is Hz. The frequency resolution  $F_{min}$  is given by:

$$F_{\min} = \frac{F_{\text{clk}}}{2^{N}} \ (\text{Hz}) \tag{2.1}$$

N is the bit-size of the phase accumulator. Therefore  $F_{min}$  is limited by the bit-size of the accumulator.

#### 2.1.3 DDS output frequency range

It is the operating range of the DDS output frequency  $F_{out}$ . Its unit is Hz. Ideally, it extends from near DC ( $F_{min}$ ) to half of the clock frequency of the DDS. The lower bound is given by the frequency resolution of the DDS and the upper bound is restricted to half of  $F_{clk}$ :

$$F_{\min} \leq F_{out} \leq \frac{F_{clk}}{2} \ (\text{Hz}) \tag{2.2}$$

#### 2.1.4 DDS output frequency points

The number of output frequencies  $F_P$  within the DDS frequency range is not infinite, because the DDS is a digital device and its output frequency points are defined by the bit-size N of the phase accumulator:

$$F_{\rm P} = 2^{\rm N-1} \tag{2.3}$$
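As a rough numerical illustration of Eq. (2.1) and Eq. (2.3) together with the output frequency range, the following sketch evaluates the three quantities for a 20 GHz clock and a 12-bit accumulator. The function name `dds_parameters` and the chosen values are illustrative assumptions and not part of the design flow of this thesis.

```python
def dds_parameters(f_clk_hz: float, n_bits: int) -> dict:
    """Evaluate resolution, upper frequency bound and frequency points.

    f_clk_hz : DDS clock frequency in Hz (assumed example value below)
    n_bits   : bit-size N of the phase accumulator
    """
    f_min = f_clk_hz / 2**n_bits   # Eq. (2.1): frequency resolution
    f_max = f_clk_hz / 2           # upper bound: half of the clock
    points = 2**(n_bits - 1)       # Eq. (2.3): number of frequency points
    return {"f_min_hz": f_min, "f_max_hz": f_max, "points": points}


# Example: 20 GHz clock with a 12-bit accumulator.
params = dds_parameters(20e9, 12)
print(params)
```

For these example values the resolution is about 4.88 MHz and the number of frequency points is 2048; an 8-bit accumulator at the same clock would give 128 points.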

#### 2.1.5 Spurious free dynamic range



Figure 2.1: SFDR of DDS output signal.

Spurious free dynamic range (SFDR) is the ratio of the power of the desired DDS output signal  $(F_{out})$  to the power of the strongest spur  $(F_{spurs})$  within a given bandwidth. It is one of the most important specifications of the dynamic performance of the DDS. For the SFDR calculation, the full DDS output bandwidth (DC to half of the clock) is considered; this is also called the wideband SFDR. For example, the output spectrum of a 20 GHz clock DDS is shown in Fig.2.1. Within the DC to 10 GHz range, the fundamental DDS output and the strongest spur lie at 0.625 GHz and 1.875 GHz respectively. The difference between these two signals on the dB scale is the SFDR of the DDS for this particular fundamental output; its unit is therefore dBc. In this example, the SFDR is 30 dBc. In some applications, the DDS output operates over a small bandwidth and only the largest spur within this bandwidth is considered; the term narrowband SFDR is then used. Suppose the DDS is used as a signal source up to 1.5 GHz and higher frequencies are suppressed by a low pass filter at 1.5 GHz. In this case, the narrowband SFDR calculation considers DC to 1.5 GHz only. Within this bandwidth,  $F_{out}$  and the largest spur  $(F_N)$  are at 0.625 GHz and 0.312 GHz respectively. Therefore, the narrowband SFDR is 35 dBc, limited by  $F_N$.
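The wideband and narrowband SFDR values of this example can be reproduced with a short sketch. The absolute power levels in `spectrum` are assumptions chosen only so that the level differences match the 30 dBc and 35 dBc quoted in the text; the helper `sfdr_dbc` is a hypothetical name.

```python
def sfdr_dbc(spectrum: dict, f_out_hz: float, band_hz: float) -> float:
    """Return the SFDR in dBc within the band from DC to band_hz.

    spectrum : mapping of frequency (Hz) to power (dBm)
    f_out_hz : frequency of the fundamental DDS output
    band_hz  : upper edge of the bandwidth that is considered
    """
    carrier = spectrum[f_out_hz]
    spurs = [power for freq, power in spectrum.items()
             if freq != f_out_hz and 0 <= freq <= band_hz]
    return carrier - max(spurs)   # difference to the strongest spur


# Assumed levels: fundamental at 0 dBm, spurs 30 dB and 35 dB below it.
spectrum = {0.312e9: -35.0, 0.625e9: 0.0, 1.875e9: -30.0}

wideband = sfdr_dbc(spectrum, 0.625e9, 10e9)     # DC to half of the clock
narrowband = sfdr_dbc(spectrum, 0.625e9, 1.5e9)  # DC to 1.5 GHz only
print(wideband, narrowband)
```

Restricting the band to 1.5 GHz removes the spur at 1.875 GHz from the calculation, so the result is limited by the spur at 0.312 GHz instead.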

#### 2.1.6 Frequency switching time

It is the time required to switch from one DDS output frequency to another. Its unit is second (s). This is an important parameter that limits the switching speed of frequency-shift modulation using a DDS source.

## 2.2 DDS architecture overview

A direct digital synthesizer (DDS) can produce a signal over a wide range of frequencies. It takes digital input codes and translates them into a phase accumulation, which yields a corresponding sinusoidal output wave. The output signal is not limited to a sinusoid; square and triangular signals can also be generated. In general, DDSs can be categorized in terms of architecture and operating frequency.



Figure 2.2: Basic DDS block diagram [6].

From an architectural point of view, the DDS consists of three main building blocks, namely the phase accumulator, the sine amplitude mapping, and the digital to analog converter (DAC), as in Fig.2.2. The phase accumulator takes the digital input code and translates it into phase information. It acts as a phase counter that is incremented by the frequency control word (FCW) at each clock cycle, updating the signal phase. The phase stored by the accumulator is converted to the corresponding sine-wave amplitude by the phase to amplitude (sine mapping) block and then passed to the DAC, which converts it to an analog output. Finally, this signal is smoothed by the filter. Although each building block can have a variety of sub-architectures, the three main blocks can be described as:

1) The phase accumulator can be a fully pipelined, half-pipelined, or parallel architecture.

2) The sine mapping block can be ROM (read only memory) based or ROM-less.

3) The DAC can be either linear or nonlinear. In some cases the sine mapping is integrated with the nonlinear DAC.

The **Phase Accumulator** (or accumulator) is one of the most important parts of the DDS. It accumulates the phase information of the sine wave at each clock cycle. It is responsible for the frequency resolution, the number of synthesizable frequency points, and the formation of spurious signals. To visualise the accumulator operation in a simplified form, it is represented in Fig.2.3 as a digital phase wheel for an N-bit accumulator. One complete sine wave cycle has an angular range from 0 to  $2\pi$ , completing one full circle. A total of 2<sup>N</sup> equally spaced sampling points are represented by the dots on the circle. The accumulation starts from 000...00, continues with 000...01, and so on up to 111...11; the next clock cycle completes one full circle (equivalent to one accumulation cycle). At this point the accumulator is full and resets to zero. The frequency control word (FCW) input code acts as the phase increment (jump size). The spacing between two adjacent points on the digital phase wheel defines the frequency resolution of the DDS. In other words, a narrower spacing leads to a finer frequency resolution.

Figure 2.3: An accumulator-operation representing the digital phase wheel (left) and linear accumulation ramp (right)[6].

The accumulation in relation to the clock cycle is further illustrated in Fig.2.3 (right). The N-bit accumulator has a total of  $2^{\rm N}$  sampling points. For instance, when the jump is only one point, the output frequency corresponds to  $\frac{1}{2^{\rm N}}$  of the clock frequency, and one complete accumulation cycle takes  $2^{\rm N}$  clock cycles of F<sub>clk</sub>. The overflow resets the accumulation value to zero and the second accumulation cycle starts over. The second cycle is identical to the first, given that F<sub>clk</sub> and FCW are constant. Note that the output of the accumulator is not a sine wave but a linear phase ramp. If the phase jump is changed to 2 points (by changing the FCW), the speed of the accumulation cycle doubles, and the output frequency becomes  $2/2^{\rm N} \times F_{\rm clk}$ .
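The wrap-around behaviour of the phase wheel can be sketched with a small behavioural model. This is only an illustration of the accumulation principle under ideal conditions; the function names and the 4-bit example size are assumptions, and the model says nothing about the pipelined hardware implementation.

```python
def accumulate(fcw: int, n_bits: int, cycles: int) -> list:
    """Return the phase value at each clock cycle of an N-bit accumulator."""
    modulus = 1 << n_bits        # 2^N points on the digital phase wheel
    phase = 0
    trace = []
    for _ in range(cycles):
        trace.append(phase)
        phase = (phase + fcw) % modulus   # overflow wraps around to zero
    return trace


def cycles_per_period(fcw: int, n_bits: int) -> float:
    """Clock cycles needed for one full turn of the phase wheel."""
    return (1 << n_bits) / fcw


# 4-bit example: a jump of one point versus a jump of two points.
ramp_1 = accumulate(fcw=1, n_bits=4, cycles=18)
ramp_2 = accumulate(fcw=2, n_bits=4, cycles=18)
print(ramp_1)
print(ramp_2)
```

With a jump of one point the ramp returns to zero after 16 clock cycles, and with a jump of two points after 8 clock cycles, which corresponds to the doubling of the output frequency described above.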

The accumulation cycle time of the accumulator output is the same as that of the final DDS output, regardless of the type of signal (ramp, triangle, or sine wave). Therefore the relation between  $F_{clk}$ , FCW, N, and the output frequency is also the same for the accumulator and the DDS output. According to the Nyquist criterion, at least two sampling points per period are needed to reconstruct a sine wave; this is only possible if the sampling clock frequency ( $F_{clk}$ ) is at least twice the synthesized output frequency ( $F_{out}$ ) [60]. Thus the maximum output frequency is limited to half of the clock, while the minimum output frequency is limited to the frequency resolution of the accumulator. The output frequency of the accumulator (or DDS) is given by [39]:

$$F_{out} = \frac{FCW}{2^N} \times F_{clk} \quad \forall \quad \frac{F_{clk}}{2^N} \le F_{out} \le \frac{F_{clk}}{2} \tag{2.4}$$
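The tuning relation can be illustrated in both directions: from an FCW to the output frequency, and from a desired frequency to the nearest available FCW. The helper names below are hypothetical, and the 20 GHz clock and 12-bit accumulator are example values only.

```python
def output_frequency(fcw: int, n_bits: int, f_clk_hz: float) -> float:
    """Output frequency for a given FCW: F_out = FCW / 2^N * F_clk."""
    if not 1 <= fcw <= 2**(n_bits - 1):
        raise ValueError("FCW must keep F_out between F_clk/2^N and F_clk/2")
    return fcw / 2**n_bits * f_clk_hz


def nearest_fcw(f_target_hz: float, n_bits: int, f_clk_hz: float) -> int:
    """FCW whose output frequency lies closest to the requested target."""
    fcw = round(f_target_hz * 2**n_bits / f_clk_hz)
    return min(max(fcw, 1), 2**(n_bits - 1))   # clamp to the valid range


f_out = output_frequency(128, 12, 20e9)   # FCW = 128 at a 20 GHz clock
fcw_1ghz = nearest_fcw(1e9, 12, 20e9)     # closest FCW for a 1 GHz target
print(f_out, fcw_1ghz, output_frequency(fcw_1ghz, 12, 20e9))
```

In this example, FCW = 128 yields 0.625 GHz. A 1 GHz target cannot be hit exactly; the closest setting deviates from the target by less than half of the frequency resolution.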

There are various techniques to design and realize these individual blocks, and each block can be realized differently depending on the performance requirements. For instance, an accumulator can be composed of single-bit adder circuits, double-bit adder circuits, or adders of a different bit-size, as will be explained in chapter 4. Phase accumulators also vary in their implementation. For a larger number of input bits, the size of the accumulator increases. The pipelining architecture, the number and type of adders, and the inclusion of registers can differ between phase accumulators. Nevertheless, their primary task is to accumulate the phase given by the input FCW at each clock cycle. The most visible difference between DDSs lies rather in the implementation of the sine mapping circuit. Therefore, three distinctive DDS architectures, defined by their sine mapping techniques, are shown in Fig. 2.4. These are the CORDIC (coordinate rotation digital computer) algorithm based, ROM (read only memory) based, and ROM-less DDS architectures.



Figure 2.4: Different DDS architectures in simplified form [7].

#### 2.2.1 CORDIC approach DDS architecture

The CORDIC approach computes the sine amplitude instead of storing it. Like related computational approaches that use a Taylor series or polynomial expansion, it approximates the ideal curve of the sine wave. The algorithm calculates the amplitude directly, based on the projection of a rotating vector in a polar coordinate system [61]. This requires a considerable amount of hardware and power, and implementations are mostly available for operation up to 14 GHz [62], [2].
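The rotating-vector idea can be sketched in floating point as a textbook rotation-mode CORDIC. This is a generic illustration and not the implementation of the cited works; the function name and the iteration count are assumptions. In hardware, the multiplications by powers of two become bit shifts, so each iteration needs only shifts and additions.

```python
import math


def cordic_sin(theta: float, iterations: int = 16) -> float:
    """Approximate sin(theta) for -pi/2 <= theta <= pi/2 by CORDIC rotation."""
    # Elementary rotation angles atan(2^-i) and the accumulated vector gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start on the x axis, pre-scaled so that the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y   # projection of the rotated vector on the y axis


print(cordic_sin(math.pi / 6), math.sin(math.pi / 6))
```

Each iteration adds roughly one bit of accuracy, which indicates why the hardware effort grows with the required amplitude resolution.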

#### 2.2.2 ROM based DDS architecture

010, and 001. These values can be stored in a register or memory, simply referred to as ROM. For the sine wave, the same approach can be implemented; however, to preserve the curvature of the sine wave, a look-up table with non-linear amplitudes is required. In the ROM based DDS, an M-bit phase resolution and an N-bit amplitude resolution need  $2^{M} \times N$  registers or ROM cells [64]. Storing all amplitude information in the look-up table requires a significant amount of space, which means a larger area and power consumption. As the phase resolution and the corresponding amplitude resolution increase, the size of the look-up table expands exponentially, leading to unrealistic power and area consumption. To keep the ROM size affordable, various compression techniques have been proposed. In the symmetry approach, the complete sine wave is divided into four parts (0 to  $\pi/2$ ,  $\pi/2$  to  $\pi$ ,  $\pi$  to  $3\pi/2$ ,  $3\pi/2$  to  $2\pi$ ) and only the first quadrant is stored in the look-up table. The remaining quadrants are generated by flipping or mirroring the first quadrant [65]. This compresses the ROM size by approximately 4:1, so that only  $2^{M-2} \times N$  registers or ROM cells are required to store the amplitude information of the complete sine wave. Various ROM compression architectures have been proposed, e.g. Hutchinson's technique (compression 11.6:1) [66], Nicholas's architecture (compression 34.4:1) [67], Sunderland's architecture (compression 51:1) [68], the Bellaouar architecture (compression 78.8:1) [69], and the El Said and Elmasry architecture (compression 128:1) [70]. For example, a conventional ROM-based DDS with 15-bit phase and 14-bit amplitude resolution requires more than 458 thousand ROM cells; using the symmetry approach, the number of ROM cells is reduced to 114 thousand. This is still unrealistic. A ROM compression of 94.3:1 is achieved using nonlinear ROM addressing with improved compression ratio and quantization noise, reducing the number of ROM cells to only 1214 [71]. This is further improved to a compression of 160:1 as reported in [72]. A review of different ROM based sine amplitude mapping architectures is provided in [73]. Despite the efforts related to compression techniques, the power consumption and area of the ROM still limit the maximum clock frequency, because the internal delay of retrieving data from the ROM table restricts high-frequency operation. For lower clock frequencies, this is less problematic, since the data retrieval time is shorter than the clock period of the DDS.
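The cell counts quoted for the 15-bit phase and 14-bit amplitude example follow from counting one ROM cell per stored amplitude bit, with and without the quadrant symmetry. The function below is a hypothetical helper for this arithmetic only.

```python
def rom_cells(phase_bits: int, amplitude_bits: int,
              quadrant_symmetry: bool = False) -> int:
    """Number of ROM cells of a sine look-up table.

    Without compression every one of the 2^M phase values stores N bits.
    With quadrant symmetry only the first quadrant (2^(M-2) entries) is stored.
    """
    address_bits = phase_bits - 2 if quadrant_symmetry else phase_bits
    return 2**address_bits * amplitude_bits


full = rom_cells(15, 14)                              # conventional table
quarter = rom_cells(15, 14, quadrant_symmetry=True)   # first quadrant only
print(full, quarter, full / quarter)
```

The full table needs 458752 cells and the first-quadrant table 114688 cells, which is the 4:1 reduction mentioned above.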

#### 2.2.3 ROM-less DDS architecture

To operate a DDS at a higher clock, ROM-less techniques are widely implemented. A ROM-less DDS that uses digital logic for the phase conversion is reported in [74]; however, the digital logic implementation consists of a binary and a coarse DAC. The number of digital logic gates is over 50, and its suitability at higher frequencies is still questionable. The authors of [74] also published the 32 GHz DDS [3], but used a sine-weighted DAC instead of digital logic for the phase conversion. It is possible that, because of the large number of digital logic gates required for the phase translation in the digital logic approach, the longer data retrieval time and internal delays still limit the clock frequency. This is evident from the drop in clock frequency from 32 GHz to 13 GHz in the digital logic ROM-less approach, given the same technology and the same 8-bit phase and 6-bit amplitude resolution in both cases. However, the digital logic approach is still worth mentioning due to its significantly lower power consumption (5.42 W) compared to the 32 GHz DDS (9.45 W). ROM-less phase to amplitude conversion approaches are explored to reach higher clock frequencies [18], [75]. ROM-less sine mapping architectures can generally be designed in two ways: 1) a sine weighted non-linear DAC (NL-DAC) or 2) a DAC with a triangle to sine wave converter (TSC).

Figure 2.5: Conventional ROM-less TSC based DDS architecture [8].

The NL-DAC based technique integrates the sine mapping block into the nonlinear (weighted) individual cells of the DAC. It directly converts the linear phase information into the correct amplitude of the sine wave [76], [77]. In some cases, a coarse NL-DAC in the flat slope region is combined with a fine NL-DAC in the curvature region of the sine [75]. The NL-DAC shows promising results for sine mapping, as in [39], [11]. The main difference between a linear and a nonlinear DAC is that the former uses identical or binary (power of 2) weighted current sources, while the latter uses a variety of weighted current sources [39]. Although NL-DAC techniques work even in GHz range DDSs, they require a highly complex circuit due to the large number of sine weighted cells and encoders, and they exhibit a high power consumption. The weighting of the amplitude is carried out using a large matrix of current cells. For instance, [50] demonstrated an NL-DAC technique with 10-bit amplitude resolution in a 5 GHz clock DDS. Even after exploiting sine wave symmetry and segmentation, this DAC has a current source array (matrix) composed of 512 unit current sources. Row and column decoders are required to extract the correct sine wave amplitude from the matrix. In addition, special care is needed to distribute the current units randomly in order to improve the current matching. These requirements pose a significant design challenge regarding synchronization and voltage variation in the switches.

Conversely, an amplitude resolution of 10 bits, without sine mapping, can be achieved with a 10-bit linear DAC. This is accomplished with a partially segmented DAC that uses 6 bits in a segmented approach and 4 bits in a resistor dividing approach. It needs only 67 current sources, but it still needs a sine mapping block. The TSC provides the sine mapping for such a linear DAC. This ROM-less technique uses a linear DAC with a differential pair based TSC. The linear DAC is simpler in terms of hardware, can operate at higher clock frequencies than an NL-DAC for a given  $f_T$ , and consumes less area and power than an NL-DAC, while the TSC itself is a very compact translinear circuit [18], [49]. Hence, the presented ROM-less DDS with TSC is a prominent high-speed DDS solution that overcomes the high power consumption and the speed limitation. Therefore, the ROM-less linear DAC with TSC technique is chosen for the sine mapping section of the DDS. A conventional ROM-less TSC based DDS is shown in Fig.2.5. The bit-sizes of the accumulator and DAC are shown for explanation purposes. It consists of an N-bit accumulator, an N/2-bit linear DAC, and a TSC. The accumulator output is truncated to eliminate the extra buffers required for synchronising the accumulator outputs. However, this does not reduce the frequency resolution or the FCW input control. In high-speed DDSs, the DAC resolution is 5-10 bits in practice [59], [39], [43]. Both NL-DAC and TSC based approaches can be used for sine mapping in high-speed DDSs; however, the power consumption to clock ratios of the NL-DAC based designs are much higher (see Table 1.3).
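The principle of the differential pair based TSC can be illustrated with an idealised model. The sketch assumes the textbook tanh large-signal characteristic of a bipolar differential pair and an illustrative drive level of 1.65 (triangle peak relative to twice the thermal voltage); both are assumptions for illustration and do not describe the TSC circuit designed in this thesis.

```python
import math


def triangle(phase: float) -> float:
    """Unit triangle wave; one period for phase in [0, 1), peaks at +/-1."""
    p = phase % 1.0
    if p < 0.25:
        return 4.0 * p
    if p < 0.75:
        return 2.0 - 4.0 * p
    return 4.0 * p - 4.0


def tsc(x: float, drive: float = 1.65) -> float:
    """Normalized tanh transfer of an ideal bipolar differential pair."""
    return math.tanh(drive * x) / math.tanh(drive)


# Compare the raw triangle and the shaped triangle with an ideal sine wave.
phases = [k / 256 for k in range(256)]
ideal = [math.sin(2 * math.pi * p) for p in phases]
err_triangle = max(abs(triangle(p) - s) for p, s in zip(phases, ideal))
err_tsc = max(abs(tsc(triangle(p)) - s) for p, s in zip(phases, ideal))
print(err_triangle, err_tsc)
```

The compressive tanh characteristic rounds the peaks of the triangle, so the shaped waveform deviates considerably less from the ideal sine than the triangle itself. The remaining error depends on the drive level.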

## **2.3 DDS architecture used in this thesis**



Figure 2.6: Modified ROM-less TSC based DDS architecture used in this design.

In this thesis, the ROM-less TSC based DDS architecture is used, but with additional and improved features, as shown in Fig.2.6. It has a 12-bit phase resolution (FCW), a 6-bit linear DAC, and a TSC unit. The 12-bit phase accumulator provides 2048 frequency points, and truncating the phase from 12 to 7 bits saves power without degrading the frequency resolution. The 6-bit phase control unit is integrated and synchronized with the accumulator, providing double tuning capability (phase and frequency). This is the highlight of this architecture, since no other high-speed DDS with a clock above 5 GHz has implemented such a feature; only [50] shows a 5 GHz clock DDS with a phase control unit. Note that the DAC in Fig.2.6 is without clock synchronization. It is free running, and all possible delays are taken into account during layout to minimize any timing issue. This eliminates the area and power burden of the synchronization circuits. This architecture meets the DDS design targets set in chapter 1.
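The signal flow from the 12-bit accumulator to the 6-bit DAC code can be sketched behaviourally. Two points in this sketch are assumptions that the text does not specify: the PCW is taken to offset the most significant bits of the phase, and the MSB of the truncated 7-bit phase is taken to fold the remaining 6 bits into a triangle by one's complement. The function name `dac_codes` is hypothetical.

```python
def dac_codes(fcw: int, pcw: int = 0, cycles: int = 128,
              acc_bits: int = 12, kept_bits: int = 7, pcw_bits: int = 6) -> list:
    """Behavioural model: accumulate, add phase offset, truncate, fold."""
    modulus = 1 << acc_bits
    offset = pcw << (acc_bits - pcw_bits)    # assumed: PCW shifts the top bits
    low_mask = (1 << (kept_bits - 1)) - 1    # the 6 bits that drive the DAC
    acc = 0
    codes = []
    for _ in range(cycles):
        phase = (acc + offset) % modulus
        top = phase >> (acc_bits - kept_bits)    # truncation from 12 to 7 bits
        msb = top >> (kept_bits - 1)
        low = top & low_mask
        # Assumed folding: one's complement in the second half of the ramp.
        codes.append(low ^ low_mask if msb else low)
        acc = (acc + fcw) % modulus              # full 12-bit accumulation
    return codes


ramp = dac_codes(fcw=32)             # one 7-bit step per clock cycle
shifted = dac_codes(fcw=32, pcw=16)  # quarter-period phase offset
print(ramp[:4], ramp[62:66], shifted[:4])
```

In this model the accumulation still runs on all 12 bits, so the frequency resolution is unchanged by the truncation; only the amplitude path sees the reduced 7-bit phase.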

The DDS work in [5] is the closest in category to this DDS. Both DDSs use SiGe technology, are ROM-less with a TSC, and have an  $F_{clk}$  in the GHz range. The paper [5] shows a remarkably low power consumption of only 488 mW, while this DDS consumes 1.55 W at a slightly higher  $F_{clk}$ . However, this is not a fair comparison, since this DDS has an extra 6-bit phase control unit and a 12-bit instead of an 8-bit accumulator. This leads to an increased number of full adders, buffers, and clock drivers. For a fair comparison, an 8-bit accumulator DDS is implemented using the cells designed for this DDS, and its power consumption is calculated. The number of full adder bits is decreased from 12 to 8, the number of clock drivers from 20 to 8, and the PCW unit is eliminated. This results in accumulator = 450 mW, DAC = 214 mW, and TSC = 50 mW. This means that an 8-bit DDS in this design would have consumed only 790 mW.

# 2.4 DDS hierarchy



Figure 2.7: Hierarchy of the DDS cells.

The DDS is composed of at least three individual blocks (accumulator, DAC, and TSC). Each block is composed of various mini-blocks and unit cells. The accumulator contains a clock tree, several adders, drivers, and emitter followers. The adder is composed of XOR gates, majority gates, registers, and digital flip-flops. Similarly, the DAC consists of a combiner, divider lines, and high-speed switches. The TSC contains a differential pair along with a voltage shifter and low-pass filters. In this regard, the DDS can be viewed as a hierarchical design with many unit cells inside a cell and many cells inside a block, as shown in Fig.2.7. Each cell and block is optimized and will be explained in the respective section.

# 2.5 Conclusions from chapter 2

In this chapter, the DDS parameters and DDS architectures are presented. The selection criteria for the TSC-based DDS architecture are also provided. The DDS is composed of many unit cells, cells, and blocks. It can be viewed as a hierarchical design, and this chapter provides the top-level view of the DDS designed in this thesis. The high-speed circuits used in all the DDS blocks will be presented in the next chapter.

# Chapter 3

# High-speed circuit design

This chapter describes the basics of the fabrication technology and the relevant circuit design for the DDS. The circuit design part covers the performance characteristics of the available transistors (speed, power consumption, and bandwidth), biasing circuits, performance optimization, etc. Most of the digital blocks, such as logic gates, buffers, registers, and the clock tree, are primarily used in the accumulator. On the other hand, the biasing circuits and emitter followers are used throughout the DDS. The speed of the unit cell of the digital block determines the overall speed of the DDS. The unit cell is based on CML (current mode logic), the fastest logic family, which will be explained in detail, and the same CML approach is used to form the other digital functional blocks. The block-level design of the accumulator, the digital-to-analog converter, and the triangle-to-sine converter will be described in the respective chapters. However, the unit cell logic and the important circuit design methodology are inherited from this chapter.

# 3.1 SG25H4 technology

The SG25H4 (0.25 µm SiGe BiCMOS) technology offers two high-speed NPN HBTs, ranging from higher RF performance to higher breakdown voltages of up to 2.3 V. One of the HBTs, npn200\_1, has an  $f_T/f_{MAX}$  of 190/190 GHz, while the other, npn201\_2, has an  $f_T/f_{MAX}$  of 180/220 GHz. The BEOL (Back-End-Of-Line) of the SG25H4 technology is presented in Fig.3.1. It consists of five aluminium metallization layers (TM2/TM1/M3/M2/M1). All vertical layers can be connected by vias. The top two layers (TM1 and TM2) are thicker, with thicknesses of 2 and 3 µm, respectively. M2 and M3 are  $0.73 \ \mu m$  thick, and the lowest metal layer M1 is  $0.53 \ \mu m$  thick. The vertical distance between TM1 and TM2 is 3  $\mu$m, while the distance between all other adjacent metal layers is 0.9  $\mu$m. TM1/TM2 are generally used for lines with high current density, such as power supply lines, and for passive inductor design. The metal-insulator-metal capacitor  $(C_{MIM})$  is always located between the M3 and M2 layers. The HBT lies near the bottom of the stack, and the closest metal layer to the HBT is M1. Therefore, the ground layer is often made from a large mesh of the M1 layer, which helps to spread the heat from the HBT quickly and evenly to ground. During the layout of the circuits, or even during circuit design, this BEOL picture should be kept in mind in order to route signal lines efficiently between layers and to consider thermal and RF effects. Next, the basics of the transistor operating regions are explained, and an in-depth investigation of these parameters is performed for high-speed circuit design using the SG25H4 HBTs.

Figure 3.1: BEOL of the IHP's 0.25 µm BiCMOS technology [9].

# **3.2** Transistor I-V characteristics



Figure 3.2: Simplified I-V characteristics set up for HBT npn200\_1.

Simple large-signal modelling of the transistor can be represented by the collector current versus the collector-emitter voltage, generally referred to as the I-V (I<sub>C</sub> versus V<sub>CE</sub>) characteristics of the transistor. Here, the HBT npn200\_1 is chosen for the simulation (see Fig. 3.2). The supply voltage V<sub>CC</sub> is applied through the collector load resistance R<sub>L</sub>. V<sub>BE</sub>, V<sub>CE</sub>, and V<sub>BC</sub> are the base-to-emitter, collector-to-emitter, and base-to-collector voltages, respectively. Similarly,  $I_C$  and  $I_B$  are the collector and base currents, respectively. The simulated I-V plot of this HBT is shown in Fig. 3.3, where the relationship between  $I_C$  and  $V_{CE}$  can be observed with the base current as a parameter (from 1  $\mu$A to 61 $\mu$A). The I-V plot is segmented into three important parts: 1) active region, 2) cut-off region, and 3) saturation region.



Figure 3.3: Simulated I-V characteristics of HBT npn200\_1.

#### 3.2.1 Active region

The active region is the operating region in which the transistor acts as a fairly linear amplifier. The collector current is nearly constant for a fixed base current. An increment in the base current (I<sub>B</sub>) increases the collector current (I<sub>C</sub>) by the current gain factor ( $\beta \approx \frac{I_C}{I_B}$ ). In other words, the transistor behaves like a controlled current source in the active region. To be in this region, V<sub>BE</sub> should be greater than V<sub>th</sub> (the base-to-emitter turn-on threshold voltage). V<sub>CE</sub> should be greater than V<sub>BE</sub> to put the transistor in the forward-active region, but also less than the avalanche breakdown voltage limit V<sub>CEO</sub>. Beyond V<sub>CEO</sub>, the transistor may enter a nonlinear mode that can lead to catastrophic breakdown. In Fig.3.3, the simulation shows that the collector current increases dramatically beyond 2.3 V, a region that is generally avoided for circuit biasing.

#### **Conditions:**

i)  $V_{BE} > V_{th}$,  $V_{BC} < 0$.

ii)  $V_{CEO} > V_{CE} > V_{BE}$.

iii)  $I_B > 0$,  $I_C > 0$.

#### 3.2.2 Cut-off region

The cut-off region is the inactive region of the transistor, in which the transistor is in the off-state and the emitter and collector nodes behave as an open circuit. This means that no collector current, or only a negligible one, flows through the load  $R_L$ . It occurs when  $V_{BE}$  is less than the minimum threshold voltage  $V_{th}$ , so that the emitter-base junction is not forward biased irrespective of  $V_{CE}$ . In Fig. 3.3, for a base current of 1  $\mu$A, only an insignificant emitter/collector current flows, which defines the cut-off region.

**Conditions:** 

i)  $V_{BE} < V_{th}$,  $V_{BC} < 0$.

ii)  $I_B \approx 0$,  $I_C \approx 0$.

#### 3.2.3 Saturation region

In the saturation region,  $V_{CE}$  drops below  $V_{BE}$ , and the collector-base junction becomes forward biased. The base current increases and the current gain  $\beta$  decreases. As shown in Fig.3.3, the saturation region widens with increasing base current.

#### **Conditions:**

i)  $V_{CE} < V_{BE}$ ,  $V_{BC} > 0$ .

ii)  $I_B > 0$ ,  $I_C > 0$ .

Using the two states of the active and cut-off regions, the transistor can be used as a switch. The switching can be explained using the operating points of the transistor in the two states. For example, when the transistor operates at  $(V_{CE}, I_C) \approx (1.7 \text{ V}, 1.2 \text{ mA})$, it is in the active region and fully on. When the base current is decreased so that the operating point moves to  $(V_{CE}, I_C) \approx (1.9 \text{ V}, 0 \text{ mA})$, no collector current flows; the transistor enters the cut-off region and is said to be in the off state. Switching between the active and cut-off regions is the fastest. In contrast, the transition from saturation to the active region is a slow process, which is undesirable for high-speed operation [12].
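The conditions listed for the three regions can be collected into a small classifier. This is a minimal sketch that simply encodes the inequalities of Sections 3.2.1 to 3.2.3; the default threshold and breakdown values are illustrative assumptions rather than process data, and the function name is hypothetical.

```python
def operating_region(v_be: float, v_ce: float,
                     v_th: float = 0.85, v_ceo: float = 2.3) -> str:
    """Classify the HBT operating region from its terminal voltages.

    v_th and v_ceo are illustrative defaults (assumptions); use the values
    that apply to the actual device and bias condition.
    """
    if v_be < v_th:
        return "cut-off"       # base-emitter junction not forward biased
    if v_ce < v_be:
        return "saturation"    # V_BC = V_BE - V_CE > 0: base-collector forward biased
    if v_ce >= v_ceo:
        return "breakdown"     # beyond the avalanche limit, avoided for biasing
    return "active"            # V_th < V_BE < V_CE < V_CEO


# Operating points similar to the switching example in the text.
on_state = operating_region(v_be=0.9, v_ce=1.7)
off_state = operating_region(v_be=0.5, v_ce=1.9)
print(on_state, off_state)  # active cut-off
```

A CML switch toggles between the first two results of this example and never enters the "saturation" branch, which is the reason for its speed.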

# 3.3 $f_T/f_{MAX}$ of SG25H4 transistor

For the DDS design, the primary concern is to achieve the highest circuit speed. The transistor's  $f_T/f_{MAX}$  values depend on the biasing condition (V<sub>CE</sub>, I<sub>C</sub>). The foundry provides measured  $f_T/f_{MAX}$  values for the SG25H4 npn200\_1 HBT of 180/200 GHz at a fixed biasing condition (V<sub>CE</sub>, I<sub>C</sub>) of (1.5 V, 2 mA) [78]. The  $f_T$  of a two-port system is defined as the frequency at which the magnitude of the short-circuit current gain becomes unity. The  $f_{MAX}$  is the frequency at which the power gain of the two-port system becomes unity [25]. In actual circuit design, various biasing conditions might be needed, or at least the optimum biasing condition for peak  $f_T/f_{MAX}$  has to be found. Therefore, an extensive simulation study of the transistor speed over different biasing conditions is carried out. The  $f_T/f_{MAX}$  as a function of I<sub>C</sub> and V<sub>CE</sub> is calculated as described below. The frequency at which the current gain of the circuit equals one is extracted for various collector currents. As shown in Fig.3.4, the transit frequency increases with the collector current up to 2 mA for various V<sub>CE</sub> (1 to 2 V). A V<sub>CE</sub> beyond 2 V is not recommended for two reasons. The first reason is that the collector-emitter breakdown voltage is 1.9 V, and a  $V_{CE}$  beyond that threshold might be risky for the transistor reliability [79],[80]. The second reason is that, for a constant collector current, the transit frequency at 2 V is no higher than at 1.5 V. The  $f_{MAX}$  is highest at a collector current of around 2 mA. The  $f_{MAX}$  increases with  $V_{CE}$  for a constant  $I_C$ , but  $V_{CE}$  should stay below the breakdown voltage. These simulation results suggest that the optimum biasing for peak  $f_T/f_{MAX}$  is around (1.5 V, 1.5 mA).

Figure 3.4:  $f_T/f_{MAX}$  as a function of  $I_C$  and  $V_{CE}$  for npn200\_1 (emitter number =1).

Figure 3.5:  $f_T/f_{MAX}$  as a function of  $I_C$  and  $V_{CE}$  for npn200\_1 (emitter number =2).

The number of emitter fingers can be increased to handle a higher collector current. This also shifts the optimum collector current for peak  $f_T/f_{MAX}$ . For example, Fig.3.5 shows the simulated  $f_T/f_{MAX}$  of an HBT with two emitter fingers for various  $V_{CE}$  levels. For peak  $f_T/f_{MAX}$ , the optimum current of the two-finger HBT (3 mA) is double that of the single-finger HBT (1.5 mA).

# **3.4 Biasing circuit**



Figure 3.6: Resistor divider biasing [10].

The first task of the circuit design is to select a biasing network that operates from the constant power supply. Ideally, the biasing circuit does not influence the RF performance. However, it determines how stable and predictable the current source is, which is fundamental in any circuit design. The simplest biasing method is a resistor divider, as shown in Fig.3.6. Here, the base-to-emitter voltage of the transistor T1 is set by the potential divider formed by  $R_1$  and  $R_2$ , and equals the voltage drop across  $R_2$ . The current gain ( $\beta$ ) is not constant due to fabrication process variations, so the current source value varies. In addition, this biasing technique is sensitive to temperature and supply voltage fluctuations and cannot supply a constant collector current, which affects both the DC and the RF performance. To reduce such influences, a simple current mirror can be used for biasing, as in Fig.3.7. The current mirror provides a stable current source and is a good alternative to the simple resistor divider. It comprises two identical transistors (T0 and T1), where T0 is a diode-connected HBT and T1 is the amplifying transistor. The input current (Iin) flowing through the biasing branch is replicated, or mirrored, at the collector of T1. Hence, it provides a constant collector current  $I_C \approx I_{out}$  to the amplifying branch simply by mirroring Iin. The HBT pair (T0 and T1) should have the same current gain ( $\beta$ ), and the two transistors should be placed as close together as possible to reduce the effect of process spread.

Some circuits demand multiple values of reference current, depending on the current drive level and biasing conditions. In such cases, higher currents can be generated from the same current source as in Fig.3.7 and transferred to multiple reference current mirrors, as in Fig.3.8. This also keeps the reference current low, which reduces the DC loss in the bias resistor. In Fig. 3.8, a summation of multiple reference currents (Iout<sub>Total</sub>) can be mirrored into the common collector branch of the transistors T1/T2/.../TN by applying the proper ratio of the number of transistors.

Figure 3.7: Current source using a simple current mirror [10].

Figure 3.8: Current source using a multiple current mirrors [11].

Another important issue is the current matching between current sources. In some applications, for example in a current-weighted nonlinear digital-to-analog converter (DAC), precise current source values are required for the converter to function [81]. Perfect current matching is difficult to achieve due to process variations in the fabrication of the transistors [82], [83]. To alleviate this problem, a bipolar current mirror with resistive degeneration is used. It is a modification of the fundamental current mirror in which a degeneration resistor is added to obtain a higher output resistance and a lower sensitivity of the gain to changes in the input and output voltages [84]. The resistive emitter degenerated current mirror (see Fig.3.9, left) improves the current matching, and its symbol is shown in Fig.3.9 (right). The emitter degeneration significantly improves current matching by compensating for process mismatches, at the expense of a small voltage drop across the resistor  $R_E$  (i.e.  $I_C \times R_E$) and the associated power loss [85]. The voltage drop across  $R_E$  is kept below 200 mV to limit both the loss of transistor voltage headroom and the power dissipation. In addition, the gain of the circuit decreases with increasing emitter degeneration; hence  $R_E$  should be small. The output collector current of transistor T1 is given by:

$$I_{out} = I_{in} \left( 1 - \frac{2}{\beta} \right) \tag{3.1}$$

In our case, with a current gain of  $\beta = 270$, the mirrored output current (I<sub>out</sub>) is 99.26 % of the input current (I<sub>in</sub>).



Figure 3.9: Current source using an emitter degenerated current mirror (left), and its symbol [12].

In some cases, the current level must precisely match the weight of the output. As shown in Fig.3.10, the beta helper improves the precision of the current mirroring at the expense of extra circuitry. The output current for the beta helper technique is given by:

$$I_{out} = I_{in} \left( 1 - \frac{2}{\beta(\beta+1)} \right) \tag{3.2}$$



Figure 3.10: Current source using a beta helper current mirror [11].

For a current gain of  $\beta = 270$, the mirrored output current (I<sub>out</sub>) replicates 99.99 % of the input current (I<sub>in</sub>). Compared to the emitter degenerated current mirror, its I<sub>out</sub> replicates the input current (I<sub>in</sub>) only 0.73 % more precisely. However, the beta helper requires an extra transistor, resistor, and diode, while the improvement is not significant. One of the most critical issues with current sources appears in a DAC. In a nonlinear DAC, even a small current mismatch leads to incorrect decoding of the input code.

In this work, however, a linear DAC is used, for which the mismatch requirement is more relaxed. Even an increase of the transistor temperature up to 80 °C has a minimal effect on the performance of the linear DAC. This means that the beta helper current mirror is not needed, which saves area and power. Therefore, the emitter degenerated current mirror is sufficient for the DAC and is used for the current sources in this thesis unless otherwise specified.
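The mirroring accuracy of the two topologies can be reproduced directly from Eq. 3.1 and Eq. 3.2. The following sketch only evaluates those two expressions for the quoted current gain; the function names are hypothetical.

```python
def simple_mirror_ratio(beta: float) -> float:
    """I_out / I_in according to Eq. 3.1."""
    return 1 - 2 / beta


def beta_helper_ratio(beta: float) -> float:
    """I_out / I_in according to Eq. 3.2."""
    return 1 - 2 / (beta * (beta + 1))


BETA = 270  # current gain quoted in the text

simple_pct = 100 * simple_mirror_ratio(BETA)
helper_pct = 100 * beta_helper_ratio(BETA)
improvement_pct = helper_pct - simple_pct

print(f"{simple_pct:.2f} %")   # 99.26 %
print(f"{helper_pct:.3f} %")   # 99.997 %
```

The difference between the two results is roughly 0.7 percentage points, which supports the decision to avoid the additional components of the beta helper for the linear DAC.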

# 3.5 Emitter follower

The emitter follower (EF) is one of the primary single-stage amplifier topologies. The input signal is applied to the base of the transistor, and the output is taken at the emitter, as in Fig.3.11. The voltage gain of the EF is close to unity, and the output voltage is always shifted down by  $V_{BE}$  ( $\approx 0.85$  V). At low frequencies, the input impedance of the EF is:

$$R_{\rm in} = \beta \times R_{\rm E} \tag{3.3}$$

The output impedance is given by:

$$R_{out} = \frac{R_{source}}{\beta} \tag{3.4}$$

Here,  $R_E$  is the emitter resistance. The EF therefore has a high input impedance and avoids loading the preceding circuit.



Figure 3.11: Simplified emitter follower circuit.

 $R_{source}$  is the source resistance seen when looking towards the base of the transistor. At the output, the source impedance is scaled down by the current gain. Thus, the EF can drive low-resistive loads. The EF is a simple yet unique circuit that is widely used as a buffer between two logic circuits. A buffer is a circuit that provides impedance matching, enabling power transfer from the source to the load. It enhances the capability to drive a high capacitive load. The downside of the EF is that it requires an extra current source, which introduces additional power consumption [12]. It can be used as a buffer or as a voltage shifter, according to the requirements. For example, in an XOR gate using emitter-coupled logic, the input of the lower differential pair must be at least one V<sub>BE</sub> below the input of the upper differential pair. Thus, the same signal that drives the upper differential pair can also drive the lower differential pair after passing through an emitter follower, which shifts the voltage down by  $V_{BE}$ . In this DDS design, the EF is used in several circuits, such as the digital logic, the DAC, and the TSC.
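The low-frequency relations in Eq. 3.3 and Eq. 3.4, together with the $V_{BE}$ level shift, can be illustrated numerically. In this sketch the current gain and $V_{BE}$ are the values quoted in the text, whereas the emitter resistance, source resistance, and input level are hypothetical values chosen only for illustration.

```python
BETA = 270    # current gain quoted in Section 3.4
V_BE = 0.85   # base-emitter voltage in volts, as quoted in the text


def ef_input_resistance(beta: float, r_e: float) -> float:
    """Eq. 3.3: low-frequency input resistance of the emitter follower."""
    return beta * r_e


def ef_output_resistance(beta: float, r_source: float) -> float:
    """Eq. 3.4: low-frequency output resistance of the emitter follower."""
    return r_source / beta


def ef_output_level(v_in: float, v_be: float = V_BE) -> float:
    """Unity-gain approximation: the output follows the input, one V_BE lower."""
    return v_in - v_be


R_E = 1_000.0     # ohms, hypothetical emitter resistance
R_SOURCE = 200.0  # ohms, hypothetical resistance of the driving stage

r_in = ef_input_resistance(BETA, R_E)         # 270 kOhm: light load for the driver
r_out = ef_output_resistance(BETA, R_SOURCE)  # below 1 Ohm: can drive heavy loads
v_out = ef_output_level(3.0)                  # hypothetical 3.0 V input level

print(r_in, round(r_out, 3), round(v_out, 2))
```

With these example values, the follower presents a load of several hundred kilohms to the preceding gate while driving the next stage from less than one ohm, which is the buffering behaviour described above.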

# **3.6** Digital logic circuits

The DDS is a mixed-signal circuit in which a large portion of the area and transistor count (approximately 75 %) is used for the digital block. This means that logical expressions are implemented using digital logic gates. The goal is to realize the highest digital logic speed in the SG25H4 technology. There are several logic families, such as current mode logic (CML), emitter-coupled logic (ECL), diode-transistor logic (DTL), transistor-transistor logic (TTL), static CMOS, and resistor-transistor logic (RTL). Among them, CML/ECL is the fastest [86], [87]. Both CML and ECL are based on differential pair circuits. They are almost identical; the only difference is that in ECL an extra emitter follower is inserted at the output to increase the fan-out or to drive large node capacitances. The addition of emitter followers increases the drive capability of the logic gate and allows it to feed multiple loads. Generally, CML and ECL gates are interchangeable. The constant current source (tail current) of the CML gate is maintained using a current mirror. The tail current is usually chosen such that the gate operates close to the maximum transit frequency of the given technology. All transistors are kept in active mode so that the switching speed is highest, since this lowers the time needed to alter the base charge while switching [12]. The unsaturated transistor eliminates any storage time and also prevents the power dissipation of CML circuits from fluctuating as the frequency increases. Therefore, the CML gate is used for very high-speed circuits [88]. The drawbacks of the CML gate compared to its static CMOS counterpart are a larger area, due to non-minimum-size transistors, and, more importantly, static power consumption. However, the primary goal of these circuits is to achieve the highest speed despite the large area and high power consumption.
The CML gate is more robust against common-mode signals and supply voltage noise because it is fully differential. Due to its low noise and small logic voltage swing compared to other logic circuits, CML facilitates integration with sensitive analog, RF, and mm-wave circuits. Therefore, CML gates are used for almost all digital logic in our DDS. The ECL gate is used only when there is a strong requirement to drive the following stage or to shift the voltage level.

One of the major high-speed limitations of the SiGe HBT is the input base resistance. Since SG25H4 is a BiCMOS technology, BiCMOS-CML logic can be used for the high-speed switch [89]. It employs a cascode stage consisting of a MOSFET in common-source configuration and an HBT in common-base configuration. This combines the large slew rate and low input resistance benefits of the HBT and the MOSFET, respectively. However, this logic family is not implemented due to reliability issues of fabricating the MOSFET and the HBT on a single chip. Therefore, only HBT-based CML logic, the fastest option, is used here for a variety of the most common digital logic gates, such as the inverter, exclusive OR (XOR), XNOR, OR, NOR, AND, and NAND [90]. These gates are used to build different blocks such as flip-flops, registers, majority gates, adders, and accumulators. This implies that faster CML gates lead to faster corresponding logic blocks. The fastest individual blocks, together with an optimized clock tree, assure the fastest operation of the whole DDS circuit in SiGe technology. Next, various logic gates, their optimization, and their functions will be explained step by step.

#### **3.6.1** Inverter gate

The inverter gate is one of the most basic and most commonly used logic gates. As the name suggests, the output of this gate becomes logically low for a logic-high input and vice versa. As shown in Fig.3.12, a simple differential pair of transistors T1 and T2 in CML functions as an inverter gate. Both transistors are biased with  $V_{CE} > V_{BE}$ , and the tail current  $(I_C)$  is maintained for the highest speed using a current mirror. The output voltage swing of this CML gate is governed by the product of the collector current  $(I_C)$  and the load resistance  $R_L$ . The high output logic level is limited to  $V_{CC}$ , and the low output logic level is  $V_{CC} - I_C \times R_L$.



Figure 3.12: Inverter gate using CML logic [12].

To understand this further, assume that a sufficiently large voltage is applied to the differential inputs  $(A_P \text{ and } A_N)$  at the bases of the transistors T1 and T2, respectively. When  $A_P$  is high and  $A_N$  is low, the tail current is fully switched to the T1 side, while the other transistor T2 is in cut-off mode. This means that the collector current  $I_C$  passes through the collector load of the transistor T1, while no such current flows through the transistor T2. Therefore, the output voltage (Out<sub>P</sub>) taken from the collector load of the transistor T1 is  $V_{CC} - I_C \times R_L$ , which is a low logic level, and the output voltage taken from the collector load of the transistor T2 is  $V_{CC}$  (high logic level). For complete switching of the current to either side of the differential pair, the single-ended output peak-to-peak voltage swing  $V_{PP}$  should be [91]

$$V_{\rm PP} > 4 \times V_{\rm T} + I_{\rm C} \times r_{\rm e} \tag{3.5}$$

where  $V_T$  is the thermal voltage ( $\approx 26 \text{ mV}$  at room temperature) and  $r_e$  is the parasitic emitter resistance. In advanced HBT devices, the current density for peak  $f_T$  increases, and thus the emitter width decreases. This introduces a noticeable parasitic emitter resistance. For example, in the SG25H4 process, the maximum value of emitter resistance times emitter area is  $6.174\ \Omega \cdot \mu \text{m}^2$  for a base current density ranging from 0.1 to 2 mA/$\mu \text{m}^2$  [78]. The emitter finger area of the standard HBT is 0.21  $\mu \text{m} \times 0.84 \,\mu\text{m}$ , which yields a parasitic emitter resistance of 35  $\Omega$. As a rough estimate, for a collector current of 1 mA, the voltage drop across  $r_e$  is 35 mV. Therefore, to exceed the thermal voltage term and the  $r_e$  voltage drop, a load (collector) resistance of at least 135  $\Omega$  is required when operating at a collector current of 1 mA. The collector resistance is further increased to 150-250  $\Omega$, giving an output voltage swing of 150-250 mV, to guarantee the switching operation over all process and temperature corners [92] and to compensate for the voltage drop across  $r_e$ . The output voltage swing  $V_{PP}$  should not be increased further for two reasons:

1) It unnecessarily increases the power consumption. Since the power dissipation is  $P_D = I_C \times V_{CC}$, for a constant  $I_C$  the  $V_{CC}$  headroom must also be increased to cover the larger output voltage swing.

2) The slew rate of the output switching decreases as the output voltage swing increases, making the logic slower.
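The swing requirement of Eq. 3.5 can be evaluated for the quoted process data. The sketch below uses the emitter resistance-area product, the emitter finger dimensions, and the 1 mA collector current from the text; the function names are hypothetical. Note that the result depends on the value taken for $V_T$: 26 mV gives a minimum swing of about 139 mV, while 25 mV gives the 135 mV used in the rough estimate above.

```python
RE_AREA_PRODUCT = 6.174   # ohm * um^2, emitter resistance times emitter area [78]
EMITTER_WIDTH_UM = 0.21   # um
EMITTER_LENGTH_UM = 0.84  # um


def emitter_resistance() -> float:
    """Parasitic emitter resistance of one standard emitter finger, in ohms."""
    return RE_AREA_PRODUCT / (EMITTER_WIDTH_UM * EMITTER_LENGTH_UM)


def min_swing(i_c: float, r_e: float, v_t: float = 0.026) -> float:
    """Right-hand side of Eq. 3.5 in volts: 4 * V_T + I_C * r_e."""
    return 4 * v_t + i_c * r_e


def switches_fully(i_c: float, r_l: float, r_e: float, v_t: float = 0.026) -> bool:
    """True if the single-ended swing I_C * R_L satisfies Eq. 3.5."""
    return i_c * r_l > min_swing(i_c, r_e, v_t)


I_C = 1e-3                      # 1 mA collector current, as in the text
r_e = emitter_resistance()      # about 35 ohms
v_min = min_swing(I_C, r_e)     # about 0.139 V with V_T = 26 mV
r_l_min = v_min / I_C           # minimum load resistance in ohms

print(round(r_e, 1), round(v_min * 1e3, 1), round(r_l_min, 1))
```

Both ends of the 150-250 Ω range chosen in the text satisfy the condition at 1 mA, with the lower end leaving only a small margin.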



**Figure 3.13:** Propagation delay  $(t_{delay})$  and rise time  $(t_{rise})$  calculation scheme [13].

For simplicity,  $r_e$  is shown only in Fig.3.12 and not in the other CML circuits, although its influence is always taken into account during simulations. The rise time (t<sub>rise</sub>) of the inverter is defined as the time the output needs to rise from 10 % to 90 % of the settled high state (i.e. of the peak-to-peak V<sub>PP</sub> level). Conversely, the fall time is the time needed to fall from 90 % to 10 % of the settled high state, as shown in Fig.3.13, where V<sub>inH</sub> and V<sub>inL</sub> are the differential input signals, and V<sub>outH</sub> (equal to V<sub>CC</sub>) and V<sub>outL</sub> (equal to V<sub>CC</sub> - V<sub>PP</sub>) are the differential output signals. The term **switching-time** refers to the rise time and fall time of the gate. The propagation delay (t<sub>delay</sub>) is the time between a change at the input and the corresponding change at the output.
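The two timing figures defined above can be extracted from sampled waveforms as sketched below. The 10 %/90 % thresholds follow the definition in the text; measuring the propagation delay between the 50 % crossings of input and output is a common convention and is an assumption here. The helper names and the synthetic ramp waveforms are hypothetical and do not represent the simulated gate.

```python
def crossing_time(t, v, level, rising=True):
    """First time at which v crosses level, using linear interpolation."""
    for i in range(1, len(v)):
        a, b = v[i - 1], v[i]
        if (rising and a < level <= b) or (not rising and a > level >= b):
            return t[i - 1] + (level - a) * (t[i] - t[i - 1]) / (b - a)
    raise ValueError("level is never crossed")


def rise_time(t, v, v_low, v_high):
    """10 % to 90 % rise time of a rising edge between v_low and v_high."""
    swing = v_high - v_low
    t10 = crossing_time(t, v, v_low + 0.1 * swing)
    t90 = crossing_time(t, v, v_low + 0.9 * swing)
    return t90 - t10


def propagation_delay(t, v_in, v_out, v_low, v_high):
    """Delay between the 50 % crossings of two rising edges (assumed convention)."""
    mid = 0.5 * (v_low + v_high)
    return crossing_time(t, v_out, mid) - crossing_time(t, v_in, mid)


def ramp(t, t_start, t_end, v_low, v_high):
    """Ideal linear edge used as a synthetic test waveform."""
    if t <= t_start:
        return v_low
    if t >= t_end:
        return v_high
    return v_low + (v_high - v_low) * (t - t_start) / (t_end - t_start)


# Synthetic example: time axis in ps, 220 mV swing, output edge lags the input.
times = [float(i) for i in range(21)]
v_in = [ramp(t, 2.0, 12.0, 0.0, 0.22) for t in times]
v_out = [ramp(t, 5.0, 15.0, 0.0, 0.22) for t in times]

t_rise = rise_time(times, v_out, 0.0, 0.22)
t_delay = propagation_delay(times, v_in, v_out, 0.0, 0.22)
print(f"{t_rise:.2f} ps, {t_delay:.2f} ps")  # 8.00 ps, 3.00 ps
```

For an inverting output, the same helper can be used with `rising=False` on the output edge; the fall time follows analogously from the 90 % and 10 % crossings.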

The CML logic circuit (see Fig.3.12) is simulated, and the result is shown in Fig. 3.14, where  $t_{delay} = 3.3$  ps,  $t_{rise} = 5$  ps, and the single-ended voltage swing (Vpp) is 220 mV. A voltage spike is also observed at the output.



Figure 3.14: Rise time  $(t_{rise})$  and fall time  $(t_{fall})$  of CML inverter cell.



Figure 3.15: Inverter gate using CML and ECL logic.

#### **3.6.2** Comparison between inverter gates

The CML gate can be converted into an ECL gate by adding an emitter follower at the output, as shown in Fig.3.15. The CML and ECL gates are simulated, and both outputs are plotted in Fig.3.16. The inputs  $A_P$  and  $A_N$  are differential inputs that are triggered at the time instance  $t_0 \approx 0$  ps. The differential outputs are  $Out_P$  and  $Out_N$  for the ECL and CML gates. The simulation shows that the switching time  $(t_{rise})$  of the ECL gate is 4 ps, which is faster than that of the CML gate with its 5 ps rise time. It might be tempting to use ECL instead of CML, but the propagation delay of the ECL gate is longer than that of the CML gate. The propagation delays of the ECL and CML gates are 4 ps and 3.38 ps, respectively. Whether the switching time  $(t_{rise})$  or the propagation delay  $(t_{delay})$  should decide between CML and ECL depends on the application of the gate on a larger scale. In a circuit design without memory, where the digital codes or signals are never fed back to the inputs of the gates, the fastest switching time (lowest  $t_{rise}$) is preferred. On the other hand, in the case of the DDS design, many digital blocks are connected together, and several of them are in a feedback loop. This suggests that the propagation delay is crucial for the overall optimization of the circuit. In other words, the propagation delay is the time needed for the output of the gate to change in response to a change at its input, and it determines the maximum data rate at which the gate can operate within a specified bit error rate. Even with the fastest  $t_{rise}$ , the propagation delay becomes the limiting factor of such a circuit. This requires that  $t_{delay}$  be considered in addition to  $t_{rise}$. Comparing ECL and CML, CML is preferable when both  $t_{delay}$  and  $t_{rise}$  are considered. It also improves the power consumption, since CML consumes less power than ECL. Therefore, in most cases, CML logic is used in the DDS. The ECL gate is used only when the gate has to drive several loads, to buffer two stages, or to shift the voltage level.

Figure 3.16:  $t_{delay}$  and  $t_{rise}$  comparison between CML and ECL gates.

#### 3.6.3 Optimum tail current selection



Figure 3.17: CML and ECL gate  $t_{delay}$  over different  $I_{C}$ .

Up to now, it has been assumed that the peak-$f_T$ current assures the highest circuit speed, which is not entirely true. The tail collector current can be increased up to the peak-$f_T$ current (trading power consumption for speed), but the peak-$f_T$ current cannot guarantee the maximum operating speed of the ECL/CML gate. The peak-$f_T$ current is only an indicator of the average speed of ECL/CML gates. This is because the attainable data rate in high-speed logic is limited by the RC time constants of the real circuit and not only by the peak  $f_T$  of the transistor used, which is in fact an intrinsic measure of the transistor alone [93]. The peak  $f_{\rm T}$  itself is an extrapolated frequency for the short-circuit unity current gain of the transistor. In practice, the operating tail current is set to 50-70 % of the peak-f<sub>T</sub> current, since increasing the collector current further does not speed up the ECL/CML gates significantly [52], while the power consumption increases. When the tail current is biased close to the peak-f<sub>T</sub> current, any variation in biasing may push the tail current beyond the peak-f<sub>T</sub> current, which slows down the transistor significantly. This means increased power consumption and lower speed, both of which are strongly undesirable. To investigate this further, the ECL/CML gates were simulated with various collector current sources, and the logical crossing time was obtained. As discussed earlier, CML is faster than ECL for all current source values, which can be seen in Fig.3.17. It can be inferred from the figure that the peak-$f_T$ current of 2 mA provides neither the fastest switching nor the shortest propagation delay, contrary to what the previous simulations suggested (see Fig.3.4 and Fig.3.5). In CML/ECL circuits, current sources of 0.8 to 1.3 mA provide the fastest operation. Therefore, the peak-$f_T$ current is not a true measure of the speed of real circuits. From Fig.3.17, it can be seen that a collector current of 1 mA gives the fastest logic gate.
Thus, in most of our design, a current source of 1 mA is used in order to obtain the highest speed. However, in some special cases where the speed is not crucial, e.g. in the non-critical paths of the combinational circuits, some cells are deliberately allowed to operate more slowly. In such cases, a current source of only 0.5 mA is used, which significantly lowers the power consumption without compromising the overall speed. On the other hand, where the highest speed is vital despite significant power consumption, even a current of 1.3 mA is used. However, this is a special case that is used cautiously and only in the TSC block.
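The biasing rule described above can be summarised in a few lines. The sketch computes the 50-70 % window of the peak-$f_T$ current and compares the three tail currents used in this design against it. The static power is assumed to scale linearly with the tail current at a fixed supply voltage, and the cell labels and function names are hypothetical.

```python
PEAK_FT_CURRENT_MA = 2.0   # peak-f_T collector current from Fig. 3.4, in mA


def tail_current_window(i_peak_ma: float, low: float = 0.5, high: float = 0.7):
    """Practical tail-current range: 50-70 % of the peak-f_T current."""
    return low * i_peak_ma, high * i_peak_ma


def relative_static_power(i_tail_ma: float, i_ref_ma: float = 1.0) -> float:
    """Static power relative to the default cell, assuming a fixed supply voltage."""
    return i_tail_ma / i_ref_ma


# Tail currents used in this design (from the text); the labels are hypothetical.
CELL_CURRENTS_MA = {
    "non-critical path": 0.5,
    "default": 1.0,
    "TSC critical path": 1.3,
}

i_min, i_max = tail_current_window(PEAK_FT_CURRENT_MA)
for name, current in CELL_CURRENTS_MA.items():
    inside = i_min <= current <= i_max
    power = relative_static_power(current)
    print(f"{name}: {current} mA, inside window: {inside}, relative power: {power}")
```

With a 2 mA peak-$f_T$ current, the window spans 1.0 to 1.4 mA, so the default and TSC cells lie inside it, while the 0.5 mA cells are deliberately biased below it to save power.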

#### **3.6.4** Cascoded inverter gate



Figure 3.18: Inverter gate using cascoded CML logic.



Figure 3.19: Inverter gate using cascoded CML logic.



Figure 3.20: Gain versus frequency comparison between CML gates with and without cascoding.

In a single-level CML gate topology as in Fig.3.12, a spike is observed during switching due to the sudden change in current (see Fig.3.14). The Miller capacitance is also more pronounced in this topology, which limits the bandwidth. A cascode stage is added to reduce the Miller capacitance and to increase the output impedance and bandwidth (see Fig.3.18). The simulation result (see Fig.3.19) shows that the spikes during transitions are suppressed. Asymmetry between rise and fall times introduces glitches in the succeeding logic stages and increases power consumption [94]. However, the propagation delay increases from 3.3 ps to 5.5 ps because of the capacitive node introduced at the output by the additional cascode (common-base) transistor. Nevertheless, Fig.3.20 shows that the 1 dB gain-compression frequency increases by 10 GHz (from 20 GHz in the single-level CML inverter to 30 GHz in the cascoded CML inverter).



Figure 3.21: Shunt peaking cascoded CML inverter cell.

#### 3.6.5 Inductive peaking

The CML gate can be sped up further by using an inductor for shunt inductive peaking, as shown in Fig.3.21 [95]. The shunt inductor L is placed in series with the collector resistance $R_L$. This inductor can be either passive or active. A passive inductor can be designed as a high-quality-factor microstrip inductor using a high-current-density metal layer (e.g. the TM1/TM2 metal layers in this technology). However, hundreds of CML gates are used in the whole DDS chip, so the area required for all the inductors is significant and creates difficulties, especially for clock distribution and overall area optimization. An active inductor, on the other hand, may relax the area requirement, but it suffers from nonlinearity and noise [96]. Another disadvantage of an inductor in series with the collector load is amplitude peaking near the cut-off frequency.



Figure 3.22: Shunt peaking cascoded CML inverter cell.

The simulated CML results without shunt peaking and with shunt peaking for various inductance values are shown in Fig.3.22 for an input frequency of 1 GHz. The propagation delay is 5.5 ps without shunt peaking and 3.5 ps with a 200 pH inductor, an improvement of 2 ps.



Figure 3.23: Shunt peaking cascoded CML inverter cell.

The cascoded CML inverter is simulated for various inductance values, and the gain versus frequency is also plotted in Fig.3.22. It is evident that gain peaking appears with a 600 pH inductor and becomes more severe with 1000 pH. This would be problematic when a large number of such CML stages are cascaded, since the accumulated peaking leads to an extremely high gain and circuit instability. In addition, even with an area-efficient inductor, a 1000 pH inductor occupies ten times the area of the transistor. Lower inductance values (100 pH to 200 pH), on the other hand, increase the speed without any peaking and within a realizable area. Therefore, in this design, small inductances of 100-200 pH are used for shunt peaking in all the CML/ECL gates. This small inductance is realized with a short transmission line. The simulated result (see Fig.3.23) shows that the propagation delay is only 3.5 ps, comparable to the fastest topology, i.e. CML without cascoding. It is one of the fastest CML circuits reported so far in terms of propagation delay [97]. Thus both high speed and wide bandwidth are achieved with the shunt-peaked cascoded CML inverter.
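The onset of gain peaking with increasing inductance can be illustrated with a first-order model of the load. The sketch below assumes the common textbook shunt-peaking model (the series branch $R_L$ + L in parallel with a lumped node capacitance C). The values `R_LOAD` and `C_NODE` are hypothetical and chosen only for illustration; they are not taken from this design, so the numbers do not reproduce Fig.3.22.

```python
import math

R_LOAD = 200.0    # ohm, hypothetical load resistance (not from the thesis)
C_NODE = 20e-15   # farad, hypothetical lumped output capacitance (not from the thesis)


def normalized_gain(freq_hz, r_ohm, c_farad, l_henry):
    """Return |Z(jw)| / R for (R in series with L) in parallel with C."""
    s = complex(0.0, 2.0 * math.pi * freq_hz)
    z = (r_ohm + s * l_henry) / (1.0 + s * r_ohm * c_farad + s * s * l_henry * c_farad)
    return abs(z) / r_ohm


def peak_gain(l_henry, r_ohm=R_LOAD, c_farad=C_NODE, f_max_hz=200e9, points=2000):
    """Largest normalized gain found on a linear frequency sweep up to f_max_hz."""
    return max(
        normalized_gain(f_max_hz * k / points, r_ohm, c_farad, l_henry)
        for k in range(1, points + 1)
    )


if __name__ == "__main__":
    for l_ph in (0, 100, 200, 600, 1000):
        l_henry = l_ph * 1e-12
        # m = L / (R^2 C); in this model the magnitude peaks above the
        # low-frequency gain only when m exceeds sqrt(2) - 1.
        m = l_henry / (R_LOAD ** 2 * C_NODE)
        print(f"L = {l_ph:4d} pH  m = {m:.3f}  peak |Z|/R = {peak_gain(l_henry):.3f}")
```

With these assumed values, the small inductances stay below the low-frequency gain over the whole sweep while the larger ones overshoot it, which is the qualitative trend described above.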

Once the fastest CML configuration has been designed for the inverter gate, the same cascoding approach is used in most of the logic gates. This cell is used to design the XOR, AND, DFF and OR gates, where cascoding is achieved by the intrinsic structure of the corresponding circuits. Differential CML is implemented in all the individual circuits, including the XOR gates, majority gates, and registers. Using the fastest CML configuration leads to the fastest logic possible in this technology.

#### 3.6.6 XOR gate

The output of the XOR gate is high only when exactly one of its inputs is high. Fig.3.24 shows the XOR gate using bipolar devices. Simulations of the circuit (see Fig.3.25) show that each transistor exhibits a base-emitter turn-on voltage of $V_{BE} = 0.88$ V. The supply voltage of 3 V is divided into $3 \times V_{BE} = 2.64$ V and a 0.175 V drop across the XOR outputs. The optimum $I_C = 1$ mA for maximum speed is maintained in all the transistors using a current mirror circuit.



Figure 3.24: XOR gate using a CML.



Figure 3.25: XOR gate simulated result for its functionality test (left) and rise time (right).

#### 3.6.7 AND gate and Majority gate

The conventional adder uses logical AND gates (see Fig.3.26). The output of an AND gate is high only when both of its inputs are high. Simulation results are illustrated in Fig.3.27. However, these gates have an asymmetry problem. Using majority gates simplifies the logic. In an adder, the two AND gates that would otherwise be required can be replaced by a single majority gate (see Fig.3.28), which makes the majority gate faster than two AND gates. The majority gate is special since it has three inputs and three logical output levels.

Majority gates are a vital logic function in the adder circuit. In the present design, the adder circuit contains majority gates that take three inputs: the frequency control word (FCW), the sum feedback (FB) and the carry-in ($C_{in}$) from the previous stage. The carry-out is logical high only when at least two of these inputs are high. In the circuit (see Fig.3.29), a logic-high FCW bit is provided by an input voltage level of 2.9 V with a reference voltage of 2.8 V, which means that any FCW input voltage below 2.8 V is considered logic low. For the carry-in bit and the sum output feedback, a logical high level of 2 V and a low level of 1.2 V are chosen. The simulation result in Fig.3.29 shows that the output reference of the majority gate is 2.8 V, resulting in a logical low for any voltage below this level.



Figure 3.26: AND gate using a CML.



Figure 3.27: AND gate simulated result for its functionality test (left) and rise time (right).

#### 3.6.8 Register/memory block

In DDS circuits, several data lines have to be synchronized, especially in the accumulator section. The propagation delays of the XOR, EF, majority gates, etc. are different, so it is difficult to keep the delay of each data line identical. The data path delays of the various functions in both sequential and combinational circuits may also differ. Even if they are nominally equal, the propagation delays in the real layout may not be the same because of the associated parasitic components. Synchronization of the data lines is therefore important and can be achieved with register blocks that make sure all the data are aligned. The register block holds a signal that arrives earlier than the other signals and releases the data only when all the data have arrived. The register block can be realized as a master-slave configuration of D-latches [98]. The D-latch (see Fig.3.30) is a flip-flop circuit whose truth table is presented in Table 3.1. It has one data input (D), one clock input (C) and two complementary data outputs (Q, Q'). When the clock is low, the hold stage stores the current value of Q. On the other hand, when the clock is high, the data at the D input propagates to the output Q. The bipolar D-latch is a vital component in the implementation of high-speed systems. The highest speed performance of the CML D-latch is obtained at the expense of high static power consumption [99]. Here the D-latch is designed to operate beyond 20 GHz, making it compatible with other high-speed blocks such as the XOR and majority gates.



Figure 3.28: Majority gate simulated result for its functionality test (left) and rise time (right).



Figure 3.29: Majority gate simulated result for its functionality test (left) and rise time (right).

The D-latch can be realized using a CML gate (see Fig.3.31). A good description of the CML D-latch and its timing characteristics is presented in [100]. The operation of the CML D-latch is as follows. Referring to Fig.3.31, when the clock signal is high (i.e. $Clk_P$ is high), transistor T1 is turned on and T2 is turned off. All the tail current therefore flows through the pair T3 and T4, so that only the sample branch is active, while no current flows through the hold (latching) branch and transistors T5 and T6 are cut off. Within the sample branch, the current is steered through T3 or T4 by the high logic level at $D_p$ or $D_n$. For instance, if input $D_p$ is high, T3 is on and T4 is cut off, and the input propagates to OutP as logic high. The inputs to the hold differential pair T5 and T6 are taken from the outputs of the sample differential pair. When the clock signal is low, transistor T2 is turned on, no current flows through the sample branch, and transistors T1, T3 and T4 are cut off. All the tail current passes through the hold (latching) branch to the output side that is already at the low logic level, thus preserving the previous output state. The output does not change even if the input data ($D_p/D_n$) change until the next positive clock pulse (i.e. when $Clk_P$ is high) arrives.

| Clock | Input D | Q                   | Q'                             | Remarks   |
|-------|---------|---------------------|--------------------------------|-----------|
| Low   | X       | $Q_{\text{stored}}$ | $\overline{Q}_{\text{stored}}$ | No change |
| High  | Low     | Low                 | High                           | Reset     |
| High  | High    | High                | Low                            | Set       |

Table 3.1: D-latch truth table



Figure 3.30: D-latch block diagram.



Figure 3.31: D-latch using a CML (left)

The register is implemented with CML D-latches in a master-slave configuration (see Fig.3.32). Two identical D-latches are connected as shown in Fig.3.33 to form a register. The first D-latch is called the master and the following D-latch is called the slave. When the clock is high, the input to the master D-latch is stored in its hold stage while the slave D-latch cannot change its state. Next, when the clock is low, the value in the hold stage of the master D-latch is transferred to the sample stage of the slave D-latch while the master D-latch cannot change its state. This means that the data are synchronized at the rate of the clock signal. The result is shown in Fig.3.34, where a 5 GHz sinusoidal input signal is applied to $D_p$ with a 20 GHz clock signal. It clearly shows the operation of the register, as the input data $D_p$ are held high or low at the output Out<sub>P</sub> for half of the input sinusoid period. The small delay from the input clock CLK<sub>P</sub> to the output Out<sub>P</sub> is due to the delay of the individual master and slave D-latches. The ripple in Out<sub>P</sub> can be smoothed by an output driver stage (not shown here).



Figure 3.32: Register using a master-slave D-latch configuration.



Figure 3.33: Register using a CML D-latches using a master-slave configuration.
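The master-slave behaviour described above can be sketched as a purely logical model. The class and function names are illustrative, and the model ignores analogue effects such as propagation delay and ripple. It assumes ideal latches that are transparent while their clock is high and hold while it is low.

```python
class DLatch:
    """Level-sensitive latch: transparent when clk is high, holds when clk is low."""

    def __init__(self, q=0):
        self.q = q

    def update(self, d, clk):
        if clk:
            self.q = d
        return self.q


class MasterSlaveRegister:
    """Two latches driven on opposite clock phases."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, clk)            # master follows D while clk is high
        return self.slave.update(m, not clk)      # slave follows master while clk is low


def simulate(data, clock):
    """Apply one (data, clock) sample per step and collect the register output."""
    reg = MasterSlaveRegister()
    return [reg.update(d, c) for d, c in zip(data, clock)]


if __name__ == "__main__":
    clock = [1, 0, 1, 0, 1, 0, 1, 0]
    data = [1, 0, 0, 1, 1, 1, 0, 0]
    print(simulate(data, clock))
```

In this model the output only changes when the clock goes low, and it then takes the value that the master captured during the preceding high phase.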

#### **3.6.9 OR gate**

The OR logic gate (see Fig.3.35) can also be implemented in CML. It is one of the most commonly used gates in combinational logic. However, its delay is the largest among the XOR, AND, and majority gates. The rise time of this gate is 35 ps, at least 20 ps higher than that of the other gates. Therefore, in this thesis the OR functionality is transformed using Boolean algebra into other gates, such as the majority gate, which accounts for a delay of only 15 ps with the same functionality.



Figure 3.34: (Simulated) Register working at 20 GHz clock.



Figure 3.35: OR gate using a CML.

# **3.7** Conclusions from chapter **3**

In this chapter, the background to the selection of the fabrication technology and the important transistor characteristics, such as the I-V curve and $f_T/f_{MAX}$, are presented. This leads to the choice of the appropriate technology for the circuit design procedure, covering speed criteria, bandwidth, fan-out, bias point selection, biasing methods, and power consumption. One of the most sought-after performance figures of the DDS is the clock speed, which depends greatly on the performance of the digital logic gates. Therefore, an effort was made to achieve one of the fastest CML gate realizations using a cascoded topology with shunt-peaking inductors. It shows a propagation delay of only 3.5 ps, which makes it one of the fastest CML gates in SiGe technology. This CML gate becomes the basic building block for the other digital logic gates. The XOR, majority gate and register are designed with optimised delay and reduced power consumption using the same CML cell as a unit cell. The propagation delay and rise time of each gate are summarised in Table 3.2. In the following chapters, the selection and optimization of the accumulator architecture, the DAC, and subsequently the TSC circuits are explained step by step in detail.



Figure 3.36: OR gate simulated result for its functionality test (left) and rise time (right).

| Logic gate                          | Propagation delay (ps) | Rise time (ps) |
|-------------------------------------|------------------------|----------------|
| ECL                                 | 4                      | 4              |
| CML                                 | 3.3                    | 5              |
| Cascoded-CML                        | 5.5                    | 8              |
| Cascoded-CML with inductive peaking | 3.5                    | 7              |
| XOR                                 | 6                      | 11             |
| AND                                 | 5                      | 10             |
| Majority gate                       | 6                      | 11             |
| Register                            | 12                     | 25             |
| OR                                  | 18                     | 35             |

Table 3.2: Propagation delay and rise time comparisons between different gates

# Chapter 4

# Phase accumulator

The basics of the phase accumulator are explained in chapter 2. Here, its implementation is described. The accumulator uses a frequency control word (FCW) to set the phase jump. The multiple $\frac{FCW}{2^N}$ can be a fraction or a whole number, but it is always a rational number since the accumulator operation is linear in the frequency domain [60]. It also means that the FCW is always an integer. The accumulator can be realised as a binary accumulator, i.e. a digital integrator with the arithmetic function:

$$S(n) = S(n-1) + FCW \tag{4.1}$$

where S(n) and S(n-1) are the N-bit accumulator words at the present and previous clock cycles respectively, and FCW is the input control word. Such an accumulator consists of an adder and registers. The digital phase information is stored using an N-bit adder and phase registers. The phase register is updated every clock cycle and its data are added in the subsequent cycle. The accumulator architecture contains mostly adders, registers, and XORs. Voltage shifters, output drivers and buffers are also necessary, especially when the accumulator has to drive the following stage (DAC) in the DDS. The output drivers/buffers strengthen the signal and reduce any ripple in the settled output. The buffer also decouples the DAC stage. The voltage shifter maintains the appropriate signal level between stages. In the following, the basic building blocks of the accumulator are described step by step and compared in order to choose the best option. Various accumulator architectures are then discussed and compared, and one is selected for realization.
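A behavioural sketch of Eqn.4.1 is given below as an illustration. It assumes the usual wrap-around of an N-bit word (addition modulo $2^N$) and the standard DDS relation $f_{out} = FCW \cdot f_{clk} / 2^N$. The function names are illustrative only.

```python
def phase_accumulator(fcw, n_bits, cycles, start=0):
    """Return S(1)..S(cycles) for S(n) = S(n-1) + FCW with N-bit wrap-around."""
    modulus = 1 << n_bits
    state = start % modulus
    states = []
    for _ in range(cycles):
        state = (state + fcw) % modulus   # overflow of the N-bit word wraps to zero
        states.append(state)
    return states


def output_frequency_hz(fcw, n_bits, f_clk_hz):
    """Output frequency of an ideal DDS for a given frequency control word."""
    return fcw * f_clk_hz / (1 << n_bits)


if __name__ == "__main__":
    print(phase_accumulator(fcw=3, n_bits=4, cycles=8))
    print(output_frequency_hz(fcw=1, n_bits=12, f_clk_hz=20e9) / 1e6, "MHz")
```

The second call evaluates the smallest frequency step of a 12-bit accumulator clocked at 20 GHz.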

# 4.1 Adder

An adder is a combination of logic gates that combines binary inputs to produce a sum. According to their ability to accept and combine digital inputs, adders are categorized as half adders, full adders, and multi-bit adders. Their primary function is to add digital inputs, but they differ in the number of bits added and in the way the carry output bit is handled. The basics of digital electronics and the use of adders and registers are presented in [14],[101],[91].

#### 4.1.1 Half adder

The half adder is the smallest adder structure. It takes two binary digits (say A and B) and generates two outputs: the sum output (Sum) and the carry output ($C_{out}$). The truth table lists the outputs for all possible input combinations. The carry signal represents the overflow of the addition. The sum function of the half adder is:

$$Sum = A \oplus B \tag{4.2}$$

And Carry signal  $(C_{out})$  is given by:

$$C_{out} = A \times B \tag{4.3}$$

Fig.4.1 shows the truth table and the logic block diagram of the Sum and $C_{out}$ functions using XOR and AND gates. The signs $\times$, $+$, and $\oplus$ denote logical multiplication (AND), logical addition (OR), and the XOR operation respectively.



Figure 4.1: Half adder truth table (left) and logic block diagram (right) [14].
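As a quick check of Eqn.4.2 and Eqn.4.3, the following sketch enumerates the half-adder truth table. The function name is illustrative.

```python
def half_adder(a, b):
    """Return (Sum, Cout) for two input bits: Sum = A xor B, Cout = A and B."""
    return a ^ b, a & b


if __name__ == "__main__":
    print("A B | Sum Cout")
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a} {b} |  {s}    {c}")
```

Reading $C_{out}$ as the MSB and Sum as the LSB gives the arithmetic sum of the two input bits.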

#### 4.1.2 Full adder



Figure 4.2: Full adder truth table (left) and logic block diagram (right)[14].

A full adder can add two "one-bit" inputs (operands) and is therefore called a 1-bit adder. The full adder can be derived from a combination of half adders together with logic functions such as OR, NAND and XOR gates. These logic functions can be analyzed and synthesized using Boolean functions. From now on, the full adder is referred to simply as an adder. In Fig.4.2(left), the truth table of the 1-bit adder shows the outputs for all possible input combinations. In Fig.4.2(right), the adder sums three 1-bit inputs (A/B/carry<sub>in</sub>), resulting in the carry out (C<sub>out</sub>) and the sum output (Sum). At the output, $C_{out}$ is the most significant bit (MSB) and Sum is the least significant bit (LSB). Using Karnaugh maps, simplified expressions for the outputs in terms of the input bits can be found [14]. It is important to note that the adder used in the accumulator is a special case of an adder: it has feedback from the sum output to its input for accumulation, as shown in Fig.4.3(right). The register (R) holds the data until the adder finishes the current addition. The sum and carry functions are the same in the conventional adder and in the adder used in the accumulator, as shown in Fig.4.3.



Figure 4.3: 1-bit adder (left) and 1-bit adder in accumulator block diagram (right).

The sum bit is 1 (high) when odd numbers of input bits are high.

$$Sum = A \oplus B \oplus C_{in} \tag{4.4}$$

$C_{out}$ is high only when at least two of the inputs A/B/$C_{in}$ are high. The logical representation of the truth table is:

$$C_{out} = A \times B + B \times C_{in} + C_{in} \times A \tag{4.5}$$

$$C_{out} = A \times B + C_{in} \times (A \oplus B) \tag{4.6}$$
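The equivalence of the two carry expressions can be verified exhaustively. The sketch below uses illustrative function names. It checks that Eqn.4.5 and Eqn.4.6 agree for all eight input combinations and that (Cout, Sum) encodes the arithmetic sum of the three input bits.

```python
from itertools import product


def majority(a, b, c_in):
    """Carry according to Eqn.4.5: high when at least two inputs are high."""
    return (a & b) | (b & c_in) | (c_in & a)


def carry_xor_form(a, b, c_in):
    """Carry according to Eqn.4.6."""
    return (a & b) | (c_in & (a ^ b))


def full_adder(a, b, c_in):
    """Return (Sum, Cout) with Sum from Eqn.4.4 and Cout from Eqn.4.5."""
    return a ^ b ^ c_in, majority(a, b, c_in)


if __name__ == "__main__":
    for a, b, c_in in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c_in)
        assert c_out == carry_xor_form(a, b, c_in)
        assert 2 * c_out + s == a + b + c_in
        print(a, b, c_in, "->", "Sum =", s, "Cout =", c_out)
```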



Figure 4.4: 1-bit adder accumulator circuit using CML gates.

This logic function allows a single majority gate to be used for the carry-out function instead of three AND gates. Using CML gates for the synthesized functions, a 1-bit adder in the accumulator can be realised as in Fig.4.4. It has sum output feedback to its input, and registers are added for synchronisation. FCW is the frequency control word bit, FB is the sum feedback, $C_{in}$ is the input carry, $C_{out}$ is the output carry, and Sum is the sum output. The propagation delays of the CML-based XOR, majority gate, AND, register, and OR are 6 ps, 6 ps, 5 ps, 12 ps, and 18 ps respectively (see Table 3.2 in chapter 3). The $C_{out}$ function can be realised according to the Boolean expression in Eqn.4.5 or Eqn.4.6. Firstly, Eqn.4.6 consists of one OR, one XOR, and two AND gates. The worst-case propagation delay of the $C_{out}$ function is therefore:

$$t_{out} = t_{XOR} + t_{AND} + t_{OR} = 29 \text{ ps} \tag{4.7}$$

Alternatively, using Eqn.4.5, the $C_{out}$ function can be obtained with a single majority gate, which sets its output high only if at least two of the inputs are high (as designed in chapter 3). Comparing the two options, the latter is more suitable since it has a propagation delay of only 6 ps, so a significantly faster carry operation can be achieved. The 1-bit adder propagation delay ($T_{A1}$) is the worst-case propagation delay of the adder circuit:

$$T_{A1} = \max(t_{sum}, t_{cout}) \tag{4.8}$$

The sum propagation delay  $(t_{sum})$  is given by:

$$t_{sum} = t_{XOR} + t_{XOR} + t_{register} = 6 \text{ ps} + 6 \text{ ps} + 12 \text{ ps} = 24 \text{ ps} \tag{4.9}$$

The carry propagation delay ($t_{cout}$) is given by:

$$t_{cout} = t_{Majority} + t_{register} = 6 \text{ ps} + 12 \text{ ps} = 18 \text{ ps} \tag{4.10}$$

Therefore the worst-case propagation delay of the 1-bit full adder ($T_{A1}$) is 24 ps. This corresponds to an operating speed of 41.6 GHz for the 1-bit adder accumulator.
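The delay budget of Eqns.4.7 to 4.10 can be reproduced by summing the gate delays along each path. The sketch below uses the gate delays quoted from Table 3.2. The dictionary keys and the helper function are illustrative names.

```python
# Gate propagation delays in ps, as quoted in the text from Table 3.2.
GATE_DELAY_PS = {"XOR": 6, "MAJ": 6, "AND": 5, "REG": 12, "OR": 18}


def path_delay_ps(*gates):
    """Sum the propagation delays of the gates along one signal path."""
    return sum(GATE_DELAY_PS[g] for g in gates)


carry_and_or_ps = path_delay_ps("XOR", "AND", "OR")   # Eqn.4.6 realisation (Eqn.4.7)
carry_majority_ps = path_delay_ps("MAJ")              # Eqn.4.5 realisation
t_sum_ps = path_delay_ps("XOR", "XOR", "REG")         # Eqn.4.9
t_cout_ps = path_delay_ps("MAJ", "REG")               # Eqn.4.10
t_a1_ps = max(t_sum_ps, t_cout_ps)                    # Eqn.4.8
f_max_ghz = 1000.0 / t_a1_ps                          # 1/ps expressed in GHz

if __name__ == "__main__":
    print("carry via XOR/AND/OR:", carry_and_or_ps, "ps")
    print("carry via majority  :", carry_majority_ps, "ps")
    print("t_sum =", t_sum_ps, "ps, t_cout =", t_cout_ps, "ps, T_A1 =", t_a1_ps, "ps")
    print(f"maximum clock = {f_max_ghz:.2f} GHz")
```

The reciprocal of 24 ps evaluates to about 41.67 GHz, which the text quotes as 41.6 GHz.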

## 4.2 Multi-bit adder

Multi-bit adders handle operands larger than one bit. They are derived from half or full adders by Boolean synthesis. Multi-bit addition can be performed by cascading several 1-bit adders [102]. Among the various ways to build a multi-bit adder, the ripple carry adder (RCA) and the carry look-ahead adder (CLA) are widely used. The CLA is faster than the RCA because it calculates one or more carry bits before the sum operation, whereas in the RCA the carry bit is calculated at the same time as the sum bit. As described in [39], for small adders (<10 bits) the advantage of the CLA is negligible. At circuit level the CLA is more complex, and so is its layout. The wire delay due to the complex layout routing makes the CLA inferior to the RCA, at least for small bit widths. Therefore, in this accumulator, the entire adder is built as an RCA.

Multiple adder circuits are cascaded to form an N-bit RCA. The carry-out of each adder is the carry-in of the succeeding adder, and this operation occurs every clock cycle. The carry bit ripples from the LSB adder towards the MSB adder. The addition of two bits must be completed before the addition of the next significant bit starts. This means that the sum and carry output bits of any adder stage are valid only once its carry input has arrived, which introduces a small propagation delay inside the logic circuit. Registers are required to synchronize these data lines ($Sum/C_{out}/C_{in}$). This makes the RCA intrinsically slow. The worst-case propagation delay of an N-bit RCA ($T_{AN}$) can be expressed as [11],[103]:

$$T_{AN} = (N-1)t_{cout} + t_{sum} \tag{4.11}$$

where $t_{cout}$ and $t_{sum}$ are the propagation delays of the carry circuit and the sum circuit respectively. The operating clock frequency $F_{clk}$ is given by:

$$F_{clk} = \frac{1}{T_{AN}} \tag{4.12}$$
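Eqn.4.11 and Eqn.4.12 can be evaluated directly with the values $t_{sum} = 24$ ps and $t_{cout} = 18$ ps from Eqn.4.9 and Eqn.4.10. The sketch below does this for a few adder widths. The function names are illustrative.

```python
T_SUM_PS = 24    # sum path delay, Eqn.4.9
T_COUT_PS = 18   # carry path delay, Eqn.4.10


def rca_delay_ps(n_bits):
    """Worst-case delay of an N-bit ripple carry adder, Eqn.4.11."""
    return (n_bits - 1) * T_COUT_PS + T_SUM_PS


def max_clock_ghz(n_bits):
    """Maximum clock frequency from Eqn.4.12, with the delay in ps."""
    return 1000.0 / rca_delay_ps(n_bits)


if __name__ == "__main__":
    for n in (1, 2, 4, 12):
        print(f"{n:2d}-bit: T_AN = {rca_delay_ps(n):3d} ps, F_clk = {max_clock_ghz(n):.2f} GHz")
```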

#### 4.2.1 2-bit adder design

The block diagrams of a 2-bit adder and of a 2-bit adder in an accumulator are shown in Fig.4.5 and Fig.4.6 respectively. The 1-bit adder combinational circuit forms the arithmetic sum of three bits, namely FCW, the input carry ($C_{in}$), and the sum feedback (FB), and has two outputs (Sum and $C_{out}$). A 2-bit full adder is realized by concatenating two such 1-bit adders as shown in Fig.4.7. For simplicity, the adder used in the accumulator is referred to simply as a 2-bit adder. FCW0 and FCW1 are the LSB and MSB of the frequency control word input. FB0 and FB1 are the feedback from the sum outputs (Sum0 and Sum1) of the lower and upper adder respectively, as shown in Table 4.1.
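The cascading of 1-bit adders with sum feedback can be sketched at the logic level. The model below is purely functional: it ignores the synchronisation registers and all timing. The helper names (`to_bits`, `from_bits`, `accumulate`) are illustrative and bit lists are LSB first.

```python
def full_adder(a, b, c_in):
    """1-bit adder: XOR sum and majority carry."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (c_in & a)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c_in=0):
    """Add two equal-length bit lists (LSB first); the carry ripples LSB to MSB."""
    carry = c_in
    sum_bits = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n_bits):
    return [(value >> i) & 1 for i in range(n_bits)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def accumulate(fcw, n_bits, cycles):
    """Feed the sum back (FB = Sum) every cycle, as in the adder accumulator."""
    fcw_bits = to_bits(fcw, n_bits)
    fb_bits = [0] * n_bits
    states = []
    for _ in range(cycles):
        fb_bits, _overflow = ripple_carry_add(fcw_bits, fb_bits)
        states.append(from_bits(fb_bits))
    return states


if __name__ == "__main__":
    print(accumulate(fcw=3, n_bits=2, cycles=4))
```

For a 2-bit accumulator the state wraps modulo 4, so the overflow carry of the upper adder is simply discarded in this model.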



Figure 4.5: 2-bit adder block diagram.



Figure 4.6: 2-bit adder accumulator block diagram.



Figure 4.7: 2-bit adder accumulator using CML gates.

Table 4.1: Input equivalency between block diagram and CML of 2-bit adder accumulator

| Signal notation in the 2-bit adder accumulator block diagram | Signal notation in the 2-bit adder accumulator using CML |
|--------------------------------------------------------------|----------------------------------------------------------|
| A0                                                           | FCW0                                                     |
| A1                                                           | FB0 = Sum0                                               |
| B0                                                           | FCW1                                                     |
| B1                                                           | FB1 = Sum1                                               |

For the addition of "two-bit" operands, a 2-bit adder requires one register fewer than two 1-bit adders: the 2-bit adder uses 4 XORs, 2 majority gates, and 3 registers, while two 1-bit adders use 4 XORs, 2 majority gates, and 4 registers. Based on the propagation delay of each logic gate, the propagation delay and maximum operating frequency of the 1-bit, 2-bit, 4-bit and 12-bit adders are presented in Table 4.2. The highest operating frequency is obtained with the smallest adder because of its smaller propagation delay.

# 4.3 Accumulator architecture

The accumulator used in a DDS is generally larger than 5 bits [51],[59]. Take as an example a DDS that requires a 12-bit accumulator, i.e. 12 bits of accumulating circuits (adders) for phase accumulation. The accumulating circuit can be composed of 1-bit, 2-bit, 4-bit, or 12-bit adders.

| N-bit adder | $T_{AN} = (N-1)\,t_{cout} + t_{sum}$ (ps)    | $F_{clk}$ (GHz) |
|-------------|----------------------------------------------|-----------------|
| 1-bit       | $t_{sum} = 24$                               | 41.6            |
| 2-bit       | $t_{cout} + t_{sum} = 42$                    | 23.8            |
| 4-bit       | $3 \times t_{cout} + t_{sum} = 78$           | 12.8            |
| 12-bit      | $11 \times t_{cout} + t_{sum} = 222$         | 4.5             |

Table 4.2: Comparison between N-bit adders

| Architecture     | Adder size | Nr. of adders | $T_{AN}$ (ps) | $F_{clk}$ (GHz) | Nr. of registers | Latency (ps) |
|------------------|------------|---------------|---------------|-----------------|------------------|--------------|
| Full pipeline    | 1-bit      | 12            | 24            | 41.6            | 132              | 288          |
| Partial pipeline | 2-bit      | 6             | 42            | 23.8            | 60               | 252          |
| Partial pipeline | 4-bit      | 3             | 78            | 12.8            | 24               | 234          |
| Parallel         | 12-bit     | 1             | 222           | 4.5             | 0                | 222          |

 Table 4.3: Comparison between different accumulator architectures

The operating speed is defined by the critical delay in the accumulator, which is determined by the worst-case propagation delay of the N-bit adder. The accumulator architectures are compared in Table 4.3 in terms of adder count, register requirements, operating speed, and latency. Three of the accumulator architectures (based on 1-bit, 2-bit, and 4-bit adders) are shown in Fig.4.8.
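The entries of Table 4.3 can be reproduced with a few lines of arithmetic. In the sketch below, the adder delay follows Eqn.4.11. The register count (each pipeline stage k, counted from zero, needs k skewing registers per bit on both the input and the output side) and the latency (number of stages times $T_{AN}$) are relations inferred from the tabulated values. They are assumptions of this sketch rather than formulas stated in the text.

```python
T_SUM_PS = 24     # Eqn.4.9
T_COUT_PS = 18    # Eqn.4.10
ACC_BITS = 12     # accumulator width of the example


def architecture(adder_bits, acc_bits=ACC_BITS):
    """Return the Table 4.3 figures for an accumulator built from adder_bits-wide adders."""
    stages = acc_bits // adder_bits
    t_an_ps = (adder_bits - 1) * T_COUT_PS + T_SUM_PS
    # Inferred: stage k needs k registers per bit, on the pre- and post-skew side.
    skew_one_side = adder_bits * stages * (stages - 1) // 2
    return {
        "adders": stages,
        "t_an_ps": t_an_ps,
        "f_clk_ghz": 1000.0 / t_an_ps,
        "registers": 2 * skew_one_side,
        "latency_ps": stages * t_an_ps,   # inferred: one adder delay per stage
    }


if __name__ == "__main__":
    for bits in (1, 2, 4, 12):
        row = architecture(bits)
        print(f"{bits:2d}-bit adders: {row['adders']:2d} adders, T_AN = {row['t_an_ps']:3d} ps, "
              f"F_clk = {row['f_clk_ghz']:.1f} GHz, {row['registers']:3d} registers, "
              f"latency = {row['latency_ps']} ps")
```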

In the parallel architecture, all 12 bits are accumulated in a single 12-bit adder accumulator. It uses the least hardware, but its operating speed is limited to only 4.5 GHz. This means that the 12-bit addition cannot be completed within one clock cycle of the 20 GHz target frequency because the carry bit ripples through the adders [42]. For the 12-bit adder, the worst-case propagation delay is 222 ps. The switching latency is defined as the time needed to flush the 12-bit data from input to output. Although the parallel approach has the lowest latency (222 ps) and the least hardware, it is not suitable because of its low operating speed.



Figure 4.8: 12-bit accumulator using a) 1-bit adders b) 2-bit adders c) 4-bit adders.

One effective technique to reduce the overall propagation delay, and thus increase the operating speed of the accumulator, is a pipeline architecture [102]. It reduces the worst-case propagation delay and increases the maximum achievable clock frequency. The fully pipelined architecture uses only 1-bit adders and has the highest clock frequency. This is because a sum is output every clock cycle while the partially computed sums are stored in the pipeline registers, which gives the lowest propagation delay [11]. Fig.4.8(a) is an example of a 12-bit accumulator using the full pipeline approach. The switching latency of this fully pipelined adder is the worst because of the large number of buffers required to align all twelve 1-bit data lines. In this example, the critical delay of a 1-bit adder is 24 ps, which implies that such an accumulator can ideally operate up to 41.6 GHz. However, it requires 66 pre-skewing and 66 post-skewing registers for input and output data line alignment. The propagation delay of the skewing register is deliberately designed to be larger than that of the register inside the adder circuit. For larger N-bit adders, the propagation delay of the register is made higher so that the register/buffer can align the input/output data. For a DDS, operating speed is the primary concern, which suggests a fully pipelined implementation. However, this has a significant area and power consumption due to the excessive buffer requirement (see Table 4.3 and Fig.4.8). Clock distribution is also more complex in such an approach. For example, it needs 36 clock nodes in the 1-bit adder accumulator, while the count is only 18 in the 2-bit adder accumulator. As the number of clock nodes increases, longer clock distribution lines and additional drivers are needed, again adding area and power. The parallel and fully pipelined architectures are two extreme cases. A partial pipeline is a trade-off between maximum operating speed and switching latency, area and power consumption, and this approach is therefore selected. Two alternatives for the partial pipeline are presented: one using 2-bit adders and the other using 4-bit adders. The 4-bit adder is compact but considerably slower, owing to the speed limitation imposed by the closed-loop gain of the adders [104]. It is limited to an operating speed of only 12.8 GHz and is hence not suitable for our maximum clock frequency goal. The 2-bit adder allows the DDS to operate at up to 23.8 GHz clock frequency. Even considering wire delays and additional unavoidable parasitic effects (capacitive loads), the targeted operating speed of 20 GHz can be achieved.

## 4.4 Pipeline accumulator without pre-skewing register



Figure 4.9: 12-bit accumulator using 2-bit adders without pre skewing register.

The 12-bit accumulator with 2-bit adder consists of 30 pre skewing and 30 post skewing registers. For 12-bit pipeline accumulator, once the accumulator has operated, flushing data from input to output within 6 clock cycle, accumulator output becomes a steady state. From the 7th clock cycle, there is no need to use pre-skewing register anymore, because the accumulator has settled for a phase jump given by FCWs. The pre-skewing registers are eliminated (see Fig.4.9) to save area and power consumption. The removal of pre-skewing registers has an effect on FCW and phase jump relationship. The lower FCWs signal reaches sooner to the 2-bit adder than higher FCWs creating a jump in accumulator value. It means when such accumulator architecture used in DDS,

| Accumulator<br>size<br>(bits) | Frequency<br>points | Frequency<br>resolution<br>@ 20 GHz<br>clock(MHz) | Nr. of<br>2-bit<br>adders | Nr. of<br>registers | Latency<br>(ps) |
|-------------------------------|---------------------|---------------------------------------------------|---------------------------|---------------------|-----------------|
| 6                             | 32                  | 312.5                                             | 3                         | 6                   | 126             |
| 8                             | 128                 | 78.1                                              | 4                         | 12                  | 168             |
| 10                            | 512                 | 19.5                                              | 5                         | 20                  | 210             |
| 12                            | 2048                | 4.9                                               | 6                         | 30                  | 252             |
| 14                            | 8192                | 1.2                                               | 7                         | 42                  | 294             |
| 16                            | 32768               | 0.3                                               | 8                         | 56                  | 336             |

 Table 4.4: Comparison between different accumulator sizes

it does not have continuous phase outputs when the FCW input changes. This implies that when this type of DDS is used in a frequency modulation transmitter, only frequency shift keying (FSK) modulation is possible, because this is the only frequency modulation that does not require a continuous phase.

## 4.5 Accumulator size selection

This section explains why exactly a 12-bit accumulator is chosen. The number of synthesizable frequencies depends mainly on the bit size of the accumulator. Table 4.4 compares various accumulator sizes based on 2-bit adders in a pipeline architecture. The table shows the number of frequency points, the frequency resolution, the switching latency, and the number of components required with respect to the accumulator bit size. Increasing the bit size increases the number of output frequency points and gives a finer resolution, both of which are significant features of the DDS. However, it also increases the number of 2-bit adders and thus the switching latency, area, and power consumption. The 6-bit accumulator consists of only three 2-bit adders and six registers, but it can generate a maximum of only 32 frequency points and its frequency resolution is limited to 312.5 MHz. On the other hand, the 16-bit accumulator has 32768 frequency points and a resolution of 0.3 MHz; however, it needs eight 2-bit adders and 56 registers, which makes it the worst in terms of area and power consumption. A larger accumulator also suffers from a larger switching latency. Therefore, as a compromise, the size of the accumulator is limited to 12 bits, which provides a sufficient number of frequency points (2048), an acceptable switching latency ( $\approx 400 \text{ ps}$ ) and frequency resolution (4.9 MHz), a moderate area  $(2 \text{ mm}^2)$ , and moderate power consumption (<2 W).
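The entries of Table 4.4 can be reproduced with a few lines of Python. The following is only a minimal sketch: the function name `accumulator_metrics` is hypothetical, and the register count `n_adders * (n_adders - 1)` and the latency of 42 ps per 2-bit adder stage are assumptions inferred from the tabulated values, not formulas stated in the text.

```python
def accumulator_metrics(n_bits, f_clk_hz=20e9, adder_latency_ps=42):
    """Return Table 4.4 style metrics for an n_bits pipelined accumulator.

    Assumptions (inferred from the tabulated values):
      - one 2-bit adder per two accumulator bits,
      - register count = n_adders * (n_adders - 1),
      - latency = 42 ps per 2-bit adder stage.
    """
    n_adders = n_bits // 2
    return {
        "bits": n_bits,
        # The MSB of the FCW is tied low, so half of the 2**n_bits codes are usable.
        "frequency_points": 2 ** (n_bits - 1),
        "resolution_mhz": f_clk_hz / 2 ** n_bits / 1e6,
        "adders": n_adders,
        "registers": n_adders * (n_adders - 1),
        "latency_ps": n_adders * adder_latency_ps,
    }


for bits in range(6, 18, 2):
    m = accumulator_metrics(bits)
    print(f"{m['bits']:2d} bits: {m['frequency_points']:6d} points, "
          f"{m['resolution_mhz']:7.1f} MHz, {m['adders']} adders, "
          f"{m['registers']:2d} registers, {m['latency_ps']} ps")
```

Under these assumptions the 12-bit row gives 2048 frequency points, a resolution of about 4.9 MHz, six adders, 30 registers, and a latency of 252 ps.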

## 4.6 Phase accumulator phase truncation

A DDS contains a large number of spurious output components due to the periodic nature of the digital implementation, which makes a high SFDR challenging to achieve [42]. The SFDR of the DDS is influenced by three primary factors: the bit size of the accumulator, the DAC, and the quality of the TSC transformation. It is important to note that the SFDR performance of each block has to be similar. When most of the blocks are designed for a high SFDR and only one block has a poor SFDR, the inferior block becomes the bottleneck for the overall SFDR performance. Therefore, the obtainable SFDR of each block and the limitation it imposes on the other blocks have to be discussed. For example, the previous accumulator from Fig.4.9 is modified to Fig.4.10. The output bits of the accumulator are truncated from 12 bits to 7 bits. The main reasons are to match the number of input bits of the succeeding digital to analog converter block and to obtain a comparable SFDR performance.



Figure 4.10: 6-bit phase truncated 12-bit accumulator using 2-bit adders

The phase truncation adds quantization noise to the phase of the desired signal, which degrades the phase noise performance. The upper SFDR limit for a K-bit truncated accumulator is given by [105]:

$$\text{SFDR (accumulator)} = 6.02 \times K - 3.992 \text{ dBc}$$
(4.13)

The DAC that follows is a 6-bit DAC working beyond a 20 GHz frequency of operation. A 6-bit DAC for a similar speed has been designed in the same SiGe technology [81]. Implementing a DAC with a higher bit size at such a high speed is difficult in this technology. Therefore, the DAC size is limited to 6 bits. The signal to noise ratio (SNR) of the DAC is given by:

$$\text{SNR (DAC)} = 6.02 \times D + 1.76 \text{ dBc}$$
(4.14)

where D is the amplitude resolution of the DAC, which is given by the bit size of the DAC. The spurs and quantization noise due to phase truncation, as well as the quantization noise due to finite phase and amplitude resolutions, are explained in detail in [106]. Furthermore, the design and analysis of a high-SFDR DDS are described in [7]. The dominant spurs caused by the phase truncation can be suppressed by the conventional dithering approach, but this introduces a frequency offset and also degrades the SNR. [7] describes an improved dithering technique based on frequency compensation that increases the SFDR by sufficiently suppressing the truncation errors with minimal additional power consumption and SNR degradation; however, it is not implemented in a multi-GHz clock accumulator.

To achieve an appropriate spectral-noise criterion, the worst-case SFDR should be at least one bit higher than the SNR of the digitization [107], [106]. Therefore, the accumulator is truncated to 7 bits, which corresponds to an SFDR of 38.15 dBc and is comparable to the SNR of the 6-bit DAC (37.88 dBc), and the accumulator architecture is slightly modified to Fig.4.11. This architecture uses five 2-bit adders, two 1-bit adders, and six registers. If the accumulator had all 12 output bits, the upper-bound SFDR would be 68.23 dBc, at the expense of 24 extra registers (see Fig.4.8(b), using 2-bit adders) compared to the 7-bit truncated accumulator. However, this would not improve the overall SFDR of the DDS because it is limited by the lower SFDR of the DAC or the TSC. The Taylor series expansion in the TSC is limited to an SFDR of 30 dBc [52]. An improved TSC mapping achieved 42 dBc [108]. The SFDR of the TSC also lies in the 30 to 40 dBc range. All three blocks are therefore optimized together to achieve an overall DDS SFDR of 30 to 40 dBc. This is crucial because it gives the designer the flexibility to choose a topology with less than the highest SFDR for some blocks and helps to reduce the extra circuits and power (e.g. by truncating the accumulator output bits from 12 bits to 7 bits).
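The matching of the two bounds can be checked numerically. The sketch below simply evaluates equations (4.13) and (4.14); the function names are hypothetical and chosen only for illustration.

```python
def sfdr_truncation_dbc(k_bits):
    """Upper SFDR limit of a K-bit phase truncated accumulator, eq. (4.13)."""
    return 6.02 * k_bits - 3.992


def dac_snr_dbc(d_bits):
    """SNR of a D-bit DAC, eq. (4.14)."""
    return 6.02 * d_bits + 1.76


K_BITS = 7  # accumulator output bits after truncation
D_BITS = 6  # DAC resolution

print(f"SFDR bound for K = {K_BITS}: {sfdr_truncation_dbc(K_BITS):.2f} dBc")
print(f"SNR for D = {D_BITS}: {dac_snr_dbc(D_BITS):.2f} dBc")
```

With K = 7 and D = 6 the two values differ by less than 0.3 dB, which illustrates why one additional accumulator output bit relative to the DAC resolution is sufficient.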

Recalling the DDS block diagram, the MSB output of the accumulator complements the increasing phase ramp into a decreasing phase ramp, thus creating a digital triangle signal. Therefore, the above architecture (Fig.4.10) is modified as in Fig.4.11. It includes two 1-bit adders as well. One 1-bit adder is used for the FCW0 input. The other 1-bit adder is special because it generates the sum11 output, the MSB (corresponding to the FCW11 input). Since its carry does not have to feed a next adder stage, the carry circuit is omitted. The FCW inputs of the accumulator never become all 1s, because the maximum FCW corresponds to half of the clock frequency, which is either FCWs = 1000 0000 0000 or FCWs = 0111 1111 1111. The latter is used, which allows the FCW11 input to be connected directly to ground, making it logic 0. This eliminates one comparator circuit.



Figure 4.11: 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders.

Here, the accumulation process of the accumulator is described step by step: first with only a linearly increasing ramp, and then with the complement from the XOR.

### 4.6.1 Linear increasing ramp accumulation

To illustrate the linear accumulation process, one example of the accumulator operation is explained, as shown in Fig.4.12. The accumulator has 7 sum outputs (MSB, sum10, ..., sum5), which are updated every clock cycle. The highest sum output is referred to as the MSB. The sum outputs are initially 0 000 000, and the clock frequency is 20 GHz. The FCW input = 0000 0010 0000 is fed to the accumulator. This FCW input is chosen such that it does not change the lower 5 sum outputs (sum0 to sum4), which are in fact already truncated. On the other hand, an FCW input value higher than 1000000 would not change all sum outputs, due to the larger phase jump. In this example, the accumulator (sum output) is incremented by one in each clock cycle (here considering only the top 7 bits of the sum output), e.g. 0000001, 0000010, 0000011, 0000100 and so on up to 1111111. The next step (the 128th clock cycle) would lead to the state 10000000. Note that there is no 8th bit in this 7-bit truncated accumulator output; therefore, in this final step the 7-bit accumulator sum outputs reset to all zeros (0 000 000). This whole accumulation process takes place in  $2^{K} = 2^{7} = 128$  sampling points, i.e. within 128 clock cycles. The same linearly increasing ramp repeats in the following clock cycles. The actual implementation of the accumulation process in this DDS, with an increasing and a decreasing ramp using an XOR gate as a flipping tool, is described next.
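The ramp described above can be reproduced with a purely behavioural model. The sketch below is an illustration only: it ignores the pipeline latency and the skewing registers of the real circuit, and the function name `sum_outputs` is hypothetical.

```python
N_BITS = 12  # accumulator width
K_BITS = 7   # sum outputs kept after phase truncation (sum11 ... sum5)


def sum_outputs(fcw, n_cycles, n_bits=N_BITS, k_bits=K_BITS):
    """Behavioural model: accumulate fcw modulo 2**n_bits and keep the top k_bits."""
    acc = 0  # sum outputs start at all zeros
    outputs = []
    for _ in range(n_cycles):
        acc = (acc + fcw) % (1 << n_bits)          # wrap-around: no 13th bit exists
        outputs.append(acc >> (n_bits - k_bits))   # drop the truncated lower bits
    return outputs


ramp = sum_outputs(fcw=0b000000100000, n_cycles=128)
for cycle, value in enumerate(ramp[:4], start=1):
    print(f"cycle {cycle}: {value:07b}")
print(f"cycle 127: {ramp[126]:07b}")
print(f"cycle 128: {ramp[127]:07b}")
```

In this model the truncated output counts up by one per clock cycle, reaches all ones after 127 cycles, and wraps to all zeros at the 128th cycle.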



Figure 4.12: 12-bit accumulator digital ramp (sum output) generation.

### 4.6.2 Accumulation process with flipping

The accumulator is again modified to Fig.4.13, which introduces XOR gates for digital triangle generation. This process is explained as shown in Fig.4.14. The triangular outputs are denoted by "accu" bits. Each accu output is the XOR of the MSB with one of the remaining six sum outputs. The accumulation process again advances by one bit in each step, starting from 000000, 000001, 000010 and so on up to 111111. This takes place in 63 clock cycles. Up to this point the MSB, which is the 7th bit of the sum output, remains low (0). The XOR operation between the sum outputs (0 or 1) and the MSB (0) leaves the sum outputs unchanged. Therefore, the accu outputs are equal to the sum outputs. Up



Figure 4.13: 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders including XORs for digital triangle generation.

to the 63rd clock cycle, the linear accumulation can be observed as described previously in Fig.4.12. However, in the next clock cycle (i.e. the 64th clock cycle), the MSB becomes high (1). It now complements (through the XOR operation) the increasing sum outputs bitwise in each clock cycle. For example, at the 64<sup>th</sup> cycle the MSB (1) is XORed with the sum outputs (000000), resulting in accu outputs of 111 111. Similarly, at the following clock cycle the accu outputs become 111 110, and then 111 101, 111 100, 111 011, and so on until 000 000. Note that after the accu outputs become 000000, the MSB also becomes 0 due to the continuous accumulation of the FCW inputs. This is the beginning of another cycle of the triangular output. The amplitude of the accu outputs over the whole 128 clock cycles forms a digital triangle, as desired for DDS operation. This process (ramp up and down) repeats every 128 clock cycles, as long as Fclk and the FCW input are fixed. However, when the FCW is changed to a different value while the accumulation for the previous FCW is still in progress, it takes a certain time to flush out the previous accumulation and start with the new FCW. This time is related to the switching latency of the accumulator and is less than 300 ps. From another perspective, this is also the switching speed between output frequencies of the DDS.
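The flipping operation can be sketched in the same behavioural way. The model below is an assumption-laden illustration (no pipeline latency, hypothetical function name `triangle_outputs`): the six lower sum outputs are XORed bitwise with the MSB to form the accu outputs.

```python
def triangle_outputs(fcw, n_cycles, n_bits=12, k_bits=7):
    """Behavioural model of the XOR based digital triangle generation."""
    low_mask = (1 << (k_bits - 1)) - 1  # selects the six sum outputs below the MSB
    acc = 0
    accu = []
    for _ in range(n_cycles):
        acc = (acc + fcw) % (1 << n_bits)
        top = acc >> (n_bits - k_bits)   # 7-bit truncated sum output
        msb = top >> (k_bits - 1)        # sum11
        low = top & low_mask             # sum10 ... sum5
        # XOR with the MSB: unchanged while MSB = 0, complemented while MSB = 1
        accu.append(low ^ (low_mask if msb else 0))
    return accu


tri = triangle_outputs(fcw=32, n_cycles=128)
for cycle in (1, 2, 63, 64, 65, 127, 128):
    print(f"cycle {cycle:3d}: accu = {tri[cycle - 1]:06b}")
```

The model rises to 111111 at the 63rd cycle, holds that value at the 64th cycle when the MSB switches to 1, and then falls back to 000000, which is the ramp up and down described above.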

The lower five bits of the accumulator outputs are truncated, which implies that they do not influence the phase information. Referred to the 12-bit accumulator, a one-bit increment of the accumulator output in the above example is in fact equivalent to a change in the FCW value of 100000 in each step. Using equation (1), the output frequency of the digital triangle for FCW = 100000 (32 in decimal) and a clock of 20 GHz is:

$$F_{out} = \frac{FCW}{2^{12}} \times F_{clk} = \frac{32}{4096} \times 20 \text{ GHz} = 156.25 \text{ MHz}$$
(4.15)
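Equation (4.15) can be evaluated for arbitrary control words with a small helper. This is a minimal sketch; the function name `dds_output_frequency` is hypothetical.

```python
def dds_output_frequency(fcw, f_clk_hz, n_bits=12):
    """Output frequency of an n_bits accumulator DDS, following eq. (4.15)."""
    return fcw / 2 ** n_bits * f_clk_hz


F_CLK = 20e9  # clock frequency in Hz

for fcw in (0b000000000001, 0b000000100000, 0b011111111111):
    f_out = dds_output_frequency(fcw, F_CLK)
    print(f"FCW = {fcw:012b} ({fcw:4d}): F_out = {f_out / 1e6:.2f} MHz")
```

The three control words used here are the smallest step, the example value of 32, and the largest FCW with FCW11 tied low.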

The 156.25 MHz DDS output for a 20 GHz clock can also be written as  $\frac{F_{clk}}{128}$ . Phase changes are observed in all top six accumulator outputs for output frequencies  $\leq \frac{F_{clk}}{128}$ . This seems a trivial remark; however, it implies that at higher output frequencies not every accumulator output changes phase. The digital triangle then has a large jump in amplitude when it is converted into an analog triangle by the 6-bit DAC. In fact, the full 6-bit DAC is not even necessary for higher output frequencies, since not all six outputs are changing. Increasing the DAC size beyond 6 bits does not improve the SFDR of this TSC based



Figure 4.14: 7-bit phase truncated 12-bit accumulator using 1-bit and 2-bit adders including XORs for digital triangle generation.

DDS for higher output frequencies  $F_{out} > \frac{F_{clk}}{128}$ . At lower output frequencies, however, more than six sum outputs change, so increasing the DAC bit size and the number of accumulator output bits would increase the SFDR of the DDS, unless it is limited by the lower SFDR of the TSC. This also means that to obtain a higher SFDR with a TSC based ROM-less DDS, an increased clock frequency, a wider DAC, and an improved TSC are necessary (not to mention their design challenges). In operation, the DDS output frequency should be kept to a small fraction of the clock frequency. When the output frequency approaches the ideal maximum output frequency (close to half of the clock frequency), the number of phase-changing accumulator outputs is reduced, and the DDS thus suffers from a poor SFDR despite an increased DAC size, more accumulator output bits, and an improved TSC. This is discussed further in chapter 6 (Triangle to sine wave converter) and chapter 7 (DDS simulation and characterization).

## 4.7 Phase control word unit in accumulator

The input of each adder in the accumulator is called the frequency control word, since its value, either high (1) or low (0), dictates the output frequency for a given clock frequency. For the FCW inputs, the pre-skewing registers are removed to save area and power [49] [8]. Thus, an ideal continuous phase transition between two output frequencies is not possible. This limits the continuous phase modulation capability of the DDS. The addition of a phase control word (PCW) unit (see Fig.4.15) allows the phase of the output frequency to be tuned independently of the FCW input, enabling phase modulation. The 6-bit phase control word unit is made from three 2-bit adders, which are connected to the outputs of three 2-bit adders of the FCW unit, as shown in the figure. Both the FCW and PCW



Figure 4.15: Final 12-bit accumulator including PCW.

control units use the same post-skewing registers, while pre-skewing registers are used for the PCW control unit only. This additional phase control unit can change the phase of the output frequency from 0° to 180° instantaneously. Finally, buffers are added at each accumulator output to smooth the signals. Each buffer is composed of a delay gate with an emitter follower stage for driving capability.

## 4.8 Clock tree for synchronization

An H-shaped clock tree provides minimum clock skew and robustness against manufacturing variability and reliability mechanisms [109]. It is used to ensure equal timing of the clock signal for each register used in the 1-bit adders, 2-bit adders, and buffers. A three-level clock tree is implemented as in Fig. 4.16. At least 16 nodes are required to synchronize the clock inside all blocks of the accumulator. The main clock signal is first fed into a primary buffer and then distributed to four secondary buffers placed at equidistant locations. These buffers increase the fan-out to 16 nodes. Each node has a synchronised clock, which can be used to clock two blocks that are close to each other and have equal path lengths in the layout. This structure is sufficient to feed all adder blocks and skewing registers. In the layout, each path is carefully connected to its node using the same high current density metal layers, TopMetal1 (TM1) and TopMetal2 (TM2), and the same number of vias. Note that the H-tree shape is applied to the real layout structure of the accumulator. The transmission line lengths for each level are optimised to be compact



Figure 4.16: H-tree block diagram (left) and its layout realisation (right).

yet sufficient to accommodate all blocks.

## 4.9 Accumulator simulation

In this section, the simulated results of the accumulator are presented. The accumulator presented in Fig.4.15 is simulated with a clock frequency of 20 GHz and FCW = 0000 0010 0000 using a transient analysis, as shown in Fig.4.17. The six accumulator outputs with high (3.48 V) and low (3.15 V) voltage levels can be observed. The lowest accumulator output bit (accu5) has the shortest pulse duration. This duration doubles in the next higher accumulator output (accu6), and so on up to the highest accumulator output (accu10). Each pulse has a 50 % duty cycle in all cases. All six voltage levels can therefore be summed at each time instant and plotted in the time domain, which results in a triangular waveform (depicted by the red triangle line in Fig.4.17). This verifies the correct operation of the accumulator: the frequency of this triangle is 156.25 MHz, as calculated before. This simulation (transient time = 11 ns) takes approximately 30 minutes in real time with a moderate resolution setup in Cadence (Virtuoso Schematic Editor L/XL environment). The finest resolution setup in Cadence would increase the real simulation time drastically. To obtain a fine frequency resolution in the spectrum computed from the time-domain signal, the number of samples of the signal should be large enough for the fast Fourier transform (FFT) to resolve the frequency content finely. This means that the transient simulation time has to be increased to accommodate a sufficient number of samples in the time domain. Note that a longer transient simulation time increases the total real simulation time accordingly. Simulations at lower frequencies (< 500 MHz) therefore take a considerable amount of time if a large number of samples (>10) needs to be calculated. Further simulations including the DAC and TSC are carried out in the following sections.
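The link between transient simulation time and spectral resolution follows from the general FFT property that the bin spacing equals the inverse of the record length. The sketch below only illustrates this relation; the function names are hypothetical, and the 1 MHz target is an arbitrary example value, not a requirement taken from this work.

```python
def fft_bin_spacing_hz(t_sim_s):
    """FFT frequency resolution (bin spacing) of a record of length t_sim_s."""
    return 1.0 / t_sim_s


def required_sim_time_s(target_resolution_hz):
    """Transient time needed to reach a given FFT bin spacing."""
    return 1.0 / target_resolution_hz


T_SIM = 11e-9       # transient time used for Fig.4.17
F_OUT = 156.25e6    # triangle frequency of the example

print(f"bin spacing for 11 ns: {fft_bin_spacing_hz(T_SIM) / 1e6:.1f} MHz")
print(f"output periods in 11 ns: {T_SIM * F_OUT:.2f}")
print(f"time for 1 MHz spacing: {required_sim_time_s(1e6) * 1e9:.0f} ns")
```

An 11 ns transient therefore resolves the spectrum only to roughly 91 MHz, which is why low output frequencies require much longer simulation runs.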



Figure 4.17: Simulation result of six most significant bits from the accumulator for a clock of 20 GHz and FCW =  $0000\ 0010\ 0000$ .

## 4.10 Conclusions from chapter 4

In this chapter, the phase accumulator design is discussed. The architecture of the accumulator is explained step by step, and the optimized accumulator is selected. The complete accumulator (12-bit frequency control unit and 6-bit phase control unit) has an active area of  $1.21 \text{ mm} \times 0.73 \text{ mm}$  and a power consumption of 1.15 W. It can generate up to 2048 output frequencies. It has a worst-case SFDR of 38 dBc and operates at clock frequencies up to 20 GHz. A comparison of published GHz clock accumulators for DDS applications is presented in Table 4.5. This work has one of the best figures of merit, which takes into account the clock frequency, the total number of bits (N), the number of truncated output bits (K), and the power consumption. The accumulators with the highest clock frequencies are dominated by the InP process. This work and [8] are designed in the same technology and have similar performance. From an architecture point of view both accumulators are the same; however, 2-bit adders are implemented here, while [8] used only 1-bit adders. This accumulator has a slightly higher power consumption because it uses two lines of output drivers to drive the following DAC in the DDS. If these two drivers were left out of the account, the total power consumption would remain below 1.1 W. The next important block, the DAC, is explained in the next chapter.

| Accu.<br>work | Technology | Total<br>bits<br>(N) | Output<br>bits<br>(K) | $F_{clk}$<br>(GHz) | Power consumption<br>whole DDS<br>(W) | FOM $= \frac{F_{clk} \times N \times K}{\text{Power}}$<br>$\left(\frac{\text{GHz} \cdot \text{bit}^2}{\text{W}}\right)$ |
|---------------|------------|---------------------|----------------------|----------------------------------|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| [110]         | InP        | 4                   | 4                    | 41                               | 4.1                                   | 160                                                                                                                                                 |
| [3]           | InP        | 8                   | 5                    | 32                               | 4.9                                   | 261                                                                                                                                                 |
| [8]           | SiGe:C     | 12                  | 7                    | 20                               | 1.08                                  | 1557                                                                                                                                                |
| [74]          | InP        | 8                   | 6                    | 13                               | 2.13                                  | 293                                                                                                                                                 |
| [52]          | SiGe:C     | 9                   | 8                    | 12                               | 0.825                                 | 1047                                                                                                                                                |
| [51]          | SiGe:C     | 9                   | 8                    | 11                               | 0.825                                 | 960                                                                                                                                                 |
| [111]         | SiGe       | 8                   | 8                    | 5                                | 0.495                                 | 646                                                                                                                                                 |
| This<br>work  | SiGe:C     | 12                  | 7                    | 20                               | 1.23                                  | 1461                                                                                                                                                |

Table 4.5: Comparison of recently published accumulators for GHz clock DDS applications
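The figure of merit in Table 4.5 can be recomputed from the other columns. The sketch below is illustrative only: the function name `accumulator_fom` is hypothetical, only a subset of the rows is shown, and for this work the 1.15 W accumulator power quoted in the text is used as the power value.

```python
def accumulator_fom(f_clk_ghz, n_bits, k_bits, power_w):
    """FOM = F_clk * N * K / Power, in GHz * bit^2 / W."""
    return f_clk_ghz * n_bits * k_bits / power_w


# (reference, F_clk in GHz, N, K, power in W); values taken from Table 4.5,
# except the power of "This work", which uses the 1.15 W quoted in the text.
rows = [
    ("[110]", 41, 4, 4, 4.1),
    ("[52]", 12, 9, 8, 0.825),
    ("[51]", 11, 9, 8, 0.825),
    ("This work", 20, 12, 7, 1.15),
]

for ref, f_clk, n, k, power in rows:
    print(f"{ref:10s} FOM = {accumulator_fom(f_clk, n, k, power):.0f}")
```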

# Chapter 5

# Digital to analog converter

The digital to analog converter (DAC) converts a digital signal into an analog current or voltage signal. In today's radar and communication systems, both analog and digital signals are used to achieve high data rates, and the DAC plays a significant role in such mixed-signal circuits [81]. The DAC is an indispensable part of the DDS as well. The SFDR of the DDS strongly depends upon the DAC performance [59], [5]. The major performance parameters of a DAC are integral nonlinearity (INL), differential nonlinearity (DNL), data rate, amplitude resolution, monotonicity, SFDR, and glitch amplitude. In general, these parameters depend upon the output and load impedances of the DAC, the code-dependent settling time constants, code-dependent switch feedthrough, timing skew between current sources, the major carry glitch, current source switching, on-chip passive analog components, and current mismatches. At higher frequencies, increasing the bit size of the DAC only reduces the quantization noise but does not improve the SFDR. On the other hand, the dynamic nonlinearities increase in magnitude with increasing signal frequency. This is because the output values change faster and a large proportion of the clock cycle is occupied by the nonlinear switching transients [112]. The switching speed (of a current switch) is proportional to the speed of the technology itself, but it cannot guarantee the overall speed of the DAC. Irrespective of the technology, the highest speed can differ between DAC architectures. Most high-speed DACs use a current-steering architecture because of its high sample rate, its efficiency in driving resistive loads, and its low power consumption [113]. A high-speed DAC is thus a combination of high-speed technology, a relevant architecture, optimized clock synchronization, precise current matching, and all other considerations for minimizing nonlinearities. The record speed of a measured DAC in SiGe technology is held by a 6-bit 56 GHz DAC [114]. Several other GHz-speed DACs have been reported [81], [115], [116]. In this chapter, DAC performance metrics, various DAC architectures, their relation to high-speed applications, and the optimization of a suitable DAC for the DDS are discussed.

## 5.1 DAC performance parameters

### 5.1.1 Integral nonlinearity (INL)

INL is a performance metric in the digital domain that describes the static linearity of a DAC. It is the deviation of the DAC input-output relationship from the ideal straight line, after the gain and offset errors have been removed. The straight line can be defined through the endpoints of the converter's transfer response [15]. This method is used here to calculate the INL of DACs. Alternatively, the best-fit straight line to the actual transfer characteristic can also be used as a reference [117]. For example, Fig.5.1(left) shows the transfer characteristic of a 3-bit DAC without INL error. The gain and offset errors are set to zero for simplicity. One LSB is the change in output level due to a change of one input bit of the DAC. In the ideal case, the analog output values between zero (000) and full scale (111) lie on a straight line, giving a perfectly linear input to output response. In the non-ideal case, the input to output relationship is not perfectly linear but deviates from the straight line (see Fig. 5.1(right)). For instance, at digital input code 110, the actual measured output is higher than the ideal value (represented by the straight line). The difference in amplitude (X), normalized to the LSB value, is the INL at 110. Using the same approach, the INL is calculated at each digital input code. INL is a representation of the element matching of the overall DAC and is independent of the DAC architecture.
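The endpoint method can be illustrated with a short calculation. The sketch below is a minimal example: the function name `inl_endpoint_lsb` is hypothetical, and the output levels are made-up values for a 3-bit DAC, not measured data.

```python
def inl_endpoint_lsb(levels):
    """INL per input code in LSB, using the endpoint straight line as reference.

    levels[i] is the analog output for digital input code i.
    """
    n_codes = len(levels)
    lsb = (levels[-1] - levels[0]) / (n_codes - 1)   # average step between endpoints
    return [(v - (levels[0] + code * lsb)) / lsb for code, v in enumerate(levels)]


# Hypothetical output levels of a 3-bit DAC (arbitrary units), codes 000 ... 111.
levels = [0.0, 1.0, 2.1, 3.0, 3.9, 5.0, 6.3, 7.0]

for code, inl in enumerate(inl_endpoint_lsb(levels)):
    print(f"code {code:03b}: INL = {inl:+.2f} LSB")
```

In this made-up data set the output at code 110 lies above the endpoint line, so the INL at that code is positive, while the INL at both endpoints is zero by construction.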



Figure 5.1: Static transfer characteristics of 3-bit DAC for without INL error (left) and with INL error (right) [15].

### 5.1.2 Differential nonlinearity (DNL)

For any digital input code, the DNL is the difference between the actual output amplitude step and the ideal output amplitude step of 1 LSB. Ideally, the difference is zero, and hence the DNL is zero at that input code. DNL is normalized to the amplitude of one LSB. For the ideal DAC in Fig.5.2(left), all step amplitudes are 1 LSB, so the DNL is zero for each digital input code. On the other hand, for the non-ideal DAC in Fig.5.2(right), the step at, e.g., digital input code 100 is larger than 1 LSB; the surplus amplitude normalized to 1 LSB is the DNL at 100. Using the same approach, the DNL is calculated at each digital input code.
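The DNL definition above translates directly into a step-by-step calculation. The following sketch uses a hypothetical function name `dnl_lsb` and made-up output levels for a 3-bit DAC; assigning each DNL value to the transition from code i to code i+1 is an indexing convention assumed here for illustration.

```python
def dnl_lsb(levels):
    """DNL in LSB for each code transition i -> i+1.

    levels[i] is the analog output for digital input code i. One LSB is taken
    as the average step between the endpoints (gain and offset errors removed).
    """
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    return [(levels[i + 1] - levels[i]) / lsb - 1.0 for i in range(len(levels) - 1)]


# Hypothetical output levels of a 3-bit DAC (arbitrary units), codes 000 ... 111.
levels = [0.0, 1.0, 2.0, 3.0, 4.4, 5.0, 6.0, 7.0]

dnl = dnl_lsb(levels)
for i, d in enumerate(dnl):
    print(f"step {i:03b} -> {i + 1:03b}: DNL = {d:+.2f} LSB")

# A step never goes backwards as long as every DNL value stays above -1 LSB.
print("all steps increasing:", all(d > -1.0 for d in dnl))
```

Here the step into code 100 is 1.4 LSB wide, so its DNL is about +0.4 LSB, and the following step is correspondingly too small.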



Figure 5.2: Static transfer characteristics of 3-bit DAC for without DNL error (left) and with DNL error (right) [15].

### 5.1.3 Monotonicity

Monotonicity means that the DAC output always increases as the input code increases. It is guaranteed if |INL| < 0.5 LSB and |DNL| < 1 LSB. It depends upon the matching accuracy of the elements (resistors, capacitors, or current sources).

### 5.1.4 SFDR and glitch

SFDR and glitch are dynamic properties of the DAC. A glitch is a spike in voltage (or current) beyond the ideal voltage (or current) value. In this DAC, the glitch is the primary factor for SFDR degradation at higher frequencies; it occurs mainly when a larger number of switches turn on at the same time. It is described in detail later in this section.

## 5.2 DAC architecture

DAC architectures are classified according to A) the implementation mode of the logic carriers (e.g. current, voltage, and charge) in the DAC and B) the coding of the digital bits. These two classifications are not mutually exclusive. For example, the current-steering DAC is a current-mode DAC, but its currents can also be coded in binary-weighted form. The same current-steering DAC can be designed in a thermometer-coded version. There is no single DAC architecture that is suitable for all DAC applications. Therefore, various DAC architectures based on implementation mode and digital coding are described briefly for a better understanding of their applicability, advantages, and disadvantages. This helps in the selection of a DAC for the DDS.

### 5.2.1 DAC architecture based on implementation mode

The logic carriers on which these DAC architectures are based are voltage, charge, and current. In voltage-mode DACs, the unit element levels of the outputs are given by voltage values. In charge distribution DACs, the unit element levels are represented by capacitor values. In current-mode DACs, the unit elements are given by current values; such current levels are provided, for example, by current sources or by a current divided by a resistor string to yield sub-currents. Here, five examples of DACs based on different implementation modes are briefly explained.

#### **Resistor string DAC**



Figure 5.3: 3-bit resistor string DAC (voltage mode) [16].

This DAC consists of a resistor string acting as a voltage divider to provide the output levels. Using equally valued resistors, the outputs are tapped from each node of the resistor string. It is simple to implement and inherently monotonic, but it is not suitable for a large number of bits due to the increased number of resistors  $(2^N)$  and  $2^N$  different values of a resistor. Thus it has a high settling time and a speed limitation. An example of a 3-bit voltage-mode resistor string DAC is shown in Fig.5.3.

#### Binary weighted resistor string

The resistor string DAC suffers from a large number of resistors and a large area for DACs with many bits. The binary-weighted resistor DAC reduces the number of resistors by using a binary-weighted form of the resistor string. However, element matching is difficult for larger bit sizes: the resistors are ratioed in multiples of two, so the many unequal resistor sizes increase the element mismatch. Fig.5.4 is an example of a 3-bit voltage-mode binary-weighted resistor string DAC. For a 3-bit binary-coded digital word, the output voltage is given by:

$$V_{out} = V_{ref}(b2 \times 2^2 + b1 \times 2^1 + b0 \times 2^0)$$
(5.1)

 $V_{ref}$  is the reference voltage. The digital inputs b0, b1, and b2 correspond to the LSB0, LSB1, and MSB bits, respectively. Their values are either 1 or 0.



Figure 5.4: 3-bit voltage mode binary weighted resistor string DAC (voltage mode) [17].

#### **R-2R ladder DAC**

The R-2R ladder DAC is a binary-weighted DAC composed of only two different resistor values (R and 2R). This is an advantage for element matching compared to the binary-weighted resistor string. On the downside, it has an LSB to MSB propagation delay that increases with the number of bits of the DAC. Therefore, it is not suitable as a high-speed DAC with a larger bit size. The voltage-mode 3-bit R-2R ladder DAC is shown in Fig.5.5.

#### Charge distribution DAC

This type of DAC uses the charge stored in capacitors of various scaled values. It converts a reference voltage using a binary-weighted capacitor array. The 3-bit binary-weighted charge distribution DAC is shown in Fig.5.6. Scaled capacitor matching, switch on-resistance, and finite amplifier bandwidth are disadvantages of such DACs, so they are not used for high-speed operation. In addition, capacitors occupy a comparatively larger area than resistors in integrated circuits.

#### **Current steering DAC**

The resistive voltage divider DACs and the charge distribution DAC suffer from a large RC delay and are thus not suitable for high-speed operation. They also need a high-speed and very linear amplifier to drive a resistive output load (50 $\Omega$) for all input codes. Current-steering DACs have the advantage of an inherently high current drive. This type of DAC is preferred in high-speed applications because output buffers are not needed to drive the resistive load [118]. In Fig.5.7, binary-weighted current sources are used along with switches for current steering. The bits are weighted by 2<sup>0</sup>, 2<sup>1</sup>, and 2<sup>2</sup>. The total output current of this 3-bit DAC is therefore given by:

$$I_{out} = b0 \times 2^{0}I + b1 \times 2^{1}I + b2 \times 2^{2}I$$
(5.2)



Figure 5.5: 3-bit R-2R ladder DAC (voltage mode) [17].

Each switch is controlled by the input code. The current from each current source is directed to the load or to ground depending on the value of the digital input. The various current source levels are realized using a simple current mirror circuit. For example, in bipolar technology, multiple currents are generated by increasing the number of emitters of the amplifying transistor or by adding more transistors in parallel to the amplifying transistor of the current mirror. This DAC has high power efficiency since only a small amount of power is dissipated in the small resistive load at the summing output.
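The current summation of Eq. (5.2) can be illustrated with a minimal Python sketch. The function names, the unit current of 1 mA, and the single 50 Ω load are illustrative assumptions and are not taken from the circuit of Fig.5.7.

```python
def steered_current_ma(b2, b1, b0, unit_current_ma=1.0):
    """Output current of Eq. (5.2) in mA; b2 is the MSB, b0 the LSB."""
    for bit in (b2, b1, b0):
        if bit not in (0, 1):
            raise ValueError("each bit must be 0 or 1")
    return (b0 * 2**0 + b1 * 2**1 + b2 * 2**2) * unit_current_ma


def load_voltage_mv(current_ma, load_ohm=50.0):
    """Voltage across the resistive load (assumed value); mA times ohm gives mV."""
    return current_ma * load_ohm


if __name__ == "__main__":
    for code in range(8):
        b2, b1, b0 = (code >> 2) & 1, (code >> 1) & 1, code & 1
        i_out = steered_current_ma(b2, b1, b0)
        print(f"{b2}{b1}{b0}: I_out = {i_out:.1f} mA, "
              f"V_load = {load_voltage_mv(i_out):.0f} mV")
```

The sketch ignores switching dynamics and the finite output impedance of the current sources; it only reproduces the static weighting.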

### 5.2.2 DAC architecture based on coding

There are generally three types of DACs based on the coding approach used for the input and output values: binary weighted, thermometer coded (segmented), and partially segmented. The binary-weighted and thermometer codes for a 3-bit DAC are presented in Table 5.1. The unit element can be of any mode (current source, resistor, or charge distribution).

#### **Binary weighted DACs**

Binary weighted DACs use binary-weighted elements (resistors, current sources, capacitors, etc.). The DAC output levels are weighted by $2^0, 2^1, \dots, 2^{N-1}$, where N is the number of DAC bits. Only N switches or reference elements are required for an N-bit DAC, so it is area and power efficient. As in Table 5.1, only 3 switching elements are required for a 3-bit DAC. The binary-weighted elements (current sources, resistors, or capacitors) need precise values, which is challenging because of fabrication and circuit design constraints. Monotonicity is not guaranteed, since neighbouring digital codes are translated into analog values by disparate sets of elements. Also, the DNL is larger when a large number of switching elements are turned on or off at once. The glitch is severe at a major carry transition [119]. For example, in a 3-bit DAC (see Table 5.1), all three switches change state at once in the transition from 011 to 100. If the MSB switch is faster than the other switches, all three switches are momentarily turned on, producing a current spike and thus a glitch. A corresponding glitch occurs in the transition from 100 to 011, generating a falling glitch. The glitch becomes more pronounced as the number of bits of the binary-weighted DAC increases.

Figure 5.6: 3-bit Charge distribution binary weighted DAC (simplified) [16].

Figure 5.7: 3-bit binary weighted current steering DAC (current mode) [16].
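The major carry argument can be made concrete by counting how many binary-weighted switches change state between neighbouring codes. The following Python sketch is only an illustration; the function names are hypothetical, and switch timing is not modelled.

```python
def switched_bits(code_a, code_b):
    """Number of binary-weighted switches that change state between two codes."""
    return bin(code_a ^ code_b).count("1")


def worst_adjacent_transition(n_bits):
    """Return (code, next_code, switches) for the adjacent step that toggles most switches."""
    worst = max(range(2**n_bits - 1), key=lambda c: switched_bits(c, c + 1))
    return worst, worst + 1, switched_bits(worst, worst + 1)


if __name__ == "__main__":
    for n_bits in (3, 6):
        a, b, n = worst_adjacent_transition(n_bits)
        print(f"{n_bits}-bit: {a:0{n_bits}b} -> {b:0{n_bits}b} toggles {n} switches")
```

For a 3-bit word the worst step is 011 to 100, where all three switches toggle; for a 6-bit word the mid-scale step toggles all six.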

#### Segmented DACs

Segmented DACs recode the digital input into its thermometer-code equivalent. An N-bit DAC consists of $2^{N} - 1$ unit elements. For example, a 3-bit DAC requires seven reference elements. It is inherently monotonic because, for an increasing ramp input, each step adds exactly one reference element and increases the output by one LSB. For example, even the major carry transition of a 3-bit DAC, i.e. 011 to 100 in a binary DAC, corresponds to the transition from 0000111 to 0001111. Therefore, the severe glitch is reduced compared to a binary DAC. In addition, each reference element (unit cell) is identical, unlike in a binary DAC. It is easier to match identical unit cells than elements of various sizes and values as in a binary-weighted DAC. Thus, better INL/DNL performance can be obtained. The downsides of this type of DAC are the increased area and power consumption for a larger number of bits and the additional driver and encoding circuits required.

| Decimal (D) | Binary weighted ($B_2B_1B_0$) | Thermometer coded ($T_6T_5T_4T_3T_2T_1T_0$) |
|-------------|-------------------------------|---------------------------------------------|
| 0           | 0 0 0                         | 0 0 0 0 0 0 0                               |
| 1           | 0 0 1                         | 0 0 0 0 0 0 1                               |
| 2           | 0 1 0                         | 0 0 0 0 0 1 1                               |
| 3           | 0 1 1                         | 0 0 0 0 1 1 1                               |
| 4           | 1 0 0                         | 0 0 0 1 1 1 1                               |
| 5           | 1 0 1                         | 0 0 1 1 1 1 1                               |
| 6           | 1 1 0                         | 0 1 1 1 1 1 1                               |
| 7           | 1 1 1                         | 1 1 1 1 1 1 1                               |

Table 5.1: Binary-weighted and thermometer-coded representations for a 3-bit DAC
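The recoding into thermometer code and the resulting single-element steps can be sketched in a few lines of Python. The function names are hypothetical, and the element ordering (highest element first) is an assumption chosen to match the way the codes are written in the text.

```python
def binary_to_thermometer(code, n_bits):
    """Thermometer code with 2**n_bits - 1 unit elements, highest element first."""
    n_elements = 2**n_bits - 1
    if not 0 <= code <= n_elements:
        raise ValueError("code out of range")
    # Element k is switched on as soon as the input code exceeds k.
    return [1 if code > k else 0 for k in reversed(range(n_elements))]


def elements_changed(code_a, code_b, n_bits):
    """Number of unit elements that change state between two input codes."""
    t_a = binary_to_thermometer(code_a, n_bits)
    t_b = binary_to_thermometer(code_b, n_bits)
    return sum(x != y for x, y in zip(t_a, t_b))


if __name__ == "__main__":
    for code in range(8):
        bits = "".join(str(b) for b in binary_to_thermometer(code, 3))
        print(f"{code:03b} -> {bits}")
```

Every adjacent step, including 011 to 100, changes exactly one unit element, which is the reason for the inherent monotonicity.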

#### Partially segmented DACs

Thermometer-coded DACs consume a large area and have complex routing because $2^{N} - 1$ unit elements are required for an N-bit DAC. The matching requirements for the LSB elements are more relaxed because their contribution to the final output value is at least a factor of two lower than that of the MSBs. The partially segmented approach is a mixture of binary-weighted and segmented codes: the lower bits are derived from the binary-weighted code, while the higher bits are derived from the fully segmented thermometer code. A smaller area means a smaller footprint in the real layout, which implies less parasitic capacitance to drive and thus favours high-speed operation. Therefore, the combination of binary-weighted and segmented bits has to be optimized.

# 5.3 Investigation for optimum partially segmented DAC

Partially segmented DACs are a compromise between area, power consumption, complexity, static performance, and dynamic performance. The bits that use the binary array still suffer from glitches at major carry transitions and from poor static errors. A thorough investigation is needed to find the most suitable combination of binary-weighted and segmented bits.



Figure 5.8: DAC LSB0 switch using a CML cascoded inverter.

### 5.3.1 DAC switches for partially segmented DAC

DAC switches (SWs) are the fundamental and among the most important cells of the DAC. Using only one SW unit cell for all bits of the DAC makes it easy to maintain, replicate, synchronize, and optimize, which helps to improve the linearity of the DAC. On the other hand, non-linear DACs or DACs that use weighted current sources face challenges in optimizing ideal current sources. Realizing high-speed SWs with various current source levels is possible but difficult. The required current scaling can increase the mismatch between SWs in the binary-weighted current steering method.

In our technology, binary-weighted current sources are challenging at lower speed and not possible at higher speed ( $f_T > 165$ GHz). This is because transistors with different numbers of emitter fingers would have to be used to maintain the same high speed between switches while keeping their current ratios. However, the number of possible emitter fingers is limited by the fabrication technique and cannot be increased arbitrarily as the current source level increases. In this technology, the fastest transistor has $f_T > 165$ GHz for collector currents of 0.3 mA to 3 mA (see Chapter3/Fig.3.5 and Fig.3.5) with 1 to 2 emitter fingers. Beyond 4 emitter fingers, the $f_T$ shifts even further away from the 0.9 mA to 3 mA collector current range. For instance, current ratios of 1:2:4 cannot be maintained at the same high speed ( $f_T > 165$ GHz) in this technology, because they would require switches with 0.9 mA, 1.8 mA, and 3.6 mA current sources. The SW with the 3.6 mA current source would be slower than the other SWs because of its lower $f_T$ (< 150 GHz). This mismatch increases the code-dependent glitches [81]. It is therefore challenging to use single-finger and multi-finger transistors as multiple current switches because of their different sizes and dynamic behaviour. It also creates a layout problem for data alignment, especially when synchronization circuits are omitted to save area and power, as in this DAC. Alternatively, multiple current sources can be used to supply larger current values; however, this is less compact, needs additional routing, and introduces more mismatch between current sources.

In the R-2R and segmented combination, only two different current sources are sufficient to realize the LSB and MSB switches. The LSB SW and MSB SW are therefore designed to operate with collector currents of 1 mA and 2 mA respectively, which corresponds to $f_T > 165$ GHz. The problem of multiple current sources in the binary-weighted current steering approach thus does not arise. The current-mode R-2R ladder is combined with a current-mode thermometer-coded (segmented) DAC. The cascoded CML inverter gate (see Fig.5.8) is used as the DAC unit element for the lower bits in the R-2R ladder network. It is important to note that the cascode current source is a simple yet effective technique to increase the output impedance of the DAC, which is related to the SFDR of the DAC [39]. The cell has a unit current source of Is ( $\sim 1$ mA), provided by a current mirror circuit (not shown here). The unit current is the reference current for the DAC output levels. The speed of the differential pair depends on the RC time constant given by the resistive load, the intrinsic collector-base capacitance $(C_{bc})$ of the driving stage transistor, the capacitance of the next stage, and the interconnect parasitics [120]. The switching speed of this DAC cell alone is comparable to that of the CML inverter discussed in Chapter 3. However, the resistive load (2R) here is 50 $\Omega$, and the inductive peaking is increased by the extended inductive line (in the form of a transmission line). The transmission line not only acts as an inductance at each collector node but also serves as a current combiner for all DAC cell outputs. The speed of the LSB cell is shown in Fig.5.9. The switching speed (90 % to 10 % fall time) of the LSB SW cell is 3 ps, while the propagation delay is 6.2 ps. The same DAC switch using a CML cascoded inverter with a slightly shorter inductive-peaking line showed a propagation delay of 7 ps (refer to Chapter 3/Table3.2). The eye diagram of the LSB SW is shown in Fig.5.10(left). It shows a bit rate of 88 Gb/s for the LSB SW.

Figure 5.9: Fall time of DAC LSB switch and MSB switch.

The current sources of the LSBs have the value Is, and the DAC output levels are set by the R-2R network in a binary fashion. At the transition from the R-2R network to the segmented section, however, the unit current must be doubled in the MSB current sources. In the segmented portion, each increasing input step steers 2 Is ( $\approx 2$ mA) to the DAC output. All MSB switches therefore use the same 2 mA current source value.

The MSB switch is realized using the same cascoded CML inverter topology but with an open-collector output (see Fig.5.11). However, the number of emitter fingers of the transistor is doubled, since it has to carry 2 mA and should operate near the peak $f_T$ of the two-emitter-finger transistor. A single-emitter-finger transistor at 2 mA collector current would already operate exactly at its peak $f_T$ current level. This is not recommended, since any slight change in biasing may shift the actual current beyond 2 mA. Hence a single-finger transistor is not suitable for the MSB switch. The speed and the input/output impedances of single-finger and multi-finger transistors are different. Their values are optimized so that the final current steering follows the ideal DAC input-output response as closely as possible. This means that the biasing and emitter degeneration can differ between LSB and MSB switches. The simulation shows that the MSB switch with two-emitter-finger transistors and the LSB switch with a single-emitter-finger transistor have nearly the same switching speed (see Fig.5.9). This is important, since a difference in switching time (or rise/fall time) between switches degrades the SFDR of the DAC. The switching speed of the MSB SW cell is 3.2 ps, while the propagation delay is 6.8 ps. The difference in propagation delay between the LSB and MSB SWs is only 0.6 ps. The eye diagram of the MSB SW is shown in Fig.5.10(right). It shows a bit rate of 88 Gb/s, similar to the LSB SW.

Figure 5.10: Eye-diagram of LSB switch (left) and MSB switch (right).

Figure 5.11: DAC MSB switch using a CML cascoded inverter with open-collector output.

Generally, the contribution of the MSBs to the current summation is significantly higher than that of the LSBs in partially segmented DACs. Finally, the currents from all LSBs and MSBs are combined. The 50 $\Omega$ load at the DAC output generates the required voltage level of the analog triangular output. The load resistance and the EF provide sufficient voltage swing and voltage offset for the following TSC input.

### 5.3.2 R-2R and segmented combinations for 6-bit DAC

The 6-bit partially segmented DAC can be formed using any of the combinations shown in Table 5.2. The R-2R network has the advantages of small area, low power consumption, small parasitic footprint, and simple routing. These advantages for high-speed operation diminish if a larger number of R-2R bits is used. Therefore, 2-bit to 4-bit R-2R DACs are tested in combination with the segmented DAC. On the other hand, despite its good linearity and monotonicity, a segmented DAC consumes significant area and power, which limits its bit size. For simplicity, the combination of a 4-bit R-2R and a 2-bit segmented DAC is referred to as the 4R-2S DAC: 4R stands for the 4-bit R-2R network and 2S for the 2-bit segmented portion, and likewise for the other DACs. All three possible DAC circuits are shown in Fig.5.12, Fig.5.13, and Fig.5.14.

Table 5.2: Partially segmented 6-bit DAC combinations

| R-2R   | Segmented | DAC nomenclature | Nr. of SWs |
|--------|-----------|------------------|------------|
| 4 bits | 2 bits    | 4R-2S DAC        | 4+3 = 7    |
| 3 bits | 3 bits    | 3R-3S DAC        | 3+7 = 10   |
| 2 bits | 4 bits    | 2R-4S DAC        | 2+15 = 17  |
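The switch counts in Table 5.2 follow from one switch per R-2R bit plus $2^{S} - 1$ unit switches for S segmented bits. A short Python sketch, with a hypothetical function name, reproduces the table entries:

```python
def switch_count(r2r_bits, segmented_bits):
    """Switches needed: one per R-2R bit plus 2**S - 1 thermometer unit switches."""
    if r2r_bits < 0 or segmented_bits < 0:
        raise ValueError("bit counts must be non-negative")
    return r2r_bits + (2**segmented_bits - 1)


if __name__ == "__main__":
    total_bits = 6
    for r2r_bits in (4, 3, 2):
        seg_bits = total_bits - r2r_bits
        print(f"{r2r_bits}R-{seg_bits}S DAC: {switch_count(r2r_bits, seg_bits)} switches")
```

The same expression gives 6 switches for a purely binary 6-bit DAC and 63 for a fully segmented one, which illustrates why only the intermediate splits are considered.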



Figure 5.12: Schematic of 4-bit R-2R and 2-bit segmented DAC (4R-2S DAC).



Figure 5.13: Schematic of 3-bit R-2R and 3-bit segmented DAC (3R-3S DAC).



Figure 5.14: Schematic of 2-bit R-2R and 4-bit segmented DAC (2R-4S DAC).

### 5.3.3 Voltage level for switches



Figure 5.15: Voltage level for switch in DAC.

The accumulator outputs (Accu Outs) have voltage levels of 3.48 V for logic high and 3.15 V for logic low (330 mV peak-to-peak swing). The DAC inputs are taken from the 6 output bits of the preceding accumulator. However, the appropriate input voltage levels of the switch are much lower, as depicted in Fig.5.15. The switch needs at least $1.4V + V_{BE}$ at its input to steer the current into its branch, which amounts to > 2.3 V for logic high. When both the upper level (3.48 V) and the lower level (3.15 V) are above this threshold (2.3 V), the logic state cannot be distinguished. An extra emitter follower is therefore placed before each switch. It lowers the Accu Out levels to 2.58 V and 2.25 V, which are logic high and logic low with respect to the 2.3 V threshold. These emitter followers also improve the driving capability of each Accu Out. Note that if the EFs were placed before the combining transmission lines, only six EFs would be needed. In that case, however, the most significant Accu Out 5 would have to drive four switches. An individual EF for each SW therefore helps to drive the load more effectively.
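The level shifting can be checked with a small numerical sketch. The value $V_{BE} \approx 0.9$ V is an assumption inferred from the quoted levels (3.48 V to 2.58 V); the function names are hypothetical, and the emitter follower is idealized as a constant voltage drop.

```python
V_BE = 0.9                    # assumed base-emitter drop in volts (inferred, not stated)
SWITCH_THRESHOLD = 1.4 + V_BE  # minimum input level for steering the current, about 2.3 V


def after_emitter_follower(level_v):
    """Idealized emitter follower: shifts the level down by one V_BE."""
    return level_v - V_BE


def logic_seen_by_switch(level_v):
    """1 if the level is above the switch threshold, else 0."""
    return 1 if level_v > SWITCH_THRESHOLD else 0


if __name__ == "__main__":
    for name, level in (("high", 3.48), ("low", 3.15)):
        shifted = after_emitter_follower(level)
        print(f"Accu Out {name}: {level:.2f} V -> logic {logic_seen_by_switch(level)}, "
              f"after EF {shifted:.2f} V -> logic {logic_seen_by_switch(shifted)}")
```

Without the emitter follower both accumulator levels are read as logic high; after the shift the two levels fall on opposite sides of the threshold.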

# 5.4 6-bit partially segmented DAC design

All three 6-bit DAC combinations mentioned previously are modified by adding an extra emitter follower before each SW to interface with the accumulator outputs. The DAC inputs are fed by the output signals (Accu Out) of the preceding accumulator. The lower input bits (LSBs) are fed directly by the respective lower accumulator outputs. The higher DAC inputs (MSBs), which belong to the segmented DAC, are combined with the respective binary-weighted current sources and are fed by the higher accumulator bits. Next, all three DACs are designed and simulated individually in order to select one of them for the linear DAC implementation in the DDS.

### 5.4.1 4-bit R-2R and 2-bit segmented DAC (4R-2S DAC)

The schematic of the 4-bit R-2R and 2-bit segmented DAC was presented in Fig.5.12. The R-2R network section is formed by the four LSBs. Each LSB is composed of one emitter follower (EF) and one switch (SW). These LSBs are fed by the lower accumulator outputs Accu Out0, Accu Out1, Accu Out2, and Accu Out3 respectively. The R-2R based DAC does not have a fan-out problem because each accumulator output has to drive only one switch. The segmented section has the MSB1 and MSB0 switches. MSB1 has two EFs and two SWs, and MSB0 has one EF and one SW (see Fig.5.16). These MSB switches are fed by the respective higher accumulator outputs (Accu Out 4 and Accu Out 5). The output from the highest LSB (LSB3) and the outputs from the MSBs are combined to yield the final DAC output.

The resistors R and 2R scale the binary-weighted output voltage levels of the R-2R network section of the DAC. The value of R (and accordingly 2R) cannot be chosen arbitrarily. If R is too low, the switching behaviour of the SW suffers because the output voltage may not be sufficient to fully turn on the switch. If R is too large, the full-scale DAC output becomes large, which increases the peak-to-peak voltage swing of the triangular output. The voltage drop must therefore be chosen to maintain the voltage swing required by the following TSC input. For example, the full-scale output voltage swings of the 4R-2S DAC for $R = 12 \Omega$ and $R = 60 \Omega$ are 120 mV and 600 mV respectively. The TSC input connected to the DAC output demands an input voltage swing between 200 mV and 500 mV. A value of $R = 60 \Omega$ can also be used but requires an extra potential divider to decrease the DAC full-scale swing. Even $R = 12 \Omega$ can be used, yielding 120 mV, but then a gain amplifier is needed to increase the swing above 200 mV.
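The choice of R against the TSC input window can be sketched as a simple range check. The proportionality of 10 mV per ohm is an assumption derived only from the two quoted points (12 Ω giving 120 mV and 60 Ω giving 600 mV) for the 4R-2S DAC; the simulated circuit need not be exactly linear, and the names below are hypothetical.

```python
TSC_MIN_MV = 200.0   # lower bound of the TSC input swing quoted in the text
TSC_MAX_MV = 500.0   # upper bound of the TSC input swing quoted in the text
MV_PER_OHM = 10.0    # assumed slope from the two quoted (R, swing) points


def estimated_swing_mv(r_ohm):
    """Rough full-scale swing estimate for the 4R-2S DAC under the linear assumption."""
    return MV_PER_OHM * r_ohm


def fits_tsc_input(swing_mv):
    """True if the swing lies inside the TSC input window."""
    return TSC_MIN_MV <= swing_mv <= TSC_MAX_MV


if __name__ == "__main__":
    for r_ohm in (12.0, 25.0, 60.0):
        swing = estimated_swing_mv(r_ohm)
        verdict = "fits" if fits_tsc_input(swing) else "needs divider or gain stage"
        print(f"R = {r_ohm:.0f} ohm: about {swing:.0f} mV, {verdict}")
```

Under this assumption, intermediate values such as 25 Ω land inside the window without an extra divider or amplifier, while 12 Ω and 60 Ω fall outside it.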

After various simulations, it was found that $R = 25 \Omega$ (and $2R = 50 \Omega$ ) generates a sufficient output voltage level (240 mV) from the DAC that suits the TSC input voltage level and also shows good linearity. It requires neither an extra potential divider nor a gain amplifier. In practice, however, the ratio of R to 2R may not be exactly 1:2. This is because the switches can have different gains due to slightly mismatched current sources. Also, each switch sees a different load at its output despite being a similar stage. The dynamic performance of a current steering DAC is affected by the output impedance of the current switches and by output glitches [121]. This means that an ideal 1:2 ratio may not result in the best performance. For example, [81] used a parameterizable abstract model to inspect the switching behaviour of a binary DAC in a system-level simulation and to find the correct element values. The ratio (R:2R) should therefore be optimized. This is accomplished here by varying the R and 2R values over many simulations and selecting the best result.

Figure 5.16: 4R-2S DAC complete block diagram.



Figure 5.17: Layout of R-2R section of 4R-2S DAC.

The area of the whole 4R-2S DAC is 50 % smaller than that of the 2R-4S DAC. However, it suffers from nonlinear errors because only two bits are in segmented form and the rest are in binary-weighted form. The intrinsic gate-to-gate transmission line delay in the R-2R network implementation (in the real layout) is also worth mentioning. The lower LSBs reach the output node later than the higher bits. As shown in the layout (see Fig.5.17), the LSB0 output has a longer transmission line than LSB1, and the LSB1 output has a longer transmission line than LSB2. A longer transmission line increases the propagation delay, so the layout of the current summation should be optimized. For R-2R networks with more bits, the influence of the gate-to-gate transmission line delay is more significant. In this 4R-2S DAC, LSB0, LSB1, and LSB2 contribute 1.5 %, 3 %, and 6 % of the full-scale output respectively. LSB3 contributes 12 % of the full-scale output, but it does not have a gate-to-gate transmission line delay, because the highest LSB (LSB3 in this case) and all MSBs are combined using equal transmission line delays. The delays of the lower LSBs are harder to optimize and introduce considerable mismatch at the current summation point in the real fabricated DAC.
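The quoted bit contributions correspond to binary weights relative to the full scale of a 6-bit converter. The sketch below approximates the full scale by $2^6$ LSB, which is an assumption made for illustration; the exact values it returns are slightly above the rounded figures given in the text.

```python
def bit_contribution_percent(bit_index, n_bits=6):
    """Share of the full-scale output carried by one binary-weighted bit, in percent.

    Assumes an ideal converter whose full scale is approximated by 2**n_bits LSB.
    """
    if not 0 <= bit_index < n_bits:
        raise ValueError("bit index out of range")
    return 100.0 * 2**bit_index / 2**n_bits


if __name__ == "__main__":
    for bit in range(4):
        print(f"LSB{bit}: {bit_contribution_percent(bit):.2f} % of full scale")
```

The bits with the longest gate-to-gate delay (LSB0 and LSB1) carry the smallest share of the output, which limits the impact of their timing mismatch.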

### 5.4.2 3-bit R-2R and 3-bit segmented DAC (3R-3S DAC)

The schematic of the 3-bit R-2R and 3-bit segmented DAC was presented in Fig.5.13. The R-2R network section is formed by the three LSBs, which are fed by the lower accumulator outputs Accu Out0, Accu Out1, and Accu Out2 respectively. The 3R-3S block diagram is shown in Fig.5.18.



Figure 5.18: 3R-3S DAC complete block diagram.

It has one R-2R bit less; therefore the delay is reduced by one gate-to-gate transmission line delay, and the contribution of the delayed bits is halved. In this 3R-3S DAC, LSB0 and LSB1 contribute 1.5 % and 3 % of the full-scale output respectively. LSB0 and LSB1 have two and one gate-to-gate transmission line delays respectively. LSB2 contributes 6 % of the full-scale output, but it does not have a gate-to-gate transmission line delay since it is combined with the MSB outputs using equal transmission lines.

The segmented section contains the higher bits (MSB2, MSB1, MSB0). MSB2 has four EFs and SWs, MSB1 has two EFs and SWs, and MSB0 has only one EF and SW. These MSB switches are fed by the respective higher accumulator outputs (Accu Out 3, Accu Out 4, and Accu Out 5). Note that a higher Accu Out has to drive a larger number of switches; an individual EF driver at the input of each switch is therefore reasonable. After various simulations, it was found that $R = 25 \Omega$ (and $2R = 48 \Omega$ ) generates a sufficient output voltage level (490 mV) from the DAC that suits the TSC input voltage. Finally, the R-2R and segmented sections are combined at a 50 $\Omega$ output.



Figure 5.19: Layout of R-2R section of 3R-3S DAC.

### 5.4.3 2-bit R-2R and 4-bit segmented DAC (2R-4S DAC)

The schematic and block diagram of the 2-bit R-2R and 4-bit segmented DAC are shown in Fig.5.14 and Fig.5.20 respectively. A larger number of segmented bits increases not only the number of switches but also the number of EFs required for the highest input bit (Accu Out5). An accumulator output can drive up to 4 switches because a driver is already included at the output of the accumulator block. However, Accu Out5 does not have sufficient fan-out to drive 8 switches simultaneously. An extra driver for Accu Out 5 solves the fan-out problem but introduces additional delay. To align the delay at the inputs of all switches, extra buffers are then also required after Accu Out0 to Accu Out4. This increases area and power consumption considerably. The routing is more complex, and the additional transmission lines in the layout (increased footprint) introduce more parasitic capacitance. After various simulations, it was found that $R = 25 \Omega$ (and $2R = 65 \Omega$ ) results in an output voltage swing of 1220 mV. An extra potential divider is therefore required to feed the TSC, which has a suitable input range of 200 mV to 500 mV.

# 5.5 Simulations of various DACs

To verify the performance, the state transitions have been simulated for each DAC. The transient simulation is performed for each input code from 000 000 up to 111 111, yielding a total of 64 output voltage steps. The total simulation time of the output waveform is 20 ns, with 312 ps for each step. Since the whole DAC is composed of differential circuits, the inverting output corresponds to a decreasing linear ramp. Ideally, the increasing and decreasing ramps should have the same full-scale output voltage, and they should cross exactly at half of the simulation time for one complete full-scale output. However, due to a slight mismatch in the differential circuits (arising from imperfect current steering to both branches), the increasing ramp has a small gain offset, and its linearity is worse than that of the decreasing ramp. This worst case (increasing ramp) is therefore considered for the calculation of the DNL/INL performance of all DACs. On the other hand, the major glitch occurs when the most significant bit (MSB2) changes (halfway point, i.e. at 10 ns). The glitch appears higher in the increasing ramp, mainly due to its increased full-scale output, and the worst-case glitch is taken for glitch calculation and comparison.

Figure 5.20: 2R-4S DAC complete block diagram.
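The timing of the ramp simulation can be verified with a short idealized model. The sketch assumes perfectly linear, complementary ramps and uniform step durations; the function names are hypothetical and no circuit behaviour is modelled.

```python
def step_time_ps(total_ns=20.0, n_bits=6):
    """Duration of one code step when 2**n_bits codes share the total ramp time."""
    return total_ns * 1000.0 / 2**n_bits


def ideal_ramps(n_bits=6, lsb_mv=1.0):
    """Ideal increasing ramp and its complementary (inverting) decreasing ramp."""
    n_codes = 2**n_bits
    up = [code * lsb_mv for code in range(n_codes)]
    down = [(n_codes - 1 - code) * lsb_mv for code in range(n_codes)]
    return up, down


def crossover_time_ns(total_ns=20.0, n_bits=6):
    """Start time of the first step where the increasing ramp exceeds the decreasing one."""
    up, down = ideal_ramps(n_bits)
    step_ns = total_ns / 2**n_bits
    first = next(i for i, (u, d) in enumerate(zip(up, down)) if u > d)
    return first * step_ns


if __name__ == "__main__":
    print(f"step time: {step_time_ps():.1f} ps")
    print(f"ideal crossover: {crossover_time_ns():.1f} ns")
```

In the ideal case the crossover coincides with the major carry transition from code 31 to code 32, which is where the largest glitch is expected.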

### 5.5.1 Simulation for 4R-2S DAC



Figure 5.21: DAC output voltage vs linear input code for 4R-2S DAC.

The transient simulation results show an increasing full-scale output voltage of 233 mV and a 1 LSB voltage level of 3.66 mV, as shown in Fig.5.21. The increasing and decreasing ramps cross exactly at 10 ns (halfway point). The worst-case glitches of the increasing and decreasing ramps are 55 mV and 68 mV respectively.



Figure 5.22: INL and DNL static result (simulated) for 4R-2S DAC.

To assess the static behaviour of the DAC, the INL and DNL of the increasing ramp are simulated, as shown in Fig.5.22. For all digital input codes, the INL is within $\pm 0.37$ LSB ( $\pm 1.354$ mV). The DNL is even better and stays within $\pm 0.21$ LSB ( $\pm 0.77$ mV) for all digital input codes.
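How DNL and INL can be obtained from a list of simulated output levels is sketched below. The end-point definition used here (LSB taken from the first and last level) is an assumption, since the extraction procedure is not specified in the text; the function names are hypothetical.

```python
def dnl_inl(levels_mv):
    """Return (lsb_mv, dnl, inl) in LSB units using an end-point straight line.

    levels_mv: output level for each input code, in increasing code order.
    """
    n = len(levels_mv)
    if n < 2:
        raise ValueError("at least two levels are required")
    lsb = (levels_mv[-1] - levels_mv[0]) / (n - 1)
    # DNL: deviation of each step from one ideal LSB.
    dnl = [(levels_mv[i + 1] - levels_mv[i]) / lsb - 1.0 for i in range(n - 1)]
    # INL: deviation of each level from the end-point line.
    inl = [(levels_mv[i] - (levels_mv[0] + i * lsb)) / lsb for i in range(n)]
    return lsb, dnl, inl


def lsb_to_mv(error_lsb, lsb_mv):
    """Convert an error expressed in LSB into millivolts."""
    return error_lsb * lsb_mv


if __name__ == "__main__":
    lsb, dnl, inl = dnl_inl([0.0, 1.0, 2.5, 3.0])  # toy data, not simulation results
    print(lsb, dnl, inl)
    print(f"0.37 LSB at 3.66 mV/LSB = {lsb_to_mv(0.37, 3.66):.3f} mV")
```

The conversion helper reproduces the relation between the LSB and millivolt figures quoted for the 4R-2S DAC.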

### 5.5.2 Simulation for 3R-3S DAC



Figure 5.23: DAC output voltage vs linear input code for 3R-3S DAC.

The transient simulation results show an increasing full-scale output voltage of 468 mV and a 1 LSB voltage level of 7.32 mV, as shown in Fig.5.23. The increasing and decreasing ramps again cross exactly at 10 ns. The glitches of the increasing and decreasing ramps are 96 mV and 120 mV respectively.



Figure 5.24: INL and DNL static result (simulated) for 3R-3S DAC.

The INL and DNL of the increasing ramp are simulated, as shown in Fig.5.24. For all digital input codes, the INL is within $\pm 0.48$ LSB ( $\pm 3.51$ mV). The DNL is even better and stays within $\pm 0.19$ LSB ( $\pm 1.39$ mV) for all digital input codes.

### 5.5.3 Simulation for 2R-4S DAC



Figure 5.25: DAC output voltage vs linear input code for 2R-4S DAC.

Finally, the transient simulation results show an increasing full-scale output voltage of 1140 mV and a 1 LSB voltage level of 17.75 mV, as shown in Fig.5.25. The increasing and decreasing ramps again cross exactly at 10 ns (halfway point). The glitches of the increasing and decreasing ramps are 145 mV and 195 mV respectively.



Figure 5.26: INL and DNL static result (simulated) for 2R-4S DAC.

The INL and DNL of the increasing ramp are simulated, as shown in Fig.5.26. For all digital input codes, the INL is within $\pm 0.48$ LSB ( $\pm 8.52$ mV). The DNL is even better and stays within $\pm 0.15$ LSB ( $\pm 2.66$ mV) for all digital input codes.

# 5.6 Simulated results of DACs comparison

Here the three DACs are compared in terms of gain offset, DNL, INL, glitch, output voltage swing, area, and power consumption. The comparison indicates the most suitable DAC for the DDS.

### 5.6.1 Gain offset, DNL, and INL comparison

The gain offset is best in the 4R-2S DAC, where it is only 2 LSB. The 3R-3S DAC and 2R-4S DAC have gain offsets of 3 LSB and 4.5 LSB respectively. The worst-case DNL values of the 4R-2S, 3R-3S, and 2R-4S DACs are $\pm$ 0.21 LSB, $\pm$ 0.19 LSB, and $\pm$ 0.15 LSB respectively. The fluctuation of the DNL is higher in the 4R-2S DAC than in the others, which means that a DAC with more segmented bits performs better in terms of DNL. The INL, on the other hand, does not improve with the number of segmented bits; it is better for the 4R-2S DAC than for the other DACs. The worst-case INL values of the 4R-2S, 3R-3S, and 2R-4S DACs are $\pm$ 0.37 LSB, $\pm$ 0.48 LSB, and $\pm$ 0.48 LSB respectively. This is because the INL strongly depends on the element matching of the switching cells, which becomes more difficult to achieve as the number of elements increases. Also, in our design, the element matching is relatively better among the LSB switches than among the MSB switches. A segmented DAC uses more MSB switches, which contribute to the INL error. Nevertheless, the worst-case DNL/INL values are within the acceptable limit of $\pm$ 0.5 LSB in all DAC variants, so all DACs are acceptable in terms of DNL/INL performance. Monotonicity is guaranteed in all DACs.

| DAC   | SWs/EFs | DAC Vout (mV) | Gain offset (LSB) | Glitch (mV) | Glitch % | INL/DNL (LSB)         | Active area (mm<sup>2</sup>) | Power cons. total (EFs/SWs) (mW) |
|-------|---------|---------------|-------------------|-------------|----------|-----------------------|------------------------------|----------------------------------|
| 4R-2S | 7/7     | 233           | 2                 | 68          | 28.3 %   | $\pm 0.37 / \pm 0.21$ | 0.0213                       | 140 (72/64)                      |
| 3R-3S | 10/10   | 468           | 3                 | 120         | 24.4 %   | $\pm 0.48 / \pm 0.19$ | 0.0310                       | 214 (92/117)                     |
| 2R-4S | 17/17   | 1140          | 4.5               | 195         | 15.9 %   | $\pm 0.48 / \pm 0.15$ | 0.0533                       | 378 (165/213)                    |

 Table 5.3: Performance comparison between the partially segmented 6-bit DAC combinations
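The DNL and INL figures quoted above can be obtained from a simulated or measured ramp of output levels. The following is a minimal sketch, assuming the common endpoint-fit definitions (1 LSB is the average step between the first and last code); the function name `dnl_inl` and the 3-bit sample levels are hypothetical and are not taken from this work.

```python
def dnl_inl(levels):
    """Return (dnl, inl) in LSB for a list of output levels, one per input code.

    Assumption: endpoint-fit convention. One LSB is the average step between
    the first and last code, and INL is the deviation from the straight line
    through those two end points.
    """
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0 for k in range(n - 1)]
    inl = [(levels[k] - (levels[0] + k * lsb)) / lsb for k in range(n)]
    return dnl, inl


# Hypothetical 3-bit ramp in mV (illustration only, not thesis data)
levels = [0.0, 10.0, 21.0, 30.0, 39.0, 50.0, 60.0, 70.0]
dnl, inl = dnl_inl(levels)

worst_dnl = max(abs(d) for d in dnl)
worst_inl = max(abs(v) for v in inl)
# A DAC is monotonic when every step is positive, i.e. every DNL value exceeds -1 LSB
monotonic = all(d > -1.0 for d in dnl)

print(f"worst DNL = {worst_dnl:.2f} LSB, worst INL = {worst_inl:.2f} LSB, monotonic = {monotonic}")
```

With this convention, the $\pm$ 0.5 LSB acceptance limit used in this chapter corresponds to checking `worst_dnl <= 0.5` and `worst_inl <= 0.5`.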

#### 5.6.2 Glitch comparison

Glitches can be observed in all DACs. However, the full-scale output differs among the three DACs because their total switched currents differ: a more heavily segmented DAC has a higher full-scale output voltage due to its larger number of current switches. Since the glitch amplitude scales with the full-scale output voltage, a relative measure, the Glitch %, is defined for a fair comparison between differently segmented DACs. It is calculated as:

$$\text{Glitch \%} = \left(\frac{\text{Glitch amplitude (mV)}}{\text{Full-scale DAC output (mV)}}\right) \times 100\,\% \tag{5.3}$$

The worst-case glitch amplitude and Glitch % for the various DACs are presented in Table 5.3. The worst-case glitch is highest (195 mV) for the 2R-4S DAC and lowest (68 mV) for the 4R-2S DAC. However, the Glitch %, which relates the glitch to the full-scale swing, is highest (28.3 %) for the 4R-2S DAC and lowest (15.9 %) for the 2R-4S DAC. The 3R-3S DAC, with a glitch of 120 mV and a Glitch % of 24.4 %, lies between the other two. A DAC with more segmented bits therefore performs better in terms of relative glitch.

#### 5.6.3 Area and power consumption comparison

The active area accounts for the EFs and SWs without the transmission lines. As can be seen in Table 5.3, the active area of the 2R-4S DAC is 0.0533 mm<sup>2</sup>, which is 2.5 times that of the 4R-2S DAC. The 3R-3S DAC has an active area of 0.0310 mm<sup>2</sup> (42 % smaller than the 2R-4S DAC and roughly 1.5 times that of the 4R-2S DAC). As expected, a larger active area comes with higher power consumption (see Table 5.3). The total power consumption of the 2R-4S DAC is 378 mW, while the 4R-2S DAC consumes only about 140 mW. The 3R-3S DAC consumes 214 mW (43 % less than the 2R-4S DAC and roughly 1.5 times as much as the 4R-2S DAC). In all DACs, more than 42 % of the total power goes to the emitter followers (EFs) alone, and the rest goes to the switches (SWs). A relative comparison of glitch, area, and power consumption between the DACs (normalized to 100 % for the highest value) is presented as a bar graph in Fig.5.27. The relative glitch increases with the number of R-2R bits. The graph also shows that the active area is approximately proportional to the power consumption.



Figure 5.27: Glitch, area, and power consumption comparison between various DACs.
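The normalization behind such a bar graph can be reproduced in a few lines. This is a minimal sketch using the Glitch %, active area, and total power values as listed in Table 5.3; the dictionary layout and the helper name `normalize` are illustrative choices, not part of the original design flow.

```python
# Values copied from Table 5.3: Glitch %, active area (mm^2), total power (mW)
dacs = {
    "4R-2S": {"glitch_pct": 28.3, "area_mm2": 0.0213, "power_mw": 140},
    "3R-3S": {"glitch_pct": 24.4, "area_mm2": 0.0310, "power_mw": 214},
    "2R-4S": {"glitch_pct": 15.9, "area_mm2": 0.0533, "power_mw": 378},
}


def normalize(metric):
    """Scale one metric so that the largest value among the DACs equals 100 %."""
    peak = max(d[metric] for d in dacs.values())
    return {name: 100.0 * d[metric] / peak for name, d in dacs.items()}


for metric in ("glitch_pct", "area_mm2", "power_mw"):
    row = normalize(metric)
    print(metric, {name: round(value, 1) for name, value in row.items()})
```

In this normalization the 2R-4S DAC defines the 100 % mark for area and power, while the 4R-2S DAC defines it for the relative glitch, which makes the opposing trends easy to see.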

#### 5.6.4 Final selection of a DAC

In conclusion, the 2R-4S DAC has the lowest relative glitch at the expense of larger area and power consumption. However, its increasing and decreasing ramps are asymmetric, the worst among the three DACs. The 4R-2S DAC offers the lowest gain offset, the smallest area, and the lowest power consumption, but it has the highest relative glitch and suffers from the physically unavoidable gate-to-gate delay problem. The DNL and INL performance is within the accepted $\pm$ 0.5 LSB boundary for all three DACs. Overall, the trade-off between glitch, area, and power consumption suggests that the 3R-3S DAC is the best candidate.

# 5.7 Layout for 3R-3S DAC



Figure 5.28: Layout of final DAC (3R-3S DAC).

The layout of the 3R-3S DAC is shown in Fig.5.28. The differential accumulator outputs are fed into the DAC inputs. Since all cells are differential circuits, each SW has differential inputs and outputs, which can also be seen in the layout. All input transmission lines (from the accumulator to the SW inputs) have equal length, providing equal delay despite their different physical structures. LSB1 and LSB0 each incur one gate-to-gate transmission line delay. To minimize this delay, the LSB0 and LSB1 switches should be placed as close as physically possible to the LSB2 switch. All other switch outputs (LSB2 and the MSBs) have equal transmission line paths (arranged in a power-combiner fashion), maintaining the same delay at the summing output point, as in Fig.5.28. Note that only the top metals and the SW and EF blocks are shown in the layout figure; the routing beneath the top metal and the filler metals are omitted for simplicity. This layout is used in the final DDS realization.

# 5.8 Glitches in DAC

The dynamic properties of a DAC relate to its output signal and to the switching dynamics between two states. In the time domain, they are measured in terms of settling time and glitch energy. Despite efforts to suppress the glitch using cascoded CML switches, careful bias optimization, and a partially segmented architecture, the glitch persists, especially during major logic transitions. In addition, the finite bit resolution of a DAC results in a staircase-type waveform. Glitches and the low resolution of high-speed DAC circuits result in a poor SFDR at the DAC output: the SFDR and harmonic distortion are strongly affected by the transient glitch and by the smoothness of the DAC output [5]. A circuit-level solution to this problem is therefore needed. To find one, the problem is viewed from the perspective of the DAC as implemented in the DDS, rather than as a conventional DAC. The DAC used in this DDS is special in that its input is always a digital triangle signal, although the frequency of that signal may change. In a conventional DAC, on the other hand, e.g. a DAC that transfers the information signal in a communication system, the signal can have various shapes (e.g. sine, triangle, rectangle, or sawtooth) and is not limited to a triangle wave. Having only a triangle wave as input makes it possible to manipulate the switching behaviour of the DAC SWs. Slowing down the linear DAC of the DDS to some extent smooths the signal and suppresses the glitch during each transition. A high-speed DAC circuit is demonstrated here that diminishes the impact of glitches and thereby improves the overall DAC performance at the highest operating frequencies without increasing the circuit complexity.

## 5.8.1 Glitch analysis of 3R-3S DAC

To understand the glitch problem, the state transitions in the DAC are simulated (as previously shown in), and glitches are observed at various time instances. Several significant glitches are visible in Fig.5.29 at 2.5 ns and its multiples, in both the increasing and the decreasing ramp. The decreasing ramp exhibits worse glitch performance than the increasing ramp with the same circuitry. At 2.5 ns, the digital input code changes from 111 000 to 110 111 for the decreasing ramp, and the



Figure 5.29: Glitches in 3R-3S output voltage with linearly increasing input code.

corresponding glitch is 30 mV. Similarly, a 60 mV glitch appears when the digital input code switches from 110 000 to 101 111 at 5 ns, and so forth. The glitch levels for the decreasing ramp are listed in Table 5.4.

| Time (ns) | Input transition (in decimal) | Input transition (in binary) | Nr. of switches involved | Glitch amplitude (mV) |
|-----------|-------------------------------|------------------------------|--------------------------|-----------------------|
| 2.5       | 56 to 55                      | 111 000 to 110 111           | 4                        | 30                    |
| 5         | 48 to 47                      | 110 000 to 101 111           | 6                        | 60                    |
| 7.5       | 40 to 39                      | 101 000 to 100 111           | 4                        | 30                    |
| 10        | 32 to 31                      | 100 000 to 011 111           | 10                       | 120                   |
| 12.5      | 24 to 23                      | 011 000 to 010 111           | 4                        | 27                    |
| 15        | 16 to 15                      | 010 000 to 001 111           | 6                        | 53                    |
| 17.5      | 8 to 7                        | 001 000 to 000 111           | 4                        | 26                    |

Table 5.4: Input code transition and corresponding glitch level for decreasing ramp in DAC

From Fig.5.29 and Table 5.4, the largest glitch (120 mV) occurs when all switches change state: the most significant bit (MSB2) changes from 1 to 0 while all lower bits change from 0 to 1. The glitch level increases with the number of switches involved, because a large transient current is needed to switch all of them at once.
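The switch counts in Table 5.4 can be reproduced with a short script. This is a sketch under an assumption inferred from the table rather than stated in the text: each R-2R bit (LSB0 to LSB2) drives one switch, while the segmented bits MSB0, MSB1, and MSB2 drive 1, 2, and 4 unit switches respectively, giving the 10 SWs listed for the 3R-3S DAC in Table 5.3. The names `SWITCHES_PER_BIT` and `switches_involved` are hypothetical.

```python
# Assumed number of unit switches driven by each input bit,
# ordered LSB0, LSB1, LSB2, MSB0, MSB1, MSB2 (an inference, not a stated fact).
SWITCHES_PER_BIT = [1, 1, 1, 1, 2, 4]


def switches_involved(code_from, code_to):
    """Count the unit switches that toggle when the 6-bit input code changes."""
    changed = code_from ^ code_to  # a set bit marks an input bit that flips
    return sum(n for bit, n in enumerate(SWITCHES_PER_BIT) if (changed >> bit) & 1)


# Decreasing-ramp transitions listed in Table 5.4
counts = {start: switches_involved(start, start - 1) for start in (56, 48, 40, 32, 24, 16, 8)}
for start, count in counts.items():
    print(f"{start:06b} -> {start - 1:06b}: {count} switches")
```

Under this assumption the mid-scale transition 32 to 31 toggles all 10 switches, which is consistent with it producing the largest glitch.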

## 5.8.2 Glitch suppression of 3R-3S DAC

The glitch can be reduced by adding a glitch suppression capacitance ( $C_{SUP}$ ) at the output of each switch, as illustrated in Fig.5.30 [112], [122]. The impact of  $C_{SUP}$  on the glitch amplitude is studied for the worst case, which occurs when all switches change state. The simulated glitch levels for various capacitance values in this worst case are shown in Fig.5.31. Introducing  $C_{SUP}$  makes the DAC cell switch more slowly, i.e. the gate delay



Figure 5.30: Switching cell used for LSB (left) and MSB (right) with glitch suppression capacitor  $C_{SUP}$ .

increases. The value of  $C_{SUP}$  is therefore chosen such that it does not compromise the switching speed the DAC needs to convert the accumulator (Accu) digital output into an analog signal. Most of the glitch reduction is obtained for low capacitance values  $\leq 100$  fF, at the cost of a moderate increase in the gate delay of the switch (15 ps), as indicated in Fig.5.32. Both the LSB and the MSB SW then still support at least 60 Gbps, as the eye diagrams in Fig.5.33 show. Increasing  $C_{SUP}$  further reduces the glitches, but with a substantial increase in gate delay. Extensive simulations show that the optimum  $C_{SUP}$  value is 100 fF. The DAC outputs for  $C_{SUP} = 100$  fF for the increasing and decreasing ramps are shown in Fig.5.34. All glitch levels are strongly suppressed with this approach.

Fig.5.35 shows the DAC output with and without  $C_{SUP}$ . With  $C_{SUP}$ , the triangle is smoother (owing to the gate delay effect); without it, the signal is staircase-like and shows severe glitches. The TSC that follows is sensitive to the shape of its input triangle when generating a sine wave with good SFDR. It therefore requires a smooth signal, since larger glitches and a staircase-like triangle increase the uncertainty of the sine wave transformation. However,  $C_{SUP}$  must not be too large; otherwise the gate delay becomes so large that the DAC cannot interpret the accumulator digital outputs at all. Simulations show that up to 100 fF the gate delay stays within the limit and the DAC can still decode its digital inputs. In the real layout, however,  $C_{SUP}$  should be slightly less than 100 fF, because the parasitic capacitance at each output node amounts to several fF. A value of 85-90 fF should therefore be enough to suppress the glitch and smooth the triangle without introducing a significant gate delay. Because the CML is based on HBTs, which have a higher transconductance than NMOS transistors, it can efficiently drive the additional glitch suppression capacitance, which would be difficult with NMOS CML [53]. In this DDS, the DAC output feeds a triangle-to-sine waveform converter (TSC), which requires a glitch-free triangle input (as in Fig.5.34 rather than Fig.5.29) for a sine wave output with better SFDR. Thus, better SFDR can be achieved



Figure 5.31: Glitch amplitude level of the MSB2 switch output for various capacitor ( $C_{SUP}$ ) values (0 fF, 50 fF, 150 fF and 100 fF, clockwise starting from the top-left).

when an appropriate  $C_{SUP}$  is used in the DAC of a DDS circuit.



Figure 5.32: Glitch amplitude and gate delay for the input code transition 100 000 to 011 111 as a function of the glitch suppression capacitor ( $C_{SUP}$ ).



Figure 5.33: Eye-diagram of LSB switch (left) and MSB switch (right) with  $C_{SUP} = 100$  fF.



Figure 5.34: DAC output voltage with linearly increasing input code (with glitch suppression  $C_{SUP} = 100 \text{fF}$ ).



Figure 5.35: DAC output voltage (without and with glitch suppression  $C_{SUP}$ ) for digital triangle input codes.

#### 5.9 SiGe high speed DAC comparisons

| DAC work  | Size (bits)/Type | Power diss. (W) | Output swing (V) | Node ($\mu$m)/f<sub>T</sub> (GHz) | DAC speed (GHz) | Fall/Settling time (ps) | INL/DNL (LSB)         | SFDR (dBc) @ F<sub>in</sub>/F<sub>sample</sub> (GHz) |
|-----------|------------------|-----------------|------------------|-----------------------------------|-----------------|-------------------------|-----------------------|------------------------------------------------------|
| [116]     | 6/Full seg.      | 2               | 1.3              | 0.130/150                         | 22              | -/70                    | $\pm 0.5 / \pm 0.5$   | 35 @ 8/22                                            |
| [123]     | 6/Binary         | 0.36            | -                | 0.180/155                         | 20              | -                       | -                     | 19 @ 2.5/16                                          |
| [124]     | 6/Binary         | 0.19            | 1.6              | 0.250/180                         | 20.5            | -                       | $\pm 0.8 / \pm 0.6$   | 28 @ 6/13                                            |
| [115]     | 4/Binary         | 0.455           | -                | 0.250/190                         | 30              | 15/-                    | $\pm 0.49 / \pm 0.57$ | -                                                    |
| [81]      | 6/Binary         | 1.05            | 1.1              | 0.250/180                         | 26.8            | 18/45                   | $\pm 0.5 / \pm 0.5$   | 30.1 @ 6/13.4                                        |
| [114]     | 4/Distribut.     | 5.2             | 6.7              | 0.130/240                         | 56              | 10/-                    | $\pm 1 / \pm 1.5$     | -                                                    |
| [125]     | 8/Partial seg.   | 0.47            | 0.6              | 0.055/320                         | 75              | 8/22                    | $\pm 0.5 / \pm 0.5$   | 38.7 @ 36.5/75                                       |
| This work | 6/Partial seg.   | 0.214           | 0.49             | 0.250/180                         | 30              | 15/38                   | $\pm 0.48 / \pm 0.19$ | 30\* @ 9.25/18.5                                     |

Table 5.5: Comparison between high speed (>20 GHz) DAC based on SiGe process

\* Measured data along with accumulator and TSC blocks. Details will be explained in chapter 7.

Table 5.5 lists high-speed DACs based on SiGe technology, all of them clocked at 20 GHz or more. The highest amplitude resolution (8 bits) and the highest sampling speed (75 GHz) are both reported in [125]. It achieves an SFDR of 38.7 dBc thanks to its 8-bit amplitude resolution. It stands out from the rest of the DACs in sampling frequency mainly because of its most advanced node (0.055  $\mu$m) with an f<sub>T</sub> of 320 GHz, which is 68 % faster than the next fastest node [115] (see Table 5.5). All other DACs are limited to 6 bits of resolution. [114] uses a special 6 bit distributed DAC structure to achieve a 56 GHz sampling speed and a very high output swing of 6.7 V. However, it has the highest power consumption, 5.2 W. The DAC in [116] is also power-hungry, with 2 W of power consumption, because it is fully segmented. However, it shows an excellent SFDR of 35 dBc, the second best in the comparison. [115], [123], and [124] are among the least power-consuming DACs owing to their binary structure. However, the DNL performance of [115] and [124] is relatively poor, since they do not meet the  $\pm 0.5$  LSB criterion. Interestingly, [81], despite being a binary-weighted DAC, has INL/DNL within the  $\pm 0.5$  LSB range. This is because its authors performed extensive modelling of the dynamic behaviour of the DAC switches, took special care in calculating the binary-weighted resistor values, and cancelled out the systematic error produced by the binary stage. Conversely, it has

higher power consumption (1.05 W), which is unusual for a binary-weighted DAC. This is due to the added input/output clock drivers and to the higher voltage swing (400 mV instead of 200 mV) maintained to provide fast transitions and reduce the input jitter. Note that [81] and this work are based on the same technology (SG25H1/4). This work is among the best in terms of power consumption (214 mW) at a sampling speed of 30 GHz. It shows good SFDR performance (30 dBc @  $F_{in}$ 9.25 GHz/  $F_{clk}$ 18.5 GHz) while the INL/DNL error stays within the  $\pm 0.5$  LSB limit. Since the DAC is an integral part of the DDS, it cannot be measured individually; the SFDR is a value measured together with the other blocks (accumulator and TSC). In a separate DAC measurement, the SFDR would be expected to be better, since the measured value is also degraded by the TSC block.

#### 5.10 DAC for future DDS

Figure 5.36: 5R-3S DAC block diagram.

This DDS implements a 6-bit accumulator output and a 6-bit DAC. For better phase resolution and SFDR performance, a DAC with more bits is essential. While designing the 6-bit DAC, a DAC with finer amplitude resolution (8 bits) was therefore also briefly investigated. Designing a multi-GHz DAC with 8-bit resolution is challenging. However, with careful optimization of a partially segmented DAC, good linearity and DNL/INL performance can be achieved. The proposed 8-bit DAC is composed of a 5-bit R-2R and a 3-bit segmented DAC (5R-3S DAC), as shown in Fig. 5.36. The intrinsic gate-to-gate delay might be a problem in a real layout, but it can be reduced by keeping the LSB switches as close together as possible. The simulation result (see Fig.5.37) shows that the DAC responds correctly to ramp digital input codes (both increasing and decreasing). The area (0.0365 mm<sup>2</sup>) and power consumption (270 mW) are slightly higher than for the 3R-3S DAC because of the three extra SWs for the LSBs. The glitch % of this DAC is 10 % at  $C_{SUP} = 50$  fF, compared with only 5 % for the 3R-3S DAC, because more bits are implemented in the R-2R network. Although this DAC is not used in this DDS, it is fair to say that the DAC will not be a bottleneck for the SFDR of a future higher-resolution ROM-less DDS. In the



Figure 5.37: Result of 8 bit DAC (5R-3S DAC) with  $C_{SUP} = 50$  fF.

future, an implementation in a faster technology (e.g. SG13 [58]) will have reduced voltage headroom, more compact transistors, and higher speed, which will reduce the area and power consumption. The switching speed of the switches is also higher in a faster technology, so more gate delay can be traded for a larger glitch suppression capacitance and the glitch can be suppressed more effectively. This 8-bit DAC is therefore better suited to a faster technology.

#### 5.11 Conclusions from chapter 5

A 6-bit DAC based on a partially segmented approach has been designed. It has a sampling speed of 25 GHz, with INL and DNL within the  $\pm 0.5$  LSB limit. It has an active area of 0.176 mm<sup>2</sup>, a power consumption of 214 mW, and an output voltage swing of 490 mV. It is one of the fastest DACs in a SiGe process (see Table 5.5). The glitch at the major carry transition is suppressed by a dedicated glitch suppression capacitor. The future prospects of this DAC are also presented: an 8-bit DAC capable of working at higher speed in a more advanced technology is described.

# Chapter 6

## Triangle to sine wave converter

A triangle-to-sine wave converter (TSC) is used in arbitrary function generators, phase-locked loops, and some communication circuits. Historically, it has been realized with a piecewise linear approximation using diode shaping networks [126] and with differential pair circuits [127]. [128] presented an approach for waveform generation on a single monolithic chip. The TSC is one of the main components and forms the last stage of this DDS design. The phase-to-amplitude conversion (PAC) of the phase accumulator output is performed by a linear DAC combined with a TSC. A ROM-based PAC consumes a large chip area and considerable power, and it limits the speed and latency of the circuit [129]. A PAC based on a nonlinear DAC also faces great challenges in keeping the power consumption low, because an extensive switching matrix is needed to decode the sine amplitude with reasonable purity, even though operating speeds in the GHz range have been achieved [129] [18]. The advantage of the TSC-based PAC is that it can be very compact: only a few transistors are sufficient to generate a sine wave, saving area and power. In addition, it can work over a relatively wide frequency range. The TSC is designed such that it both converts the triangle into a sine wave and amplifies the signal to maintain a reasonable output power. The basic principle of the TSC is explored here at circuit level as described in [130], and this design concept is transferred directly into this design.

#### 6.1 Differential pair based TSC

Differential circuits have a hyperbolic tangent transfer characteristic under large-signal excitation [5]. The triangle-to-sine converter is therefore implemented with a differential transistor pair, where the nonlinear transfer function of the silicon bipolar transistor produces the sine wave. Both bipolar and MOS transistors can be used for this transformation [18], [130], but only the bipolar transistor is discussed here. An analog triangular signal is applied (see Fig.6.1) such that the peaks of the triangle fall into the nonlinear region of the transfer function, which converts the triangular shape into a sine wave at the output. Since the distortion caused by a nonlinear function depends on the input amplitude, the main design challenge is to maintain the right biasing, so that the voltage levels of the input triangle wave fall symmetrically into the nonlinear region. In an ill-designed case, where the input falls in the wrong region of the TSC transfer function, the upper peak, the lower peak, or both peaks of the sine wave are clipped. The SFDR of the TSC output is governed by the third and fifth harmonic content, and an investigation is necessary to suppress them. After hundreds of simulations, the optimum biasing, the input/output voltage ranges, and the circuit parameters were determined. All the optimization and tuning required for the TSC are explained step by step. The TSC operating frequency range covers the lowest and the highest synthesizable frequency of the DDS.



**Figure 6.1:** Differential-pair-based TSC core (left) and its input-output wave characteristics [18].

First, the TSC circuit is analyzed. Two bipolar transistors, T1 and T2, are connected as a differential pair (see Fig.6.1). An input signal with  $V_{PP} = 2 \times V_M$  is applied to the bases of the transistors, and the output is taken from the collectors (across the load resistors  $R_L$ ). The output could also be taken across the resistor  $R_S$ , provided that the current gain (beta) of the transistor is relatively large. This would be possible, since the current gain of a single-emitter NPN transistor in this technology (IHP SG25H4) is about 250. However, the signal across  $R_S$  is smaller than that across  $R_L$ , so the TSC output is taken from the collector load for biasing reasons and for a larger output swing. As shown in Fig.6.1 (left), when a sufficiently large input signal is applied to the differential pair, the peak of the triangle is flattened by the curvature of the nonlinear transfer function. In other words, intentional distortion is applied to the triangle wave to generate a sine wave. The quality of the sine wave depends on the input amplitude, the input dc offset voltage, and the biasing. Some input dc offset helps to decrease the second-order harmonic of the signal, but this is not discussed here, to keep the analysis simple [130].

The input voltage  $V_i$  is given by:

$$V_{i} = V_{BE1} + i \times R_{S} - V_{BE2} \tag{6.1}$$

$$V_{BE} = V_{T} \times \ln\left(\frac{I_{C}}{I_{SS}}\right)$$
(6.2)

where  $V_{BE}$  is the base-to-emitter voltage,  $V_T = \frac{kT}{q}$  is the thermal voltage (26 mV at room temperature),  $R_S$  is the emitter resistance, i is the current flowing through  $R_S$ ,  $I_C$  is the collector current, and  $I_{SS}$  is the saturation current of the transistor. Therefore the base-to-emitter voltage of T1 is given by:

$$V_{BE1} = V_{T} \times \ln\left(\frac{I_{C1}}{I_{SS}}\right)$$
(6.3)

And the base-to-emitter voltage of T2 is given by:

$$V_{BE2} = V_{T} \times \ln\left(\frac{I_{C2}}{I_{SS}}\right)$$
(6.4)

Now combining Eqn. 6.1, Eqn. 6.3, and Eqn. 6.4:

$$V_{i} = i \times R_{S} + V_{T} \ln\left(\frac{I_{C1}}{I_{C2}}\right) \tag{6.5}$$

T1 and T2 are transistors of the same size in the same technology, so for the same operating conditions their current gains (beta) can be considered equal. The quiescent collector currents of both transistors can be made equal using a very stable current mirror circuit, as discussed in chapter 3. With a bias current I in each branch and the signal current i through  $R_S$ , the collector currents are:

$$I_{C1} = I + i \tag{6.6}$$

$$I_{C2} = I - i \tag{6.7}$$

Now again substituting Eqn.6.6 and Eqn.6.7 in Eqn.6.5:

$$\frac{V_{i}}{V_{T}} = \left(\frac{i}{I}\right) \left(I \times \frac{R_{S}}{V_{T}}\right) + \ln\left(\frac{1 + \frac{i}{I}}{1 - \frac{i}{I}}\right)$$
(6.8)

The logarithmic term of the above equation can be expanded as a power series:

$$\ln\left(\frac{1+\frac{\mathrm{i}}{\mathrm{I}}}{1-\frac{\mathrm{i}}{\mathrm{I}}}\right) = 2\left(\frac{\mathrm{i}}{\mathrm{I}}\right) + \frac{2}{3}\left(\frac{\mathrm{i}}{\mathrm{I}}\right)^3 + \frac{2}{5}\left(\frac{\mathrm{i}}{\mathrm{I}}\right)^5 + \dots$$
(6.9)

This expansion is valid for  $\left|\frac{i}{I}\right| < 1$ , i.e. when the current through  $R_S$  is smaller than the bias current of T1 and T2. Now from Eqn.6.8 and Eqn.6.9:

$$\frac{V_{i}}{V_{T}} = \left(2 + \frac{I \times R_{S}}{V_{T}}\right) \left(\frac{i}{I}\right) + \frac{2}{3} \left(\frac{i}{I}\right)^{3} + \frac{2}{5} \left(\frac{i}{I}\right)^{5} + \dots$$
(6.10)

$$\frac{1}{\left(2+\frac{I\times R_{\rm S}}{V_{\rm T}}\right)} \left(\frac{V_{\rm i}}{V_{\rm T}}\right) = \left(\frac{\rm i}{\rm I}\right) + \left(\frac{2}{3}\right) \frac{1}{\left(2+\frac{I\times R_{\rm S}}{V_{\rm T}}\right)} \left(\frac{\rm i}{\rm I}\right)^3 + \left(\frac{2}{5}\right) \frac{1}{\left(2+\frac{I\times R_{\rm S}}{V_{\rm T}}\right)} \left(\frac{\rm i}{\rm I}\right)^5 + \dots \tag{6.11}$$

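The desired transfer characteristic of the TSC is sinusoidal, with constants  $K_1$  and  $K_2$ :

$$i = K_1 \sin(K_2 \times V_i) \tag{6.12}$$

or, equivalently,

$$K_2 \times V_i = \arcsin\left(\frac{i}{K_1}\right) \tag{6.13}$$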

The series expansion of Eqn.6.13 is given by:

$$K_2 \times V_i = \frac{i}{K_1} + \frac{1}{6} \left(\frac{i}{K_1}\right)^3 + \frac{3}{40} \left(\frac{i}{K_1}\right)^5 + \dots \tag{6.14}$$

Comparing the linear terms of Eqn.6.11 and Eqn.6.14 shows that, to obtain the desired transfer function, it is necessary to have:

$$\mathbf{K}_1 = \mathbf{I} \tag{6.15}$$

$$K_2 = \left(\frac{1}{2 + \frac{IR_S}{V_T}}\right) \left(\frac{1}{V_T}\right)$$
(6.16)

The peak amplitude  $K_1$  of the output current should equal the current source value I. Given that the peak of the triangle is  $V_M$ , a sine wave output requires that:

$$K_2 \times V_M = \frac{\pi}{2} \tag{6.17}$$

Therefore from Eqn.6.16 and Eqn.6.17:

$$\frac{V_{\rm M}}{V_{\rm T}} = 1.57 \left(\frac{I \times R_{\rm S}}{V_{\rm T}}\right) + 3.14 \tag{6.18}$$
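Written out, the step from Eqn.6.16 and Eqn.6.17 to Eqn.6.18 is the following (a restatement using only the symbols already defined, with  $\frac{\pi}{2} \approx 1.57$  and  $\pi \approx 3.14$ ):

$$V_M = \frac{\pi}{2 K_2} = \frac{\pi}{2}\left(2 + \frac{I \times R_S}{V_T}\right) V_T \quad\Rightarrow\quad \frac{V_M}{V_T} = \frac{\pi}{2}\left(\frac{I \times R_S}{V_T}\right) + \pi$$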

Eqn.6.18 gives the normalized input triangle amplitude for conversion into a sine wave with minimum distortion. The circuit transfer function in Eqn.6.11 should resemble the arcsine expansion in Eqn.6.14 as closely as possible. Equating the third-order and fifth-order terms of these two equations shows that  $\frac{I \times R_S}{V_T}$  should be 2 for the lowest third-order harmonic distortion and 3.3 for the lowest fifth-order harmonic distortion.
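The coefficient matching between Eqn.6.11 and Eqn.6.14 can be checked with exact fractions. The sketch below solves for x =  $\frac{I \times R_S}{V_T}$  once for the cubic term and once for the quintic term; the function name `degeneration_for` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

# Coefficients of (i/I)^3 and (i/I)^5 in the arcsine expansion (Eqn. 6.14, with K1 = I)
ARCSIN_C3 = Fraction(1, 6)
ARCSIN_C5 = Fraction(3, 40)


def degeneration_for(series_coeff, arcsin_coeff):
    """Solve series_coeff / (2 + x) = arcsin_coeff for x = I*R_S/V_T."""
    return series_coeff / arcsin_coeff - 2


# Eqn. 6.11 has the coefficients (2/3)/(2 + x) and (2/5)/(2 + x)
x_third = degeneration_for(Fraction(2, 3), ARCSIN_C3)  # matches the cubic term
x_fifth = degeneration_for(Fraction(2, 5), ARCSIN_C5)  # matches the quintic term

print("cubic term matched for x =", x_third)
print("quintic term matched for x =", x_fifth)
```

The two conditions give x = 2 and x = 10/3 (about 3.3). Both terms cannot be matched at the same time with a single  $R_S$ , which is why a choice between the two has to be made.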

### 6.2 TSC design for DDS

The analysis presented above is a very good approximation for any TSC design based on a bipolar differential pair. It is now applied to our circuit as in Fig.6.1 (left). Note that this is a low-frequency analysis: it does not take into account the effect of parasitics (capacitance and inductance), which may change the dynamic behaviour of the circuit as the frequency increases. Nevertheless, the above parameters can be used as an initial estimate irrespective of the technology and the speed of the transistor. The circuit is later simulated at higher frequencies to refine these values. The gain of the TSC is shown in Fig.6.2. The 3-dB bandwidth of the TSC is more than 25 GHz, which is more than sufficient for the intended maximum operating frequency of 10 GHz.



Figure 6.2: TSC bandwidth (gain versus input/output frequency).

The TSC for the DDS in this technology is derived from the analysis presented in the previous section. The given equations and the input voltage ratio  $\frac{V_M}{V_T}$  for minimum distortion (THD/FHD) give a good starting point for circuit optimization in our design.  $R_S$  can be optimized either for third harmonic distortion (THD) suppression or for fifth harmonic distortion (FHD) suppression. A current source value of 1.3 mA is chosen to operate near the peak  $f_T$ . This is 0.3 mA more than the current source value used in the other high-speed circuits of this DDS. This certainly increases the power consumption, but the TSC is one of the smallest circuits compared to the accumulator and DAC in terms of area and transistor count, so the increase is not significant. In return, it increases not only the speed of the TSC but also the total output voltage swing. The best  $R_S$  value for THD suppression is obtained when

$$\frac{\mathbf{I} \times \mathbf{R}_{\mathrm{S}}}{\mathbf{V}_{\mathrm{T}}} = 2 \tag{6.19}$$

$$R_{\rm S} = \frac{2 \times V_{\rm T}}{I} = \frac{2 \times 26 \,\mathrm{mV}}{1.3 \,\mathrm{mA}} = 40 \,\Omega \tag{6.20}$$

Similarly the best  $R_S$  value for FHD suppression is when,

$$\frac{I \times R_S}{V_T} = 3.3 \tag{6.21}$$

$$R_{\rm S} = \frac{3.3 \times V_{\rm T}}{I} = \frac{3.3 \times 26 \text{mV}}{1.3 \text{mA}} = 66 \ \Omega \tag{6.22}$$

$R_S$ is now chosen for THD suppression. Using Eqn. 6.18 with  $R_S = 40\ \Omega$ , I = 1.3 mA, and  $V_T = 26$ mV:

$$\frac{V_M}{26 \text{ mV}} = 1.57 \times \frac{1.3 \text{ mA} \times 40\ \Omega}{26 \text{ mV}} + 3.14 = 6.28 \quad \Rightarrow \quad V_M \approx 163 \text{ mV} \tag{6.23}$$

Therefore, the optimum peak-to-peak amplitude of the triangle wave at the TSC input for the least THD is given by:

$$V_{\rm PP} = 2 \times V_{\rm M} = 327 \,\mathrm{mV} \tag{6.24}$$
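The arithmetic of Eqns. 6.19 to 6.24 can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the expression for $V_M$ is the form of Eqn. 6.18 as it is applied in Eqn. 6.23.

```python
# Sketch: reproduce the numbers of Eqns. 6.19-6.24.
# Function names are illustrative and not part of the original design flow.
V_T = 26e-3      # thermal voltage in volts, as used in the text
I_BIAS = 1.3e-3  # current source value in amperes, as used in the text


def degeneration_resistor(ratio, i_bias=I_BIAS, v_t=V_T):
    """Return R_S (ohms) such that I * R_S / V_T equals `ratio`."""
    return ratio * v_t / i_bias


def optimum_peak_amplitude(r_s, i_bias=I_BIAS, v_t=V_T):
    """Return V_M (volts) using the form of Eqn. 6.18 applied in Eqn. 6.23."""
    return v_t * (1.57 * i_bias * r_s / v_t + 3.14)


r_s_thd = degeneration_resistor(2.0)   # ratio for THD suppression (Eqn. 6.19)
r_s_fhd = degeneration_resistor(3.3)   # ratio for FHD suppression (Eqn. 6.21)
v_m = optimum_peak_amplitude(r_s_thd)
v_pp = 2.0 * v_m

print(f"R_S (THD) = {r_s_thd:.1f} ohm, R_S (FHD) = {r_s_fhd:.1f} ohm")
print(f"V_M = {v_m * 1e3:.1f} mV, V_PP = {v_pp * 1e3:.1f} mV")
```

Changing `I_BIAS` shows directly how the bias current trades off against the required $R_S$ and input swing.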

#### 6.3 Under, medium, and over excitation to the TSC

To illustrate under-excitation, optimum excitation, and over-excitation, the TSC is simulated with three different input amplitudes at 5 MHz. When the input amplitude is too low, the output remains a triangle wave. Fig. 6.3 (left) shows the result for a low input excitation ( $V_{PP} = 110 \text{ mV}$ ): the output is still triangular, so no sinewave conversion takes place. The input swing is too small for the peaks of the triangle to reach the nonlinear region of the TSC transfer characteristic.



Figure 6.3: Under excitation ( $V_{PP} = 110 \text{ mV}$ , left), appropriate excitation ( $V_{PP} = 240 \text{ mV}$ , middle) and over excitation ( $V_{PP} = 850 \text{ mV}$ , right) of 5MHz triangle input to the TSC.

When an appropriate input excitation (240 mV) is applied, the peaks of the triangle fall into the nonlinear portion of the TSC transfer characteristic, which rounds them just enough to produce a sinewave (see Fig. 6.3 (middle)). Finally, when the input amplitude is too large, the peaks of the sinewave are clipped because the circuit is driven into saturation, and the output resembles a square wave rather than a sinewave. Fig. 6.3 (right) shows the result for a high input excitation ( $V_{PP} = 850 \text{ mV}$ ), where both the upper and lower parts of the output signal are clipped, so the circuit no longer functions as a sinewave converter. An incorrect DC offset also degrades the smoothness of the output signal even at the optimum input swing, because the sinewave is shaped by two transistors working together and any imbalance between them leads to an imperfect sinewave.
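The three excitation regimes can be illustrated qualitatively with a toy model. The sketch below assumes an ideal, undegenerated differential pair with a pure tanh transfer characteristic, which is not the transfer function of Eqn. 6.11; the `drive` parameter (peak input normalized to $2V_T$) and all function names are hypothetical. It computes the third harmonic relative to the fundamental for a weak, a moderate, and a strong drive.

```python
import math


def triangle(theta):
    """Unit-amplitude triangle wave with the same phase as sin(theta)."""
    return (2.0 / math.pi) * math.asin(math.sin(theta))


def harmonic_ratio(drive, order=3, n=4096):
    """Return |b_order / b_1| for tanh(drive * triangle).

    The Fourier sine coefficients are estimated with a midpoint sum over
    one period. The tanh model is an assumption (ideal differential pair
    without emitter degeneration).
    """
    b1 = 0.0
    bn = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        y = math.tanh(drive * triangle(theta))
        b1 += y * math.sin(theta)
        bn += y * math.sin(order * theta)
    return abs(bn / b1)


for drive in (0.1, 1.57, 10.0):
    hd3 = harmonic_ratio(drive)
    print(f"drive = {drive:5.2f}: third harmonic = {20 * math.log10(hd3):6.1f} dBc")
```

In this simplified model the weak drive leaves the triangle almost unchanged, the strong drive approaches a square wave, and the moderate drive gives the lowest third harmonic, which mirrors the behaviour in Fig. 6.3.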



#### 6.4 Harmonic suppression in TSC

Figure 6.4: Input peak to peak voltage (Vpp) versus harmonics suppression of TSC at the input triangle wave frequency of 5 MHz, 2.5 GHz, 5 GHz, and 10 GHz (clockwise from top-left).

The optimum excitation  $V_{PP}$  (327 mV) given by Eqn. 6.24 is only a rough estimate, so simulations are performed at different input excitations and TSC frequencies. To check the robustness of the TSC against the input drive level,  $V_{PP}$  is swept from 125 mV to 350 mV. The frequency of the triangle wave ranges from 5 MHz (lowest DDS output frequency) to 10 GHz (highest DDS output frequency). The simulation results (see Fig. 6.4) show that, although $R_S$ is chosen for THD suppression, the THD is still more significant than the FHD. Indeed, for all values of  $V_{PP}$ , the SFDR of the TSC output is governed by the THD, except at 10 GHz. This means that optimizing  $R_S$  for THD is more important than optimizing it for FHD, which simplifies the selection of  $R_S$. In general, $R_S$ cannot be optimized for both cases in the same circuit at the same time.

The most important information from Fig. 6.4 is that  $V_{PP}$  values from 175 mV to 275 mV are optimum: the THD is suppressed by at least 30 dBc, which directly translates into an SFDR better than 30 dBc. The TSC therefore tolerates a wide range of input excitation. This is a notable feature, since the output voltage swing of the DAC might fluctuate by  $\pm$  35 mV (from the 465 mV ideal swing) over frequency due to the dynamic nature of the DAC cells. Even in that case, the TSC can still be operated with good SFDR. The simulations are repeated for triangular input signals of 2.5 GHz, 5 GHz, and 10 GHz with $R_S = 40\ \Omega$ (optimized for THD) (see Fig. 6.4, top-right to bottom-left, clockwise). At 2.5 GHz, the THD suppression is above 30 dBc for $V_{PP}$ from 220 mV to 280 mV. At 5 GHz, the THD is again more significant than the second harmonic distortion (SHD) and the FHD, and it is suppressed by at least 30 dBc for input excitations from 210 mV to 300 mV. Finally, at 10 GHz, the SHD is more dominant than the THD and FHD. The optimum input excitation ranges from 205 mV to 225 mV, where the worst-case suppression is better than 30 dBc (Fig. 6.4 (bottom-left)). At all input frequencies, the optimum excitation is shifted down to 200-225 mV from the calculated 327 mV. The SFDR up to 10 GHz for an input excitation of 225 mV is presented in Fig. 6.5. For all frequencies, the SFDR is better than 30 dBc.

Figure 6.5: TSC SFDR over frequency for 225 mV input excitation. The SFDR is evaluated up to the fifth harmonic of the fundamental signal. If the SFDR bandwidth were limited to 10 GHz, the SFDR would be much better, especially for output frequencies greater than 5 GHz.

#### 6.5 Complete TSC block design



Figure 6.6: Voltage shifter for the TSC input and output (at 1 GHz).



Figure 6.7: Complete schematics of the TSC core circuit including input EF, potential divider, and output EF.

The voltage swing ($V_{PP}$) from the DAC to the TSC is  $\approx 425$  to 500 mV, whereas the designed TSC has an approximate input $V_{PP}$ range of 175-225 mV. Therefore, a voltage shifter is added at the TSC input to scale the input level and to set the correct DC offset. Fig. 6.6 shows the voltage levels from the DAC to the TSC output, and Fig. 6.7 shows the complete schematic of the TSC circuit. As an example, the DAC output (triangle wave) for a 1 GHz signal has voltage levels of 4 V/3.55 V ($V_{PP}$ = 450 mV). It is shifted down to 3.1 V/2.65 V by the emitter follower, which also acts as a buffer between the DAC and the TSC core. The 4 V supply is used for the voltage shifter, and the 3 V supply for the TSC core and the emitter follower. A potential divider with R1 = R2 = 100  $\Omega$  scales the voltage swing down by 50 % and compresses $V_{PP}$ to 225 mV (1.8 V/1.58 V). It also fixes the offset voltage of 1.65 V at the base of the differential transistors. The TSC core generates a sinewave of $V_{PP} \approx 750$  mV. The output swing varies over frequency because the gain of the TSC core decreases slightly at higher frequencies (see Fig. 6.2) and because of the triangle-to-sine transformation. Finally, an emitter follower is used to drive the output load.

#### 6.6 TSC simulation results

The preceding DAC has an output swing from 430 mV to 500 mV. Therefore, the complete TSC circuit (including input EF, potential divider, and output EF) is excited with a 450 mV triangle wave. The TSC is tested individually with five input triangle frequencies (5 MHz, 2.5 GHz, 5 GHz, 10 GHz, and 15 GHz). The input EF and divider scale the swing to an appropriate level of 215 mV at the TSC core input. The output sinewaves and their spectra are plotted in Fig. 6.8, Fig. 6.9, Fig. 6.10, Fig. 6.11, and Fig. 6.12. Each output spectrum covers at least the fifth harmonic of the fundamental TSC output frequency, so the second, third, fourth, and fifth harmonics can all be observed with respect to the fundamental. At higher frequencies, the second harmonic increases due to clipping at the lower edge of the triangle input. Its influence can be seen in the spectrum in Fig. 6.12 (right).

Figure 6.8: TSC output sinewave (left) and its spectrum (right) with input triangle of 5 MHz ( $V_{PP} = 450 \text{ mV}$ ).

Figure 6.9: TSC output sinewave (left) and its spectrum (right) with input triangle of 2.5 GHz ( $V_{PP} = 450 \text{ mV}$ ).

#### 6.7 Nonideal TSC input and the DDS

The TSC simulated above is driven by an ideal input triangle wave. In the real DDS, the triangle provided by the DAC to the TSC is not ideal. The smoothness of the DAC output triangle depends on the ratio of the output frequency ( $F_{out}$ ) to the clock frequency ( $F_{clk}$ ) of the accumulator (or DDS). For example, a 20 GHz clock is applied to the DDS, and the FCWs are chosen such that the  $F_{out}$  :  $F_{clk}$  ratios are 1:4, 1:8, and 1:16, resulting in 5 GHz, 2.5 GHz, and 1.25 GHz, respectively. The three corresponding DAC outputs (or TSC inputs) are compared in Fig. 6.13.

Figure 6.10: TSC output sinewave (left) and its spectrum (right) with input triangle of 5 GHz ( $V_{PP} = 450 \text{ mV}$ ).

Figure 6.11: TSC output sinewave (left) and its spectrum (right) with input triangle of 10 GHz ( $V_{PP} = 450 \text{ mV}$ ).

The DDS is intended to work up to 10 GHz output (for the maximum 20 GHz clock). A low-pass filter (LPF) with a 10 GHz cutoff frequency is therefore the conventional way to suppress significant harmonics beyond 10 GHz. The LPF could be either passive or active. An active LPF has the benefit of gain, but here only a simple passive RC LPF is used. Nevertheless, the DDS output contains several harmonics. For instance, a 4 GHz DDS output has its second and third harmonics at 8 GHz and 12 GHz, respectively. An LPF at 10 GHz at the TSC output suppresses only the harmonics beyond 10 GHz, and how well the 12 GHz harmonic is suppressed depends on the order of the LPF: a higher-order LPF has a sharper roll-off above the cutoff frequency and thus gives better suppression. For a 5 GHz DDS output, an LPF at 10 GHz can suppress only the third (15 GHz) and higher harmonics. Lowering the cutoff frequency from 10 GHz to, e.g., 8 GHz also suppresses the second harmonic (10 GHz). Ideally, this DDS operates up to 10 GHz. If that is achieved in practice (at the TSC output), an LPF at 8 GHz after the TSC lowers the final amplitude of DDS outputs above 8 GHz.

Here a special case is observed in the DDS. In reality, DDS output frequencies close to 10 GHz (or half of the clock in general) have poor SFDR and spectral purity. The reason is the inferior quality of the triangle fed by the DAC to the TSC, as shown in Fig. 6.13: even the 5 GHz triangle has only four sampling points (for a 20 GHz clock). For DDS output frequencies higher than 5 GHz, the TSC input triangle has even fewer sampling points, and its quality degrades further. At higher  $F_{out}$ :  $F_{clk}$  ratios, the TSC must therefore form a sinewave from a non-ideal triangle, and the actual DDS output performance is limited by the few sampling points available. In such a case, the output signal still has the mathematically correct frequency, but its SFDR is poor. A TSC integrated within a DDS cannot yield a good sinewave at higher  $F_{out}$ :  $F_{clk}$  ratios, even though the TSC itself operates up to 24 GHz with an ideal input triangle. Therefore, if the DDS aims to deliver a signal with good SFDR (a sinewave) and not merely a signal with the right frequency, it is reasonable to place the LPF cutoff above 5 GHz. This suggests that the LPF can be placed at 8 GHz instead of 10 GHz, thus suppressing undesired harmonic content. It helps to improve the SFDR for output frequencies whose second or third harmonics lie beyond 8 GHz. Note that for all DDS papers that claim an  $F_{out}$  :  $F_{clk}$  ratio up to 1:2 [52], [43], [3], the maximum-frequency output is a distorted version of a sinewave, and the SFDR in such cases is always worse than 30 dBc. From a practical point of view, this  $F_{out}$  :  $F_{clk}$  ratio limitation is independent of the technology used in the DDS. To push the maximum frequency of the DDS sinewave output, increasing the clock speed of the DDS is necessary (but not sufficient).

Figure 6.12: TSC output sinewave (left) and its spectrum (right) with input triangle of 15 GHz ( $V_{PP} = 450 \text{ mV}$ ).

Figure 6.13: DAC output (TSC input triangle) at 5 GHz (left), 2.5 GHz (middle), and 1.25 GHz (right).
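The effect of the LPF cutoff on the harmonics discussed above can be checked with a small helper. This is a sketch under simplifying assumptions: the filter is treated as an ideal brick-wall (a real passive RC filter only attenuates gradually), harmonics are considered up to the fifth order, and the function name is hypothetical.

```python
def harmonics_beyond_cutoff(f_out_ghz, f_cut_ghz, max_order=5):
    """List the harmonic frequencies (GHz) that lie strictly above the cutoff.

    Assumes an ideal brick-wall filter; harmonics exactly at the cutoff
    are treated as not suppressed.
    """
    return [
        order * f_out_ghz
        for order in range(2, max_order + 1)
        if order * f_out_ghz > f_cut_ghz
    ]


# Cases discussed in the text
print(harmonics_beyond_cutoff(4, 10))  # 4 GHz output, 10 GHz cutoff
print(harmonics_beyond_cutoff(5, 10))  # 5 GHz output, 10 GHz cutoff
print(harmonics_beyond_cutoff(5, 8))   # 5 GHz output, 8 GHz cutoff
```

For the 5 GHz output, the second harmonic at 10 GHz appears in the list only when the cutoff is lowered to 8 GHz, which is the argument made above for the lower cutoff.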

#### 6.8 TSC temperature dependency



Figure 6.14: Conventional bandgap reference in the current source with first-order temperature compensation.

The nonlinear transfer function of the fabricated TSC circuit depends strongly on process, voltage, and temperature (PVT) variation. Precautions against process variation are taken in the layout, e.g. by placing dummy cells around critical components, by using redundant resistors (two 2R resistors in parallel instead of a single resistor R), and by using an emitter-degenerated current mirror for the current source. The fluctuation of the input excitation voltage has already been simulated and its limitations discussed. Finally, the quality of the TSC sinewave output also depends on the strongly temperature-dependent base-emitter voltage  $V_{BE}$  (see Eqn. 6.2). This might affect the performance of the TSC in applications that operate in a high-temperature environment. For instance, the operating temperature range is -55 °C to +125 °C for military applications, -40 °C to +85 °C for industrial applications, and 0 °C to +70 °C for commercial applications [131]. When two bipolar transistors are biased by unequal current sources, the difference between their base-emitter voltages is directly proportional to the absolute temperature. One technique to reduce the temperature influence is to use a bandgap reference, as explained in [132], [133]. In such a case,  $\Delta V_{BE}$  exhibits a positive temperature coefficient (TC) [134]. First-order temperature compensation is accomplished as shown in Fig. 6.14 and is represented by the following equation [135]:

$$V_{\rm ref} = V_{\rm BE} + \gamma \times V_{\rm T} \tag{6.25}$$

where  $V_{ref}$  is the reference voltage and  $\gamma$  is the scaling factor used to optimize the compensation.  $V_{BE}$  has a negative temperature coefficient and  $V_T = kT/q$  has a positive temperature coefficient. In a monolithic implementation, the temperature-dependent voltage offset can be generated using the emitter-area ratio technique [136]. The value of  $\Delta V_{BE}$  is given by:

$$\Delta V_{BE} = \ln \left(\frac{A1}{A0}\right) \times V_{T} \tag{6.26}$$

where A1 and A0 are the emitter areas of the transistors used in the current source. Thus, Eqn. 6.25 can be written as:



$$V_{\rm ref} = V_{\rm BE} + \ln\left(\frac{A1}{A0}\right) \times V_{\rm T} \tag{6.27}$$

Figure 6.15: SFDR of the TSC output at various temperature at 5 MHz (left) and 5 GHz (right).

Note that the current source used in the TSC has a transistor emitter-area ratio A1 : A0 of 1:1, so the ratio of reference input current  $(I_{in})$  to output current  $(I_{out})$  is also 1:1 and the  $\Delta V_{BE}$  contribution is zero. This has an advantage and a drawback. The advantage is that transistor area scaling introduces no temperature-dependent variation of  $\Delta V_{BE}$. The drawback is that  $V_{BE}$  itself has a strong temperature dependence, and first-order compensation according to Eqn. 6.25 could have been used if the transistor sizes in the current source were different, because  $V_{BE}$  and  $\Delta V_{BE}$  have temperature coefficients of opposite sign. The temperature dependence could then be compensated by introducing an appropriate resistor RC equivalent to  $\Delta V_{BE}$  in the circuit, as shown in Fig. 6.14. In reality,  $V_{BE}$  also contains higher-order temperature-dependent terms. For example, in the low-temperature region the electron mobility increases with temperature due to impurity scattering, whereas in the high-temperature region the mobility decreases with increasing temperature due to phonon scattering. One way to cancel the higher-order temperature-dependent terms of  $V_{BE}$  is to exploit the beta of the transistor, which has a strongly nonlinear temperature coefficient [135]. This is beyond the scope of this thesis but serves as a reference for a future temperature-resilient TSC design.
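Eqn. 6.26 can be evaluated numerically to see why a 1:1 area ratio contributes no $\Delta V_{BE}$ term. The sketch below uses the exact SI values of the Boltzmann constant and the elementary charge; the area ratio of 8 is a hypothetical value chosen only for comparison and is not taken from the design.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_c):
    """Return V_T = kT/q in volts for a temperature in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def delta_vbe(area_ratio, temp_c):
    """Return Delta V_BE = ln(A1/A0) * V_T in volts (Eqn. 6.26)."""
    return math.log(area_ratio) * thermal_voltage(temp_c)


for temp_c in (0, 27, 70):
    print(
        f"T = {temp_c:3d} C: V_T = {thermal_voltage(temp_c) * 1e3:.2f} mV, "
        f"dVBE(1:1) = {delta_vbe(1.0, temp_c) * 1e3:.2f} mV, "
        f"dVBE(8:1, hypothetical) = {delta_vbe(8.0, temp_c) * 1e3:.2f} mV"
    )
```

With equal emitter areas the logarithm vanishes at every temperature, whereas any unequal ratio yields a term that grows linearly with absolute temperature.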

Without the remedies mentioned above, emitter-degenerated current sources are used in the TSC. To see the influence of temperature variation in a commercial application environment, the TSC is simulated for ambient temperatures from 0 °C to +70 °C. The SFDR of the TSC output (at 5 MHz and 5 GHz) for various temperatures is plotted in Fig. 6.15. Between the lower and upper temperature extremes, the TSC performance changes by less than 2 dB for the optimum input excitation of 225 mV.

#### 6.9 Conclusions from chapter 6

In this chapter, a TSC with a bandwidth of 25 GHz was designed. It achieves an SFDR better than 30 dBc for an input excitation range from 0.22 V to 0.27 V. The size of the TSC is less than 0.1 mm  $\times$  0.1 mm, and its power consumption is less than 50 mW. Finally, this TSC is integrated with the accumulator and DAC to form a complete DDS.

# Chapter 7 DDS simulation and characterization

The individual DDS block designs and their simulation results were presented in the previous chapters. The simulation results showed that each block meets its specified performance parameters. The complete DDS must now be simulated to analyze its output behaviour. The loading effect between the blocks has already been considered, and buffers/drivers have therefore been added at the output of each block. Only after the complete DDS has been simulated and verified is it ready for final layout and fabrication. This chapter discusses the DDS simulation, the PCB design of the DDS prototype, and the DDS measurements. Finally, the DDS is used in a 5-10 GHz Doppler radar to demonstrate one of its applications.

#### 7.1 DDS simulation



Figure 7.1: DDS block diagram and simulation set up.

In this section, the complete DDS is verified by simulation to ensure that the final DDS chip is ready for fabrication. The DDS simulation setup is presented in Fig. 7.1. The DDS has 12 FCW input bits. The highest bit is grounded and always remains low. The remaining 11 control bits can be set high or low. A high value can be any voltage greater than the 2.8 V reference, and a low value any voltage (even ground) lower than 2.8 V. The FCW inputs go to the bases of the differential transistors of the adder circuit, and arbitrary high and low levels might affect the biasing of the adder circuit. Therefore, the high and low values are set to 3 V and 2.6 V, respectively.  $F_{out}$  ranges from 10 MHz to 10 GHz with 2048 different frequency points for each  $F_{clk}$, and  $F_{clk}$  can be any frequency from 10 MHz to 20 GHz. It is therefore not possible to test all combinations of clock frequency and frequency points.

The ratio of output frequency to clock frequency  $(F_{out} : F_{clk})$  is related to the resolution of the accumulator output and consequently to the number of sampling points available for the digital-to-analog triangle transformation. The relationship between this ratio and the sampling points, and their influence on the overall DDS output quality, is described using the analytical graph in Fig. 7.2. It shows the DDS output frequency range and the corresponding number of sampling points, i.e. the number of discrete amplitudes available to represent one complete period of the digital triangle. The horizontal axis depicts frequency, including the minimum and maximum DDS output frequencies, other output frequencies, and the clock frequency. The vertical axis shows the number of sampling points available at the accumulator output for the synthesis of the DDS sinewave output. Note that the number of sampling points is not the only factor that determines the quality of the DDS output, but it introduces the SFDR limitation at higher output frequencies. The other contributing factors (biasing, dynamic load, gain over frequency) are considered constant, since the influence of the DAC, the TSC limitations, and the layout on the SFDR performance are the same for our DDS and independent of the FCW inputs. Next, all major sets of  $F_{out}$ :  $F_{clk}$  ratios are described.



Figure 7.2: DDS output frequency range and sampling points.

**Case I:**  $(1:4096 < F_{out} : F_{clk} < 1:128)$ 

The  $F_{out}$ :  $F_{clk}$  ratio is set by the FCW for a given clock: when the FCW increases, the ratio also increases. All  $2^{K+1}$  sampling points are available for  $1: 2^N < F_{out}: F_{clk} < 1: 2^{K+1}$ (N = 12 is the total number of accumulator bits and K = 6 is the number of accumulator output bits fed to the DAC). Therefore, for  $1: 4096 < F_{out}: F_{clk} < 1: 128$ , each accumulator output has the maximum of 128 sampling points available, which accounts for the maximum SFDR of 38 dBc (due to the accumulator).  $\mathbf{F_{res}}$  is the maximum output frequency that has all 128 sampling points.

**Case II:**  $(1: 128 < F_{out} : F_{clk} < 1: 2)$ 

For  $1: 128 < F_{out} : F_{clk} < 1: 2$ , the number of sampling points decreases as the  $F_{out} : F_{clk}$  ratio increases. More sampling points mean a smoother analog triangle and thus fewer spurs created by the DAC and TSC transformations, which in turn gives a better SFDR of the DDS output. Fewer sampling points have the opposite effect.

**Case III:**  $(1: 4 < F_{out} : F_{clk} < 1: 2)$ 

The range  $1: 4 < F_{out} : F_{clk} < 1: 2$  is the critical case: the SFDR of the output is poor because fewer than four sampling points are available, which results in a poor digital triangle (generated by the accumulator) and a poor analog triangle.

**Case IV:**  $(F_{out} : F_{clk} \neq 1 : 2^{I})$ 

Fig. 7.2 shows only  $F_{out}$ :  $F_{clk}$  ratios of the form  $1: 2^{I}$  (where  $1 \leq I \leq 11$ ). Such ratios have an advantage in the distribution of sampling points over the triangle wave: the sampling points are placed equally on the rising and falling slopes of the triangle. An FCW that is not a power of 2 is less problematic for lower  $F_{out}$ :  $F_{clk}$  ratios ( $F_{out}$ :  $F_{clk} \leq 1:4$ ), because the large number of sampling points reduces the effect of asymmetric rising and falling slopes of the digital triangle. This will be illustrated shortly in the DDS simulation results.

Finally, because of the large number of possible output frequencies, only selected simulation sets are presented that cover all the major cases shown in Fig. 7.2.
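The relationship between the FCW and the number of sampling points per output period can be sketched as follows. The sketch assumes that the number of points per period equals $F_{clk} / F_{out} = 2^N / \mathrm{FCW}$, capped at $2^{K+1}$, which is consistent with the examples in the text (four points at 1:4, two points at 1:2). The function names and the simplified case labels are hypothetical.

```python
N_BITS = 12  # total accumulator bits (N)
K_BITS = 6   # accumulator output bits fed to the DAC (K)


def sampling_points(fcw):
    """Sampling points available for one period of the digital triangle.

    Assumes points per period = 2**N / FCW, capped at 2**(K+1).
    """
    full_resolution = 2 ** (K_BITS + 1)
    clocks_per_period = 2 ** N_BITS / fcw
    return min(full_resolution, clocks_per_period)


def ratio_case(fcw):
    """Simplified labelling of the cases above (illustrative only)."""
    points = sampling_points(fcw)
    if points == 2 ** (K_BITS + 1):
        return "I"    # all 128 sampling points available
    if points < 4:
        return "III"  # critical range with fewer than four points
    return "II"


for fcw in (1, 32, 384, 1024, 1365, 2047):
    print(f"FCW = {fcw:4d}: {sampling_points(fcw):7.2f} points, case {ratio_case(fcw)}")
```

The labelling does not treat Case IV separately, since that case concerns whether the FCW is a power of 2 rather than the number of points.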

#### 7.2 DDS static simulation

Each FCW bit and the clock can be changed as fast as the settling time of the DDS allows, i.e. 0.3 ns, which corresponds to a switching rate of 3.33 GHz. This will be shown later in a ramp simulation. Here, static simulation refers to a DDS simulation with a fixed clock and a fixed FCW. An overview of the static simulation examples is given in Table 7.1. First, a 10 GHz clock is used for five independent simulations with various FCWs. These FCWs correspond to  $F_{out}$ :  $F_{clk}$  ratios of 1:3, 1:4, 1:8, 1:10.67, and 1:128, representing most of the cases described above. Higher clock frequencies of 14 GHz, 17 GHz, and 20 GHz are tested as well. The DDS is also simulated for output frequencies that are half of the clock frequency, with the clock covering the range from 1 GHz to 20 GHz. Finally, at a 20 GHz clock, the FCW is swept from 0000 0000 0001 to 0111 1111 1111 and the corresponding output frequencies are plotted. In all cases, a DC block is used at the output.

| $F_{clk}$ (GHz) | FCW (binary)                     | $F_{out}$ (GHz) | $F_{out} : F_{clk}$ | SFDR (dBc) |
|-----------------|----------------------------------|-----------------|---------------------|------------|
| 10              | 0100 0000 0000                   | 2.5             | 1:4                 | 24         |
| 10              | 0010 0000 0000                   | 1.25            | 1:8                 | 29         |
| 10              | 0101 0101 0101                   | 3.33            | 1:3                 | 13         |
| 10              | 0001 1000 0000                   | 0.93            | 1:10.67             | 31         |
| 10              | 0000 0010 0000                   | 0.078           | 1:128               | 37         |
| 14              | 0100 0000 0000                   | 3.5             | 1:4                 | 38         |
| 17              | 0111 1111 1111                   | 8.5             | 1:2                 | 30         |
| 20              | 0000 1010 0000                   | 0.78            | 1:25.6              | 34         |
| 20              | 0111 1111 1111                   | 10              | 1:2                 | 39         |
| 5               | 0111 1111 1111                   | 2.5             | 1:2                 | 15         |
| 1 to 20         | 0111 1111 1111                   | 0.5 to 10       | 1:2                 | >15        |
| 20              | 0000 0000 0001 to 0111 1111 1111 | 0.5 to 10       | -                   | >24        |

Table 7.1: Various static DDS simulations

#### DDS simulation: $F_{clk} = 10$ GHz, $F_{out} = 2.5$ GHz

For a simple first simulation, a clock frequency of 10 GHz is applied to the accumulator and the FCW is set to 0100 0000 0000; both remain fixed during the simulation. The MSB is FCW11 = 0 (low), the second bit is FCW10 = 1 (high), the third bit is FCW9 = 0 (low), and so on down to FCW0 = 0 (low). For the calculation, the FCW is expressed in decimal, which is 1024. The expected output frequency is given by:

$$F_{out} = \frac{FCW}{4096} \times 10 \text{ GHz} = \frac{1024}{4096} \times 10 \text{ GHz} = 2.5 \text{ GHz}. \tag{7.1}$$

Note that the digital triangle from the accumulator, the analog triangle from the DAC, and the final TSC/DDS output all have the same frequency. All accumulator outputs (Accu5 to Accu0), the DAC output, and the TSC/DDS output are presented in Fig. 7.3. The period of the 2.5 GHz signal is 0.4 ns. Summing the six accumulator outputs in the time domain over one period (0.425 ns to 0.825 ns) therefore yields the digital triangle wave. The DAC interprets this information and converts it into the analog triangle. Finally, the analog triangle is fed into the TSC, which translates it into the sinewave (Fig. 7.3 (bottom)). All individual steps have already been discussed in the respective sections; they are shown here together in one picture.
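The tuning relation of Eqn. 7.1 can be written as a small helper that accepts the FCW in the binary notation used in this chapter. This is a sketch; the function names are hypothetical, and the 12-bit accumulator width is taken from the text.

```python
def fcw_to_int(fcw_bits):
    """Convert an FCW string such as '0100 0000 0000' to its decimal value."""
    return int(fcw_bits.replace(" ", ""), 2)


def output_frequency_ghz(fcw_bits, f_clk_ghz, n_bits=12):
    """Expected DDS output frequency: F_out = FCW / 2**N * F_clk (Eqn. 7.1)."""
    return fcw_to_int(fcw_bits) / 2 ** n_bits * f_clk_ghz


# FCW and clock combinations that appear in this chapter
for bits, f_clk in [
    ("0100 0000 0000", 10),
    ("0010 0000 0000", 10),
    ("0000 0010 0000", 10),
    ("0000 1010 0000", 20),
]:
    f_out = output_frequency_ghz(bits, f_clk)
    print(f"F_clk = {f_clk} GHz, FCW = {bits} ({fcw_to_int(bits)}): F_out = {f_out} GHz")
```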



Figure 7.3: Accumulator, DAC, and 2.5 GHz DDS output with  $F_{clk} = 10$  GHz, FCW = 0100 0000 0000



Figure 7.4: 2.5 GHz DDS output spectrum where DDS is operated with  $F_{clk} = 10$  GHz and FCW = 0100 0000 0000. The SFDR of 2.5 GHz DDS output is 24 dBc.



Figure 7.5: 1.25 GHz DDS output spectrum where DDS is operated with  $F_{clk} = 10$  GHz and FCW = 0010 0000 0000. The SFDR of 1.25 GHz DDS output is 30 dBc.

#### DDS simulation: $F_{clk} = 10$ GHz, $F_{out} = 1.25$ GHz

Similarly, the DDS is simulated with a clock frequency of 10 GHz and an FCW of 0010 0000 0000 (512). The expected output frequency is given by:

$$F_{out} = \frac{FCW}{4096} \times 10 \text{ GHz} = \frac{512}{4096} \times 10 \text{ GHz} = 1.25 \text{ GHz}. \tag{7.2}$$

The spectrum of the 1.25 GHz DDS output is shown in Fig. 7.5. Again, the digital triangle from the accumulator, the analog triangle from the DAC, and the final TSC/DDS output have the same frequency. All accumulator outputs and the DAC and TSC outputs are presented in Fig. 7.6.



Figure 7.6: Accumulator, DAC, and 1.25 GHz DDS output with  $F_{clk} = 10$  GHz, FCW = 0010 0000 0000.

#### DDS simulation: $F_{out} : F_{clk} \neq 1 : 2^{I}$



Figure 7.7: DAC and DDS results for  $F_{clk} = 10$  GHz, FCW = 0101 0101 0101



Figure 7.8: 3.33 GHz DDS output spectrum where DDS is operated with  $F_{clk} = 10$  GHz and FCW = 0101 0101 0101. The SFDR of 15 dBc is obtained.

The FCW in decimal (1365) is not of the form  $2^{I}$, where I is 1 to 11. The DDS is simulated for  $F_{clk} = 10$  GHz and FCW = 0101 0101 0101 (1365), resulting in an  $F_{out}$  of 3.33 GHz. The  $F_{out} : F_{clk}$  ratio is  $\approx 1:3$  ( $F_{out} : F_{clk} > 1:4$ ). The simulation result is shown in Fig. 7.7. The DAC output has insufficient resolution to construct the triangle wave from the accumulator output. The final TSC/DDS output is consequently a clipped version of a sinewave (poor SFDR of 15 dBc), although it still maintains the correct fundamental frequency of 3.33 GHz (see Fig. 7.8).

As another example,  $F_{clk} = 10$  GHz and FCW = 0001 1000 0000 (384) give  $F_{out} = 0.9375$  GHz. This FCW is also not a power of 2, but the  $F_{out}$ :  $F_{clk}$  ratio is $\approx$ 1:11 (fulfilling the condition  $F_{out}$ :  $F_{clk} <$  1:4). The simulated DAC and DDS outputs are shown in Fig. 7.9. This DDS output is closer to a sinewave than the DDS output in Fig. 7.7. The spectrum of this output, with an SFDR of 31 dBc, is shown in Fig. 7.10.

Figure 7.9: DAC and DDS results for  $F_{clk} = 10$  GHz, FCW = 0001 1000 0000

#### DDS simulation: $F_{clk} = 10$ GHz, $F_{out} = 78.125$ MHz



Figure 7.11: DAC and DDS results for  $F_{clk} = 10$  GHz, FCW = 0000 0010 0000,  $F_{out} = 78.125$  MHz

This simulation is chosen to show the highest DDS output frequency ( $\mathbf{F}_{res}$ ) for a given clock that still has all 128 sampling points. For a 10 GHz clock, the highest frequency with maximum resolution in its digital triangle wave is  $F_{res} = \frac{F_{clk}}{2^7} = \frac{10\text{ GHz}}{128} = 78.125$  MHz. Therefore, the DDS is simulated with  $F_{clk} = 10$  GHz and FCW = 0000 0010 0000.



Figure 7.10: 0.9375 GHz DDS output spectrum where DDS is operated with  $F_{clk} = 10$  GHz and FCW = 0001 1000 0000. The SFDR of 31 dBc is obtained.



Figure 7.12: 78.125 MHz DDS output spectrum with  $F_{clk} = 10$  GHz and FCW = 0000 0010 0000.

The DAC and DDS outputs are plotted in Fig. 7.11. Note that the DAC operates at maximum resolution, since all six bits from the accumulator carry phase information. The corresponding simulated TSC/DDS sinewave has an SFDR of 37.7 dBc (see Fig. 7.12), which is close to the theoretical maximum achievable SFDR (38 dBc) when all sampling points are available for the sinewave construction.

#### DDS simulation: $F_{clk} = 14$ GHz, $F_{out} = 3.5$ GHz

The clock frequency is increased to 14 GHz and an FCW of 0100 0000 0000 (1024) is applied. The expected output frequency is 3.5 GHz. The sinewave output and its spectrum are plotted in Fig. 7.13. The SFDR (37 dBc) of the DDS output at 3.5 GHz is limited by the second harmonic at 7 GHz.



Figure 7.13: Simulated 3.5 GHz DDS output with the clock frequency of 14 GHz and FCW = 0100 0000 0000: sine wave (left) and its spectrum (right). The SFDR of 37 dBc is obtained.

#### DDS simulation: $F_{clk} = 17$ GHz, $F_{out} = 8.5$ GHz

Now the DDS clock is increased to 17 GHz and FCW of 0111 1111 1111(2047) is applied. The 8.5 GHz sinewave output and its spectrum are plotted in Fig.7.14. It shows the SFDR of 37 dBc.



Figure 7.14: Simulated 8.5 GHz DDS output with the clock frequency of 17 GHz and FCW = 0111 1111 1111: sine wave (left) and its spectrum (right). The SFDR of 37 dBc is obtained.

#### DDS simulation: $F_{clk} = 20$ GHz, $F_{out} = 0.78$ GHz

Finally, the clock is increased to 20 GHz, which is the highest possible clock frequency of this DDS. An FCW of 0000 1010 0000 (160) is applied, and the expected output frequency is 0.78 GHz (see Fig. 7.15). The output has an SFDR of 34.4 dBc, as shown in Fig. 7.16.



Figure 7.15: DAC and DDS results for  $F_{clk} = 20$  GHz, FCW = 0000 1010 0000,  $F_{out} = 0.781$  GHz.



Figure 7.16: Spectrum of 0.78 GHz DDS output with  $F_{clk} = 20$  GHz, FCW = 0000 1010 0000. The SFDR of 34.4 dBc is obtained.



Figure 7.17: 10 GHz DDS output with $F_{clk} = 20$ GHz, FCW = 0111 1111 1111 (left) and 2.5 GHz DDS output with $F_{clk} = 5$ GHz, FCW = 0111 1111 1111 (right). In both cases, all the accumulator outputs and DAC outputs are also shown.



Figure 7.18: Spectrum of the 10 GHz and 2.5 GHz DDS outputs for 20 GHz and 5 GHz clocks. The former shows an SFDR of 39 dBc, while the latter has a poor SFDR of 15 dBc.



Figure 7.19: Simulated SFDR and output power versus clock frequency. The result shows that the DDS can give at least 25 dBc of SFDR over the entire 10 GHz bandwidth.

Interestingly, there is a special case at higher clock frequencies. For $F_{clk} > 7 \text{ GHz}$, when the output frequency is exactly half of the clock ($F_{out} : F_{clk} = 1:2$), the DDS output has a better SFDR than for $1:4 < F_{out} : F_{clk} < 1:2$. The ratio $F_{out} : F_{clk} = 1:2$ provides only two sampling points per output period, while $1:4 < F_{out} : F_{clk} < 1:2$ provides more than two but fewer than four. In principle, more sampling points should give a better signal representation. However, the sum of the accumulator outputs at $F_{out} : F_{clk} = 1:2$ behaves like a triangle wave. This is because, at high frequencies, the switching delay of the accumulator output is comparable to its pulse period. The rise and fall times of the accumulator outputs create an increasing and a decreasing slope, so the summation of all outputs results in a triangle. For example, at a 20 GHz clock and FCW = 0111 1111 1111 (see Fig. 7.17 (left)), the time period and delay of the fastest-changing accumulator output are $\approx 50$ ps and $\approx 20$ ps, respectively. The expected output frequency is always maintained at half of the clock, provided that the accumulator functions correctly. The DAC introduces a further 20 ps delay on the rising and falling edges. Therefore, the DAC interprets the summation of the accumulator outputs as a triangle of the correct frequency. On the other hand, this special case does not hold for lower clock frequencies (< 7 GHz).

For lower frequencies, the pulse width of the fastest-changing accumulator output is much larger than its rise and fall times. For example, at a 5 GHz clock, the time period of the fastest-changing accumulator output pulse (200 ps) is much larger than the accumulator gate delay (20 ps) (see Fig. 7.17 (right)). Therefore, the summation of the accumulator outputs does not look like a triangle but rather like a square wave. The SFDR of the DDS outputs for 5 GHz and 20 GHz clocks is compared in Fig. 7.18. Finally, the DDS is simulated for a fixed FCW of 0111 1111 1111 while the clock is varied from 10 MHz to 20 GHz (see Fig. 7.19). Note that the higher clocks, especially those above 7 GHz, have an SFDR of at least 30 dBc. The lower clocks have a worse SFDR, as explained above.
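The triangle-versus-square argument can be made concrete by comparing the edge time with the period of the fastest-changing accumulator output. The sketch below uses the approximate 20 ps delay and the 50 ps and 200 ps periods quoted above. Treating the rising and the falling edge as each lasting one such delay is a simplifying assumption made here for illustration, and `edge_fraction` is a hypothetical helper name.

```python
EDGE_DELAY_PS = 20.0  # approximate accumulator switching delay quoted in the text


def edge_fraction(period_ps: float, edge_ps: float = EDGE_DELAY_PS) -> float:
    """Fraction of one period spent in the rising plus the falling edge.

    Assumes (for illustration only) that each edge lasts `edge_ps`.
    """
    return min(1.0, 2.0 * edge_ps / period_ps)


# (clock in GHz, period of the fastest-changing accumulator output in ps)
for f_clk_ghz, period_ps in [(20, 50.0), (5, 200.0)]:
    share = edge_fraction(period_ps)
    print(f"{f_clk_ghz} GHz clock: {share:.0%} of the period is edge time")
```

Under these assumptions the edges occupy most of the period at a 20 GHz clock but only a small part of it at 5 GHz, which is consistent with the triangle-like and square-like sums described above.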

#### DDS SFDR & output power versus F<sub>out</sub>

The DDS is simulated for a fixed clock of 20 GHz, and the FCWs are applied in such a way that the output frequency ranges from 5 MHz to 10 GHz. Fig. 7.20 shows the SFDR of the DDS output versus output frequency at a 20 GHz clock frequency.



Figure 7.20: Simulated SFDR and output power versus DDS output frequency at a 20 GHz clock. The result shows that the DDS can give at least 28 dBc of SFDR and at least -14 dBm of output power for output frequencies from 5 MHz to 5 GHz.

#### 7.3 DDS ramp simulation

Linear and quadratic frequency ramps can be generated by applying different FCW values sequentially in time. The FCW can be changed as fast as every 300 ps (the DDS settling time). For a ramp with 100 frequency points, an ideally linear frequency ramp can therefore be generated with a pulse repetition period (PRP) down to 30 ns (pulse repetition frequency PRF = 33.33 MHz). To demonstrate the linear ramp, the DDS is simulated to generate a ramp output (see Fig. 7.21). The 20 GHz clock frequency is chosen, and a transient simulation is performed. Low output frequencies have a relatively long period; for example, four periods (samples) of a 50 MHz output need at least $4 \times (1/(50 \times 10^{6}\,\text{Hz}) + \text{DDS settling time}) = 4 \times (20 \text{ ns} + 0.3 \text{ ns}) = 81.2 \text{ ns}$. Generating ramp frequencies of 50 MHz, 100 MHz, 150 MHz, and 200 MHz would thus take more than 168 ns of transient simulation time. This is not feasible in the limited simulation environment: even on the fastest available computer, 100 ns of DDS transient simulation takes almost 60 hours of real time. Therefore, the highest clock frequency is chosen, and only four output frequencies ranging from 1 GHz to 4.75 GHz are shown, each with at least four samples. This can be completed within a 14 ns simulation, which takes 4 to 5 hours of real time.
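The timing figures in the paragraph above can be checked with a few lines of Python. The sketch assumes, as the text does, that each ramp step costs one DDS settling time and that four periods of each output frequency are shown; the helper names `min_prp_ns` and `dwell_ns` are illustrative only.

```python
SETTLING_NS = 0.3  # DDS settling time quoted in the text (300 ps)


def min_prp_ns(n_points: int, settling_ns: float = SETTLING_NS) -> float:
    """Shortest pulse repetition period if each FCW is held for one settling time."""
    return n_points * settling_ns


def dwell_ns(f_out_mhz: float, cycles: int = 4, settling_ns: float = SETTLING_NS) -> float:
    """Time needed to show `cycles` periods of one ramp step, using the text's formula."""
    period_ns = 1e3 / f_out_mhz
    return cycles * (period_ns + settling_ns)


prp = min_prp_ns(100)
print(f"PRP = {prp:.1f} ns, PRF = {1e3 / prp:.2f} MHz")
print(f"50 MHz step: {dwell_ns(50):.1f} ns")

low_ramp_ns = sum(dwell_ns(f) for f in (50, 100, 150, 200))
print(f"50/100/150/200 MHz ramp: {low_ramp_ns:.1f} ns")
```

With these assumptions the low-frequency ramp needs roughly 170 ns of transient simulation, in line with the "more than 168 ns" stated above.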

Table 7.2: DDS ramp output calculation

| FCW (in binary) | FCW (in decimal) | $\mathbf{F_{out}}$ (GHz)      |
|-----------------|------------------|-------------------------------|
| 0000 1100 1101  | 205              | $(205/4096) \times 20 = 1$    |
| 0001 1100 1101  | 461              | $(461/4096) \times 20 = 2.25$ |
| 0010 1100 1101  | 717              | $(717/4096) \times 20 = 3.5$  |
| 0011 1100 1101  | 973              | $(973/4096) \times 20 = 4.75$ |
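The FCW values in Table 7.2 can be recovered from the target frequencies by inverting the tuning relation. The sketch below assumes rounding to the nearest integer code, which reproduces the table; the rounding rule and the helper names are assumptions made here, not taken from the design.

```python
ACC_BITS = 12      # 12-bit FCW, as in Table 7.2
F_CLK_GHZ = 20.0   # clock used for the ramp simulation


def fcw_for(f_out_ghz: float) -> int:
    """Nearest integer FCW for a target output frequency (rounding is an assumption)."""
    return round(f_out_ghz / F_CLK_GHZ * 2**ACC_BITS)


def as_nibbles(fcw: int) -> str:
    """Format an FCW as a 12-bit binary string grouped in fours, like the table."""
    bits = format(fcw, f"0{ACC_BITS}b")
    return " ".join(bits[i:i + 4] for i in range(0, ACC_BITS, 4))


for target_ghz in (1.0, 2.25, 3.5, 4.75):
    fcw = fcw_for(target_ghz)
    actual_ghz = fcw / 2**ACC_BITS * F_CLK_GHZ
    print(f"{as_nibbles(fcw)}  {fcw:4d}  {actual_ghz:.3f} GHz")
```

Consecutive rows differ by 256 in FCW, that is by 1.25 GHz at a 20 GHz clock, which is what makes the ramp linear.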



Figure 7.21: Simulated results from the DDS: linear ramp (left), chirp signal time domain view (right).

Similarly, nonlinear frequency ramps are generated by applying different FCW values sequentially in time. The output frequency versus time and the corresponding chirp signal in the time domain are plotted in Fig. 7.22. The time-domain simulation covers more than one repetition period; the next chirp cycle starts at 50 ns. These examples demonstrate that the DDS can be used to generate different ramp functions.



Figure 7.22: Simulated results from the DDS: nonlinear ramp (left), time domain view (right).

#### 7.4 DDS phase control word simulation



Figure 7.23: PCW changing the phase of the DDS output at 239 MHz by 180°. The PCW value is changed from 000 000 to 111 111 at time instance 8 ns.

The phase control word changes the phase of the DDS output instantaneously from 0° to 180° when the PCW is changed from 000 000 to 111 111. A novel feature of the DDS presented in this work is the implementation of a direct digital phase modulation capability using the phase control word (PCW). To the best of the author's knowledge, no such phase control unit has been applied in high-speed DDS circuits. The phase of the output sine wave can be changed from 0° to 180° with a resolution of $< 2^{\circ}$ using a 6-bit PCW. To verify the operation of the phase control word, Fig. 7.23 provides a simulation of a signal at 239 MHz with a phase shift from 0° to 180°, as is common in BPSK signals. Similarly, the phase of a 625 MHz sine wave is changed by 90° by changing the PCW from 000 000 to 100 000 at time instance 5.25 ns (see Fig. 7.24). This means that both the frequency and the phase of the DDS output signal can be controlled independently by the FCWs and PCWs.



Figure 7.24: Changing the phase of DDS output at 625 MHz by 90°.

#### 7.5 DDS characterization

In this section, the fabricated DDS MMIC is characterized. Various measurements are performed to test the features of the DDS.

#### 7.5.1 Test board design

The DDS MMIC has an active area of 1.6 mm $\times$ 1.6 mm and a total die area of 1.8 mm $\times$ 2 mm including the input/output pads (see the die photograph in Fig. 7.25). The DDS chip is gold (Au) wire-bonded with a maximum wire length of 0.3 mm. The fabricated DDS has to be tested for verification. One possibility is to measure it on-wafer; however, this is not feasible because of the large number of probes required. Therefore, an RF printed circuit board (PCB) is designed that allows the basic DDS functionality and performance to be checked. The specifications of the three-layer PCB are as follows:

- Dimension :  $118.725 \text{ mm} (X) \times 100.000 \text{ mm} (Y)$
- Substrate : Rogers (RO4350B)
- Dielectric constant : 3.66
- Substrate thickness : 0.762 mm (both substrates)
- Minimum trace width : 0.1 mm
- Copper thickness : 0.035 mm (in all layers)
- Layer 1 (Top): Mostly Signal (+ ground)
- Layer 2 (Middle): Signal bridge
- Layer 3 (Bottom): Ground only
- Total height :  $(2 \times 0.762 + 3 \times 0.035) = 1.629 \text{ mm}$



Figure 7.25: MMIC of the fabricated DDS (size  $2.2 \text{ mm} \times 1.8 \text{ mm}$ ).

The most critical part of the wire bonding was the differential clock signal and output lines, which are kept as short as possible to lower the parasitic inductance of the bond wires. The DDS is mounted onto the test PCB, which uses a double-layer commercial Rogers RO4350B laminate with a loss tangent of 0.0037 and very good thermal stability [137]. The bottom of the MMIC is directly attached to an Au-plated heat spreader on the PCB, and the heat is distributed to the backside heat sink. This avoids the need for an external fan to cool the DDS chip. The DDS MMIC mounted on the PCB is shown in Fig. 7.26.

Each block has its own optimum power supply requirements. All the different power supplies needed for the DDS are provided by a low-noise, high-PSRR (power supply rejection ratio) voltage regulator IC (HMC1060LP3E). Therefore, as shown in Fig. 7.26, the entire DDS test setup is powered by a single 9 V supply line, which is distributed to the various supply lines powering the respective blocks. To suppress the spikes and noise produced by the switch, a simple spike suppression circuit is used. For testing purposes, the frequency control words (FCWs: F0 to F10) and phase control words (PCWs: P0 to P5) are set by mechanical DPDT (double-pole double-throw) switches that are connected to either the 3 V or the 2.6 V line.



Figure 7.26: DDS printed circuit test board on a Rogers RO4350B substrate (including a wire-bonded DDS MMIC)



Figure 7.27: DDS measurement setup in the lab

### 7.5.2 DDS measurements

Fig. 7.27 shows the measurement setup. The clock signal is generated by the clock generator (APSYN420) with an output power level of 8 dBm and an SFDR of 30 dBc. The balun HL9402 feeds the differential inputs of the DDS. The balun has a good amplitude match ($\pm$ 0.1 dB) and phase balance ($\pm$ 2-3 degrees). The DC block prevents any DC signal from reaching the RF input of the DDS. The optimum input power level of the DDS is -4 dBm, which is achieved by an attenuator combined with the conversion loss of the balun. The measured data are also briefly compared to the simulated results. It is important to note that the simulated results do not include the losses of the PCB, connectors, and cables; in other words, the measured results are not calibrated for external losses. This is the main reason for the lower output power in the measurements compared to the simulations, and it suggests that a calibration should be performed in a future measurement campaign. Nevertheless, the decreased output power could also stem from the circuits themselves, especially from the TSC, but this cannot be measured separately. Also, the input power.



#### Measurement: $F_{clk} = 10$ GHz, FCW = 0100 0000 0000, $F_{out} = 2.5$ GHz

Figure 7.28: Measured result: 2.5 GHz DDS output with  $F_{clk} = 10$  GHz and FCW = 0100 0000 0000. SFDR and output power are 26 dBc and -15 dBm respectively.

The DDS is measured at an $F_{clk}$ of 10 GHz and an FCW of 0100 0000 0000 (1024). The DDS sine wave and its spectrum are plotted in Fig. 7.28. An SFDR of 25 dBc and an output power of -15 dBm are achieved. The simulated SFDR was also 25 dBc. However, the output power is 10 dB lower than the simulated result.

The DDS is measured at a 10 GHz clock for various FCW inputs (see Fig. 7.29). For FCW = 0010 0101 1010 (602), the 1.47 GHz DDS output has an SFDR of 22 dBc and an output power of -13 dBm. For FCW = 0011 1001 0001 (913), the 2.23 GHz DDS output shows an SFDR of 24 dBc and an output power of -14 dBm. Likewise, at FCW = 0100 1100 1101 (1229), the 3 GHz DDS output has an SFDR of only 16 dBc and an output power of -15 dBm. Finally, an FCW of 0101 0001 1110 (1310) is applied, which results in a DDS output frequency of 3.2 GHz. It shows an SFDR of 17 dBc and an output power of -16.5 dBm. These results show that a better SFDR is obtained when the $F_{out} : F_{clk}$ ratio, i.e. the FCW value, gets lower. Nevertheless, the DDS still produces the correct fundamental frequency in all cases.

#### Measurement: $F_{clk} = 14 \text{ GHz}$ , $FCW = 0100 \ 0000 \ 0000$ , $F_{out} = 3.5 \text{ GHz}$

The DDS is measured again at an $F_{clk}$ of 14 GHz and an FCW of 0100 0000 0000 (1024). The DDS sine wave and its spectrum are plotted in Fig. 7.30. An SFDR of 31 dBc is obtained, which is 4 dB lower than the simulated result for the same setup. The output power is -16 dBm (13 dB lower than the simulation result).



Figure 7.29: Measured result: DDS outputs at 10 GHz clock for various FCW inputs.



Figure 7.30: Measured result: 3.5 GHz DDS output with  $F_{clk} = 14$  GHz and FCW = 0100 0000 0000. SFDR and output power are 31 dBc and -18 dBm respectively.



Figure 7.31: Measured result: 8.5 GHz DDS output with  $F_{clk} = 17$  GHz and FCW = 0111 1111 1111. SFDR and output power are 31 dBc and -25 dBm respectively.

#### Measurement: $F_{clk} = 17$ GHz, FCW = 0111 1111 1111, $F_{out} = 8.5$ GHz

The DDS is measured again at an $F_{clk}$ of 17 GHz and an FCW of 0111 1111 1111 (2047). The DDS sine wave and its spectrum are plotted in Fig. 7.31. An SFDR of 31 dBc and an output power of -25 dBm are achieved. The SFDR and output power are 6 dB and 15 dB lower, respectively, than the simulated results for the same setup.

#### Measurement: SFDR & output power versus F<sub>clk</sub>



Figure 7.32: Simulated and measured result comparison: SFDR and output power of the Nyquist DDS output versus clock frequency



Figure 7.33: Measured result: SFDR and output power of Nyquist DDS output corresponding to the clock frequency

The DDS is measured at the Nyquist output (half of the clock) for various clock frequencies in order to determine the maximum clock frequency of the DDS (see Fig. 7.32). A maximum clock of 18.5 GHz is obtained, compared to the designed maximum clock frequency of 20 GHz. Above 18.5 GHz, the output is no longer stable. There are several possible reasons for this; the most significant is that the locking of the digital phase accumulator fails above 18.5 GHz. The PCB itself might also not work efficiently at 20 GHz. Compared with the simulated results (Fig. 7.20), the measured SFDR and output power (Fig. 7.32) are lower, but they are comparable and follow the same pattern.

#### Measurement: SFDR & output power versus F<sub>out</sub>

The maximum DDS clock frequency is 18.5 GHz, above which the DDS no longer produces a proper sine output due to the bandwidth limitation of the analog circuits and internal timing difficulties in the digital circuits. The results for different output frequencies at the maximum clock are shown in Fig. 7.33. Up to 6 GHz, the SFDR of the DDS output is at least 15 dBc. Above 7 GHz, the output is unstable. The output power decreases steadily as the frequency increases. Up to 2.5 GHz, the SFDR of the DDS output is at least 21 dBc. Both the measured SFDR and the output power follow the same pattern as the simulated results (see Fig. 7.20).

#### Measurement: phase noise

The phase noise of the DDS output at 3 GHz and 4 GHz is measured with a 12 GHz clock (see Fig. 7.34). In both cases, the phase noise at 1 kHz offset from the carrier is better than -115 dBc/Hz. Beyond 1 kHz offset (up to the measured 1 MHz), the phase noise of -115 dBc/Hz is maintained. This is one of the vital characteristics of the DDS output, since it can produce, without any additional filtering, clean closely separated



Figure 7.34: Measured result: phase noise of 3 GHz and 4 GHz DDS outputs with a clock of 12 GHz

frequency points. To date, this is the first phase noise measurement of a high-speed (>10 GHz clock) DDS output in any technology.

#### Measurement: frequency resolution



Figure 7.35: Measured result: changing two least significant bits to generate four different frequency points for 12 GHz clock.

The frequency resolution of the DDS is $\frac{1}{4096}$ of the clock frequency. When the clock is 12 GHz, the frequency resolution is approximately 3 MHz. To test this, four DDS output frequencies from 387 MHz to 396 MHz with a step of 3 MHz (adjacent FCW values differing by one) are measured (see Fig. 7.35). The DDS can produce distinctly spaced signals in the frequency domain even at the finest frequency resolution.
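The four frequency points can be reproduced numerically. The text does not state the FCW values used; the codes 132 to 135 below are an assumption, chosen because they differ only in the two least significant bits and give outputs that round to 387, 390, 393, and 396 MHz.

```python
ACC_BITS = 12
F_CLK_MHZ = 12_000.0  # 12 GHz clock used in the measurement

resolution_mhz = F_CLK_MHZ / 2**ACC_BITS  # one LSB of the FCW


def f_out_mhz(fcw: int) -> float:
    """Output frequency in MHz for an integer FCW."""
    return fcw * resolution_mhz


print(f"resolution = {resolution_mhz:.4f} MHz")

# Hypothetical FCWs: four consecutive codes differing only in the two LSBs
for fcw in range(132, 136):
    print(format(fcw, "012b"), f"{f_out_mhz(fcw):.1f} MHz")
```

The exact step is slightly below 3 MHz, so the quoted 3 MHz spacing is a rounded value.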

## 7.6 DDS based radar design and measurement

Radar systems are widely used in automotive, biomedical, weather forecasting, surveillance, and other commercial applications [138] [139]. Radar is less sensitive to environmental conditions such as heat, vapour, dust, and visibility than other types of sensor systems such as video, ultrasonic, infrared, and laser systems. In terms of signal generation and detection, radars can be categorized as continuous wave (CW) and pulsed radar. A CW radar can be either frequency modulated or unmodulated; the former is called FMCW radar. An unmodulated CW radar transmits a single frequency towards a moving object, and the reflected signal is captured and compared with a copy of the transmitted signal. Due to the Doppler effect, the frequency of the received signal is shifted by the moving object relative to the originally transmitted signal. By analyzing the difference, the movement and the speed of the object can be found. Doppler radar systems are used to measure displacements in various applications, including heartbeat and respiration sensing [140] as well as mechanical vibration identification [141], [142]. In the context of structural monitoring, the dynamic behaviour of a structure, e.g. a bridge or a wind turbine, can be characterized [142]. In this section, one application of the DDS is demonstrated in a simple Doppler radar system.

#### 7.6.1 Design of Doppler radar system

In classical physics, when the speeds of the source and the receiver relative to the medium are lower than the velocity of the waves in the medium, the relationship between the observed frequency $f_r$ and the transmitted frequency $f_t$ is given by [143]:

$$f_{\rm r} = \left(\frac{c \pm v_{\rm r}}{c \pm v_{\rm s}}\right) \times f_{\rm t} \tag{7.3}$$

where $c$ is the speed of the wave (light) in the medium, $v_r$ is the speed of the receiver relative to the medium, and $v_s$ is the speed of the source relative to the medium. The speed $v_r$ is added to $c$ when the receiver is moving towards the source and subtracted from $c$ when the receiver is moving away from the source. Conversely, $v_s$ is added to $c$ when the source is moving away from the receiver and subtracted from $c$ when the source is moving towards the receiver.



Figure 7.36: Exemplary Doppler radar concept [19].

Using the above principle, the Doppler effect is derived. The Doppler effect relates the frequency of the harmonic waves generated by a moving source to the frequency measured by an observer moving with a different velocity from that of the source [144]. For example (see Fig. 7.36), a wave with the emitted frequency $f_t$ is transmitted from the TX antenna to the moving target, and the reflected wave with frequency $f_r$ is observed by the RX antenna at the source. The difference between the observed frequency $f_r$ and the emitted frequency $f_t$, caused by the target moving relative to the source of the waves, is the Doppler effect. The frequency $f_r$ can be written as a function of $f_t$ for a target velocity $v$ and speed of the wave (light) $c$ as:

$$f_{\rm r} = \left(\frac{1 + \frac{\rm v}{\rm c}}{1 - \frac{\rm v}{\rm c}}\right) \times f_{\rm t} \tag{7.4}$$

The Doppler frequency or beat frequency  $f_d$  is given by:

$$f_{\rm d} = f_{\rm r} - f_{\rm t} = 2 \times v \times \left(\frac{f_{\rm t}}{c - v}\right) \tag{7.5}$$

For  $c \gg v$ :

$$f_{\rm d} = f_{\rm r} - f_{\rm t} = 2 \times v \times \left(\frac{f_{\rm t}}{c}\right) \tag{7.6}$$
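As a numerical illustration of Eqs. (7.5) and (7.6), the sketch below evaluates the Doppler shift for a 6 GHz carrier and inverts Eq. (7.6) to find the target velocity that produces a 10 Hz shift. The carrier and shift are example values chosen here, and the function names are hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def doppler_exact_hz(v_m_per_s: float, f_t_hz: float) -> float:
    """Eq. (7.5): f_d = 2 v f_t / (c - v)."""
    return 2.0 * v_m_per_s * f_t_hz / (C_M_PER_S - v_m_per_s)


def doppler_approx_hz(v_m_per_s: float, f_t_hz: float) -> float:
    """Eq. (7.6): f_d = 2 v f_t / c, valid for c >> v."""
    return 2.0 * v_m_per_s * f_t_hz / C_M_PER_S


def velocity_for_shift(f_d_hz: float, f_t_hz: float) -> float:
    """Invert Eq. (7.6) to get the target velocity for a given Doppler shift."""
    return f_d_hz * C_M_PER_S / (2.0 * f_t_hz)


f_t = 6e9  # example carrier frequency in Hz
v = velocity_for_shift(10.0, f_t)
print(f"v = {v:.4f} m/s")
print(f"exact = {doppler_exact_hz(v, f_t):.9f} Hz, approx = {doppler_approx_hz(v, f_t):.9f} Hz")
```

For velocities of this order the two expressions differ by a negligible relative amount, which is why the approximation in Eq. (7.6) is used in the rest of the section.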

Here, a Doppler radar system is designed and implemented as a vibration detection sensor. Generally, the signal source of a radar is realized using a voltage-controlled oscillator. The novelty of this radar system is the use of the DDS for direct vibration detection at a sub-hertz level in the 5-10 GHz frequency band. Thanks to the perfect linearity of the DDS, no further signal processing is needed. In this chapter, the designed 18.5 GHz clock DDS is used as the signal generator (frequency source) of the radar system. The DDS output signals from 2.5 to 5 GHz have the better quality in terms of SFDR, phase noise, and output power level. Using a frequency doubler, the 5-10 GHz bandwidth is achieved. The excellent phase noise of the DDS output enables sub-hertz detection of the Doppler frequency. For example, a DDS-sourced radar can be used for human heartbeat and respiration detection, which have a frequency of less than 1 Hz. In addition, radar systems with 5-10 GHz bandwidth are used for many biomedical applications such as non-contact respiration detection, breast cancer detection, and heartbeat detection [145], [146]. For instance, the heartbeat rate indicates the mental condition of a human and is related to the stress index. An accurate, non-contact test with high precision is necessary for acute diagnosis and further treatment. In [147], the variability of the peak-to-peak interval of the heartbeats is detected using a Doppler radar-based spectrogram. The bandwidth, however, is limited to only a few hundred MHz. In contrast, this radar has a wide bandwidth of 5 GHz using a DDS as the signal source. The unmodulated CW radar is demonstrated for multiple single frequencies within the 5-10 GHz bandwidth.

This radar combines DDS with various discrete commercial RF components such as gain amplifiers, low pass filters, high pass filters, attenuators, multipliers, mixers, low



Figure 7.37: Radar system block diagram used in ADS simulation.

noise amplifiers etc. Features such as gain, return loss, maximum input/output power levels, and harmonic suppression are considered for the link budget calculation. In the initial phase, the radar system simulation is performed with the block diagram shown in Fig. 7.37. The real measured DDS output signal has many spurs and an output power of -18 dBm. In the radar simulations, it is emulated by a VCO signal with the same output power and spur levels.

#### 6 GHz Doppler radar simulation

The 3 GHz DDS output is used to produce a 6 GHz radar signal (see Fig. 7.38(a)). The signal is filtered by a high pass filter at 2.5 GHz, amplified by AMP, and filtered by a low pass filter at 5 GHz. Using another amplifier, a signal output power level of 11 dBm is achieved, which is the optimum input power level for the following multiplier (see Fig. 7.38(b)). The multiplier doubles the frequency, resulting in a 6 GHz signal as in Fig. 7.38(c). The high conversion loss of the multiplier is compensated by a power amplifier with 20 dB gain and 17 dBm output power. The signal is then divided by the power splitter, feeding 14 dBm to the transmitting antenna and another 14 dBm to the LO path of the mixer. The optimum LO power level of 11 dBm at the mixer is achieved (see Fig. 7.38(d)) by placing a 3 dB attenuator before the LO port of the mixer. At the receiving end, the receiving antenna receives the reflected signal from the target. The low pass filter at 10 GHz eliminates the higher-order harmonic signals. For a target at 20 cm distance, the received signal at the receiving antenna is assumed to be in the order of -25 dBm with a Doppler shift of 10 Hz. The low noise amplifier with 15 dB gain and a noise figure of NF = 2 dB amplifies the received signal with the lowest possible additional noise. The RF input power of -10 dBm (see Fig. 7.38(e)) mixes with the LO input power of 11 dBm in the mixer. Let $F_{rf}$, $F_{lo}$, and $F_{if}$ be the RF input frequency, LO input frequency, and intermediate frequency, respectively; then (using Eqn. 7.6) the down-converted signal from the mixer is $F_{if} = F_{lo} - F_{rf}$. The down-converted IF frequency of 10 Hz is shown in Fig. 7.38(f). This 10 Hz is exactly the frequency difference between the signals provided at the LO and RF ports, which suggests that the radar block is functioning as expected.
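The power levels quoted for the LO and receive paths can be checked with a simple cascade in dB. This is only a sketch: it models the splitter as an ideal 3 dB split with no insertion loss, which matches the 17 dBm to 14 dBm step in the text but is an assumption, and the helper `cascade_dbm` is hypothetical.

```python
def cascade_dbm(p_in_dbm: float, stages: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Apply a chain of (name, gain in dB) stages and return the level after each stage."""
    level = p_in_dbm
    trace = []
    for name, gain_db in stages:
        level += gain_db
        trace.append((name, level))
    return trace


# LO path: power amplifier output -> splitter -> attenuator -> mixer LO port
lo_path = cascade_dbm(17.0, [("power splitter (ideal)", -3.0), ("3 dB attenuator", -3.0)])

# Receive path: assumed antenna level -> low noise amplifier -> mixer RF port
rx_path = cascade_dbm(-25.0, [("LNA", 15.0)])

for name, level in lo_path + rx_path:
    print(f"after {name}: {level:+.1f} dBm")
```

The chain ends at 11 dBm on the LO port and -10 dBm on the RF port, the two levels stated for the mixer inputs.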



Figure 7.38: ADS simulation result at different stages of the radar.

#### 7.6.2 5-10 GHz Doppler radar system measurement

The speaker is placed in front of the antenna as shown in Fig. 7.39. The realized DDS-based CW Doppler radar is shown in Fig. 7.40. The frequency radiated by the transmitter (TX) antenna can be varied manually from 5 to 10 GHz by changing the FCW on the DDS board.

#### 6 GHz Doppler radar measurement

Initially, a 6 GHz signal from the radar, referred to as the 6 GHz Doppler radar, is used to perform the continuous-wave Doppler radar test. The clock of the DDS is supplied by a laboratory synthesizer at 12 GHz. The FCW is selected in such a way that the DDS produces a 3 GHz output. After filtering and amplification, the signal passes to the multiplier and finally to the LO port and the transmitting antenna. The antenna can operate from 1 GHz to 10 GHz with a gain of 6 dBi.

The speaker is excited at a known frequency by a signal generator, which makes its membrane vibrate. When the emitted signal hits the membrane, the reflected signal exhibits a Doppler shift corresponding to the vibration frequency of the speaker membrane. This is illustrated in Fig. 7.41 for an excitation frequency of 10 Hz. The IF mixer output signal shows a Doppler frequency shift of 10 Hz together with a 50 Hz spurious signal. Then, the vibration frequency of the membrane is swept from 1 Hz up to 10 Hz in steps of 1 Hz, as shown in Fig. 7.42 (left), where the y-axis is the vibration frequency of the membrane and the x-axis is the frequency detected by the radar. It shows that the radar directly detects the correct vibration frequency without any pre- or post-processing. In addition to this, the frequency applied to the speaker was



Figure 7.39: Radar system block diagram used in measurement detecting a vibration of the speaker.



Figure 7.40: DDS integrated Doppler radar system realization.



Figure 7.41: Measured result: the 6 GHz Doppler radar detecting 10 Hz speaker vibration. The mains frequency of 50 Hz can also be detected.



Figure 7.42: Measured result: speaker vibration detection using 6 GHz Doppler radar: for 1 to 10 Hz (left), for 10 to 100 Hz (right). The mains frequency of 50 Hz can be clearly identified.

swept from 10 Hz to 100 Hz. The signals are correctly detected with perfect linearity, as illustrated in Fig. 7.42 (right), and exhibit an excellent signal-to-noise ratio (SNR). The vertical line at 50 Hz indicates the detection of mains leakage in all measurements. The 50 Hz signal can be eliminated by using a battery-powered supply.

#### 8 GHz Doppler radar measurement

The Doppler radar with a signal frequency set to 8 GHz (referred to as the 8 GHz Doppler radar) is also measured for 1 Hz to 10 Hz and 10 Hz to 100 Hz vibration of the speaker. The corresponding results are shown in Fig. 7.43, with a slightly deteriorated SNR. A possible reason for the relatively poor detection could be the inferior quality of the DDS signal source and the overall performance degradation of each RF component at 8 GHz compared to 6 GHz.



Figure 7.43: Measured result: speaker vibration detection using 8 GHz Doppler radar: for 1 to 10 Hz (left), for 10 to 100 Hz (right). The mains frequency of 50 Hz can be clearly identified.

#### Multitone measurement

This radar can distinguish more than one vibration frequency without any additional effort. To verify this, the 6 GHz radar has also been tested with two speakers vibrating independently at different frequencies, one at 15 Hz and the other at 17 Hz. The radar correctly detects the two frequencies, as shown in Fig. 7.44. The amplitudes of the two peaks differ because the two speakers are not identical and are located at slightly different distances from the antenna. Due to the excellent phase noise of the signal feeding the antenna, which is ultimately generated by the DDS, two frequencies as close as 0.3 Hz can be distinguished and detected. In the next measurement setup, the vibration frequencies of the two speakers are set to 19.9 Hz and 20.2 Hz, respectively. These two frequencies are correctly detected by the 6 GHz radar, as shown in Fig. 7.45. This confirms that sub-hertz signal detection is possible thanks to the excellent phase noise of the DDS output signal that feeds the antenna. It implies that two objects with very closely spaced frequencies can be reliably detected in applications such as respiration and heartbeat detection and monitoring.
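Separating two tones 0.3 Hz apart also places a requirement on the spectral analysis of the IF signal: the record must be longer than roughly 1/0.3 Hz, that is about 3.3 s, for the two peaks to fall into different frequency bins. The sketch below illustrates only this signal-processing aspect with a synthetic, noise-free IF signal; the sampling rate, record length, and amplitudes are assumptions made here and do not describe the measurement setup.

```python
import cmath
import math

FS_HZ = 200.0       # assumed IF sampling rate
DURATION_S = 20.0   # assumed record length; bin spacing is 1 / DURATION_S = 0.05 Hz
N = int(FS_HZ * DURATION_S)
BIN_HZ = 1.0 / DURATION_S

# Synthetic IF signal: two tones at 19.9 Hz and 20.2 Hz with different amplitudes
samples = [
    math.sin(2 * math.pi * 19.9 * n / FS_HZ) + 0.6 * math.sin(2 * math.pi * 20.2 * n / FS_HZ)
    for n in range(N)
]


def dft_magnitude(x: list[float], k: int) -> float:
    """Magnitude of DFT bin k, which corresponds to the frequency k * BIN_HZ."""
    w = -2j * math.pi * k / len(x)
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(x)))


# Evaluate only the bins between 19.0 Hz and 21.0 Hz
mags = {k: dft_magnitude(samples, k) for k in range(380, 421)}
peaks = sorted(sorted(mags, key=mags.get)[-2:])
detected_hz = [round(k * BIN_HZ, 2) for k in peaks]
print(detected_hz)
```

In a real measurement the phase noise of the source adds a skirt around each peak, which is where the low phase noise of the DDS output matters.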



Figure 7.44: Measured result: 6 GHz Doppler radar detecting two vibrating speakers.



Figure 7.45: Measured result: 6 GHz Doppler radar detecting two vibrating speakers with narrowly spaced frequencies of 0.3 Hz.

## 7.7 Conclusions from chapter 7

In this chapter, complete DDS simulations are performed for various setups. Next, the DDS MMIC mounted on the RF PCB is characterized. The maximum clock frequency of the DDS is limited to 18.5 GHz, which is 1.5 GHz less than the simulated result. The measured SFDR is 1 to 6 dB lower than the simulated result. On the other hand, the output power is severely degraded, by approximately 10 dB in all cases. The main reason could be that the losses of the PCB, ports, and cables are not calibrated out of the measurement. Finally, one application of the DDS is demonstrated in a Doppler radar, which correctly detects the vibration of a specimen.

# Chapter 8 Conclusions

The need for a versatile signal generator has always been evident in modern RF and communication systems. The most conventional technique, the voltage-controlled oscillator (VCO), has inferior phase noise and narrow bandwidth, although its operating frequency can reach the sub-THz regime. Its phase noise is influenced by various parameters associated with the oscillator circuit, e.g. transistor size and noise, bias current, and noise leaking from the bias supply. The bandwidth is limited because the relationship between the input voltage and the output frequency of the VCO is not strictly linear over the tuning range. The phase noise and SFDR of the VCO output are improved by using the phase-lock technique. The phase-locked loop (PLL) uses a feedback system that locks the VCO output to a reference frequency. However, the settling time of the PLL is longer because of the feedback control loop. The longer settling time increases the frequency switching time between PLL outputs.

YIG oscillators are suitable for multi-GHz range and wide-bandwidth applications. In addition, signal generation can be achieved by free-electron radiation, optical lasers, and Gunn diodes, which can operate even in the THz domain. All these signal generators suffer from slow frequency switching and a lack of digital controllability and advanced modulation capability, even though their frequency of operation reaches the THz regime. Alternatively, the AWG (arbitrary waveform generator) can produce a wide range of frequencies with low phase noise and digital controllability. One of the vital components of the AWG is the direct digital synthesiser (DDS). Generally, it is composed of a phase accumulator, a digital-to-analog converter, sine mapping circuits, and a low pass filter. It needs a reference clock that samples the DDS output. Its output frequency can be varied by applying an appropriate digital input code. Although the maximum frequency of operation is lower than that of all other alternatives (VCO, PLL, YIG oscillators, laser diode etc.), the DDS has the following important features:

- **Wide bandwidth:** Ideally, the bandwidth of the DDS ranges from near DC to half of the reference clock frequency.
- **Discrete output frequencies:** The output is sampled by the clock, and the output frequency is an arithmetic function of the input digital code and the clock frequency. Therefore, there is no frequency offset.
- **Frequency points:** It offers a large number of output frequencies, determined by the size of the accumulator.
- **Fast frequency switching:** Since there is no feedback loop as in a PLL, the DDS switches frequency quickly.
- **Modulation capability:** Owing to its fast frequency switching and its digital control through input codes, it can be used directly in various modulation schemes.
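
The dependence of the output frequency on the input code and the clock can be illustrated with the standard DDS tuning relation $f_{out} = \mathrm{FCW} \cdot F_{clk} / 2^{N}$, where FCW is the frequency control word and $N$ the accumulator width. This relation is not stated explicitly in this summary and is assumed here. The Python sketch below is illustrative only: the function names are hypothetical, and the 12-bit accumulator and 12 GHz clock are example values taken from elsewhere in this summary.

```python
def dds_output_frequency(fcw, f_clk_hz, acc_bits):
    """Ideal DDS output frequency for a frequency control word (FCW)."""
    # Assumption: only FCWs below half the accumulator range are used,
    # so the output stays below half the clock frequency (Nyquist).
    if not 0 <= fcw < 2 ** (acc_bits - 1):
        raise ValueError("FCW must lie in [0, 2**(acc_bits - 1))")
    return fcw * f_clk_hz / 2 ** acc_bits


def frequency_resolution(f_clk_hz, acc_bits):
    """Smallest frequency step, i.e. the change caused by one FCW increment."""
    return f_clk_hz / 2 ** acc_bits


def usable_frequency_points(acc_bits):
    """Number of FCW values from DC up to (but excluding) half the clock."""
    return 2 ** (acc_bits - 1)


if __name__ == "__main__":
    print(dds_output_frequency(1024, 12e9, 12))   # output frequency in Hz
    print(frequency_resolution(18.5e9, 12))       # frequency step in Hz
    print(usable_frequency_points(12))            # number of frequency points
```

Under these assumptions, a 12-bit accumulator gives 2048 usable frequency points, and an FCW of 1024 at a 12 GHz clock corresponds to a 3 GHz output.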

A DDS that combines a high clock frequency, a large number of output frequency points, high SFDR, modulation capability, low phase noise, and low power consumption demands intensive research on technology selection, the architecture of the DDS and of its individual blocks, high-speed circuit design, and layout. In addition, the final assembly is vital for using such a DDS in real RF systems.

The conventional ROM based DDS is not feasible because of the large lookup table or memory block required to store the sine wave amplitudes. Thus, such techniques are limited to low DDS clock frequencies (< 5 GHz). A CORDIC based DDS has been demonstrated even at 14 GHz; however, the digital blocks required to implement the algorithm consume considerable power [2]. On the other hand, a non-linear DAC based (ROM-less) architecture can also be used for a high-speed DDS, but it likewise suffers from high power consumption and circuit complexity because of the large number (several hundred) of unit current sources needed to weight the sine wave amplitudes. Therefore, a TSC based ROM-less architecture, in which the sine mapping is performed by a linear DAC and a TSC, is used for a DDS that can operate in the multi-GHz domain. The three main building blocks (accumulator, DAC, and TSC) are optimized so that they operate beyond 20 GHz and offer similar SFDR performance.

The high-speed circuits are pivotal to the overall speed of the DDS. All circuits are designed in a 0.25 $\mu$m SiGe technology (SG25H4), which offers HBTs with f<sub>T</sub>/f<sub>MAX</sub> of 180/220 GHz. The most significant block of the DDS is the accumulator, which combines a large number of digital logic gates. To operate it at the highest possible frequency, the digital logic must be as fast as possible. This is achieved by using current-mode logic (CML) with inductive peaking in all logic gates of the DDS. The propagation delay and switching time of the CML cascoded inverter with inductive peaking are 3.5 ps and 5.5 ps, respectively. All digital gates are designed and optimized for the highest possible speed, as presented in chapter 3 (Table 3.2).

A 12-bit phase accumulator is designed using a partial-pipeline architecture. It consists of eight 2-bit adders, two 1-bit adders, six registers, seven XOR gates, and twelve drivers. The H-tree for clock synchronization in the accumulator consists of sixteen registers and five buffers. The accumulator output is truncated to 7 bits, of which the MSB is used for digital triangle generation and the remaining 6 bits drive the subsequent 6-bit DAC. The accumulator can synthesize up to 2048 frequency points and has an SFDR of 38.15 dBc. Its area and power consumption are 0.88 mm<sup>2</sup> and 1.23 W, respectively.
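
The truncation and digital triangle generation described above can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not the circuit itself: it assumes that the triangle is formed by complementing the six lower truncated bits whenever the MSB is set (a common XOR-based scheme), and the function name `triangle_codes` is hypothetical.

```python
def triangle_codes(fcw, n_samples, acc_bits=12, out_bits=7):
    """Return DAC codes of a digital triangle derived from a phase accumulator.

    Assumed scheme: the accumulator output is truncated to `out_bits` bits;
    the MSB selects whether the remaining bits are passed on or complemented.
    """
    acc_mask = (1 << acc_bits) - 1
    dac_bits = out_bits - 1
    dac_mask = (1 << dac_bits) - 1
    phase = 0
    codes = []
    for _ in range(n_samples):
        top = phase >> (acc_bits - out_bits)   # keep the most significant bits
        msb = top >> dac_bits                   # 1 during the second half period
        low = top & dac_mask                    # remaining bits form a sawtooth
        codes.append(low ^ dac_mask if msb else low)  # fold sawtooth into triangle
        phase = (phase + fcw) & acc_mask        # accumulate modulo 2**acc_bits
    return codes


if __name__ == "__main__":
    # One full period when the truncated phase advances by one step per clock.
    print(triangle_codes(fcw=32, n_samples=128))
```

With `fcw=32` the truncated phase advances by exactly one step per clock, so the codes rise from 0 to 63 and fall back to 0 within 128 clock cycles.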

A 6-bit partially segmented DAC is chosen to convert the digital triangular wave into an analog triangular wave. It combines an R-2R network DAC for the lower bits with a thermometer-coded DAC for the higher bits. Three partially segmented DACs are compared in terms of INL/DNL performance, relative glitch, area, and power consumption. Finally, a 3-bit R-2R network and a 3-bit thermometer-coded structure are used in this DDS. The DAC has a sampling speed of 25 GHz, an SFDR of 37.88 dBc, and DNL/INL errors within $\pm$0.5 LSB. Its active area and power consumption are 0.176 mm<sup>2</sup> and 214 mW, respectively. It is one of the fastest DACs in SiGe technology, as shown in chapter 4 / Table 5.5.
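
The 3-bit thermometer plus 3-bit R-2R segmentation can be illustrated with an idealized Python model. It is a sketch under assumptions: the three upper bits are taken to switch seven equal unit sources of weight 8 LSB, the three lower bits are binary weighted as in an ideal R-2R network, and mismatch, glitches, and timing are ignored. The function names are hypothetical.

```python
def segment_code(code, therm_bits=3, binary_bits=3):
    """Split a DAC code into thermometer (upper) and binary (lower) control bits."""
    total_bits = therm_bits + binary_bits
    if not 0 <= code < 2 ** total_bits:
        raise ValueError("code out of range")
    msb_value = code >> binary_bits
    lsb_value = code & ((1 << binary_bits) - 1)
    n_units = 2 ** therm_bits - 1                      # 7 unit sources for 3 bits
    thermometer = [1 if i < msb_value else 0 for i in range(n_units)]
    binary = [(lsb_value >> i) & 1 for i in reversed(range(binary_bits))]  # MSB first
    return thermometer, binary


def ideal_output_lsb(code, therm_bits=3, binary_bits=3):
    """Ideal DAC output in LSB units, reconstructed from the control bits."""
    thermometer, binary = segment_code(code, therm_bits, binary_bits)
    unit_weight = 2 ** binary_bits                     # each unit source = 8 LSB
    msb_part = unit_weight * sum(thermometer)
    lsb_part = sum(bit << (binary_bits - 1 - i) for i, bit in enumerate(binary))
    return msb_part + lsb_part


if __name__ == "__main__":
    print(segment_code(45))
    print(ideal_output_lsb(45))
```

In this ideal model only one additional unit source switches at each upper-bit transition, which is the usual motivation for thermometer coding of the upper bits in a segmented DAC.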

The sine mapping in this DDS is performed by the combination of the DAC and the TSC. A differential pair has a hyperbolic tangent transfer characteristic under large-signal excitation. The TSC therefore converts the triangular wave generated by the DAC into a sine wave using the translinear properties of differential pair circuits. The TSC is compact, power-efficient, and offers a wide bandwidth. The signal quality (e.g. SFDR) of the TSC output depends on the optimum excitation by the input signal and on precise biasing, which together set the appropriate degree of soft saturation of the differential pair. This TSC has an active area of only 0.01 mm<sup>2</sup> and a total power consumption of 50 mW. It has an operating bandwidth of up to 25 GHz and offers an SFDR of 30 to 42 dBc.
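
The dependence of the output quality on the input excitation can be explored with an idealized model in which the differential pair is reduced to $y = \tanh(k \cdot x)$ acting on a unit-amplitude triangle $x$. The Python sketch below is an assumption-laden illustration, not the designed circuit: the normalized drive `k` and all function names are hypothetical, and bandwidth, biasing errors, noise, and DAC quantization are ignored.

```python
import cmath
import math


def triangle(phase):
    """Unit-amplitude triangle wave; phase in cycles, zero crossing at phase 0."""
    return 1.0 - 4.0 * abs(((phase + 0.25) % 1.0) - 0.5)


def harmonic_amplitude(samples, n):
    """Amplitude of the n-th harmonic of one sampled period (direct DFT bin)."""
    total = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * n * i / total)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / total


def tanh_shaper_sfdr_db(drive, n_samples=1024, max_harmonic=15):
    """SFDR in dB of tanh(drive * triangle), using harmonics 2..max_harmonic."""
    samples = [math.tanh(drive * triangle(i / n_samples)) for i in range(n_samples)]
    fundamental = harmonic_amplitude(samples, 1)
    worst_spur = max(harmonic_amplitude(samples, n)
                     for n in range(2, max_harmonic + 1))
    return 20.0 * math.log10(fundamental / worst_spur)


if __name__ == "__main__":
    # Sweep the normalized drive: weak drive leaves a triangle,
    # strong drive approaches a square wave.
    for drive in (0.01, 0.5, 1.0, 1.5, 2.0, 10.0):
        print(drive, round(tanh_shaper_sfdr_db(drive), 1))
```

In this idealized model, a very weak drive leaves the triangle almost unchanged (third harmonic near 1/9 of the fundamental), a very strong drive approaches a square wave, and an intermediate drive gives a markedly cleaner sine. This is consistent with the sensitivity to excitation described above, although the numbers from this model should not be read as predictions for the fabricated TSC.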

| Parameter | [50] | [51] | [52] | [43] | [5] | This work | [4] | [3] |
|---|---|---|---|---|---|---|---|---|
| Technology | SiGe HBT | SiGe HBT | SiGe HBT | InP HBT | SiGe HBT | SiGe HBT | InP HBT | InP HBT |
| Transit frequency $f_T$ (GHz) | 200 | 120 | 120 | 250 | 200 | 180 | 350 | 350 |
| Maximum clock frequency $F_{clk}$ (GHz) | 5 | 6.3 | 12 | 14 | 16.8 | 18.5 | 24 | 32 |
| Accumulator size A (bit) | 24 | 9 | 9 | 8 | 8 | 12 | 12 | 8 |
| DAC resolution B (bit) | 10 | 8 | 8 | 5 | 6 | 6 | 7.5 | 5 |
| Worst-case SFDR (dBc) | 42 | 26 | 22 | 24.8 | 20 | 15 | 30.7 | 21.6 |
| Power consumption P (W) | 4.7 | 2.5 | 1.9 | 2.4 | 0.49 | 1.5 | 19.8 | 9.45 |
| Die area (mm$^2$) | 11.1 | 5.76 | 9 | 3.52 | 1.15 | 3.96 | 16.5 | 4.05 |
| Number of transistors | - | 13500 | 9600 | 2122 | 1351 | 4550 | - | 1891 |
| Sine mapping block | NL-DAC | NL-DAC | NL-DAC | NL-DAC | TSC based | TSC based | Digi. logic | NL-DAC |
| Phase control unit | Yes | No | No | No | No | Yes | No | No |
| FOM1 = $(F_{clk}/f_T) \times 100$ (%) | 2.5 | 5.3 | 10 | 5.6 | 5.8 | 10.3 | 6.5 | 10.6 |
| FOM2 | 53.3 | 39.3 | 83.3 | 23.15 | 165.2 | 74 | 9.6 | 8.4 |
| FOM3 = $F_{clk}/P$ (GHz/W) | 1.1 | 2.5 | 6.3 | 5.8 | 34.5 | 12 | 1.2 | 3.4 |

 Table 8.1: Performance comparison of high-speed DDS circuits (including this work) in the order of increasing maximum clock frequency

State-of-the-art DDSs in various technologies are listed in order of increasing clock frequency in Table 8.1. This DDS has the highest clock frequency (18.5 GHz) in SiGe technology and the third-highest (after [3] and [4]) among all technologies. One of the main reasons why [3] and [4] achieve a higher clock frequency is their faster technology node, with $f_T = 350$ GHz compared to 180 GHz in this DDS. The first figure of merit (FOM1) expresses how the maximum clock frequency scales with the transit frequency of the given technology. This work shows an $F_{clk}$ : $f_T$ ratio of 10.3 %, the best performance among all published DDSs. It is the first DDS with a clock frequency above 10 GHz that includes a phase control unit. FOM2 combines most of the important parameters (except the phase control unit), such as maximum clock frequency, phase resolution, amplitude resolution, SFDR, power consumption, and the speed of the technology. The FOM2 of this work (72) is the second-best after the DDS reported in [5]. FOM3 relates the clock frequency to the power consumption. The FOM3 of this work is 12 GHz/W, the second-best among the compared DDSs. In addition, this DDS has a measured phase noise of -115 dBc/Hz at 1 kHz offset for a 4 GHz output with a 12 GHz clock. This is the first reported phase noise measurement of the output of a high-speed (>10 GHz clock) DDS to date.
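
As a worked check, FOM1 and FOM3 follow directly from the definitions given in Table 8.1. The short Python sketch below evaluates them for the values of this work ($F_{clk}$ = 18.5 GHz, $f_T$ = 180 GHz, P = 1.5 W). The function names are hypothetical, and FOM2 is omitted because its formula is not given in this section.

```python
def fom1_percent(f_clk_ghz, f_t_ghz):
    """FOM1: clock frequency as a percentage of the transit frequency."""
    return 100.0 * f_clk_ghz / f_t_ghz


def fom3_ghz_per_w(f_clk_ghz, power_w):
    """FOM3: clock frequency per watt of power consumption."""
    return f_clk_ghz / power_w


if __name__ == "__main__":
    # Values of this work as listed in Table 8.1.
    print(round(fom1_percent(18.5, 180.0), 1))   # FOM1 in percent
    print(round(fom3_ghz_per_w(18.5, 1.5), 1))   # FOM3 in GHz/W
```

These evaluate to about 10.3 % and 12.3 GHz/W, in line with the FOM1 and FOM3 entries for this work.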

The DDS of Laemmle et al. [5] and this work have several similarities. Both are designed in the same SiGe technology, share the same ROM-less TSC based sine mapping architecture, and operate at multi-GHz $F_{clk}$. Therefore, they call for a head-to-head comparison. This work achieves an $F_{clk}$ of 18.5 GHz, 1.7 GHz higher than that of Laemmle et al. Both have similar worst-case SFDR characteristics, as they have the same number of truncated accumulator output bits and DAC bits. On the other hand, the DDS of Laemmle et al. shows a remarkably low power consumption of 486 mW, with a record ($F_{clk}$ : P) figure of merit. This work has a total power consumption of 1500 mW. The additional power consumption comes from the four extra accumulator bits and the added phase control unit, which increase the number of full adders, buffers, and clock drivers. This DDS can provide 2048 frequency points, while that of Laemmle et al. can generate only 128.

For a fair comparison of the power consumption of these two DDSs, a DDS with an 8-bit accumulator is implemented using the cells designed in this work, and its power consumption is calculated. The full adder width is reduced from 12 to 8 bits, the clock feed lines are reduced from 16 to 8, and the PCW is eliminated. This results in the following power consumption: accumulator = 450 mW, DAC = 214 mW, and TSC = 50 mW. This means that the 8-bit accumulator version of this design would consume only 790 mW. The power consumption can be optimized further, especially in the over-powered drivers and emitter followers. For example, when the current of the emitter follower driving each DAC switch is reduced, the performance of the DAC is not affected, but its power consumption decreases by at least 60 mW. Thus, the total power consumption of the DDS is only 720 mW. Similar optimization of the accumulator block yields a further reduction of 50-60 mW. Finally, the 8-bit accumulator version of the DDS shows a power consumption of only 654 mW, comparable to [5]. Some difference in power consumption may arise from the technologies themselves, even though both are SiGe technologies: their collector current versus $f_T$ characteristics may differ. Laemmle et al. have a slight advantage because they use a technology that is 20 GHz faster. To reach an $F_{clk}$ of 16.8 GHz, their CML logic can be relaxed to lower tail currents, while this work has to push the tail currents even higher to reach an $F_{clk}$ of 18.5 GHz on a slower node. Nevertheless, the FOM3 ($F_{clk}$ : P) of the 8-bit accumulator version is 28.3 GHz/W, comparable to 34.5 GHz/W in [5].

One application of the DDS is also demonstrated. The DDS is used in a 5-10 GHz Doppler radar for detecting small vibrations of a specimen (here, a speaker membrane). The Doppler radar is built from commercial RF components (such as a mixer, amplifier, multiplier, filter, and antenna), with the DDS as the key component. First, the DDS clocked at 12 GHz produces a clean 3 GHz signal, which a multiplier doubles to generate the 6 GHz carrier of the radar. The vibration frequency of the specimen is varied from 1 Hz to 100 Hz, and the radar correctly detects the vibration frequency of the speaker. In the next measurement, the DDS output frequency is changed to 4 GHz and doubled to 8 GHz by the multiplier. Again, the vibration frequency is varied from 1 Hz to 100 Hz, the frequency detected by the radar is recorded, and the radar correctly detects the vibration frequencies. The final measurement shows that the radar correctly detects multi-tone vibrations spaced only 0.3 Hz apart, owing to the excellent phase noise of the DDS output signal. This implies that two objects vibrating at very closely spaced frequencies can be detected reliably, which is useful in biomedical applications such as heartbeat detection and monitoring.

#### **Future work**

A higher clock frequency is needed to increase the DDS output bandwidth and to raise the maximum possible output frequency. One shortcut to this goal is to add an on-chip frequency doubler after the TSC block. However, the TSC is very sensitive to biasing and loading effects, so proper buffering between the TSC and the doubler is mandatory. On the downside, the doubler is a non-linear device and introduces more spurs. In addition, the minimum output frequency and the frequency step are effectively doubled.

Another way to improve the speed is a different accumulator architecture. The speed is limited by all three main blocks, and the accumulator is the most critical one. A full pipeline would help to increase the speed. However, it is not recommended because of the increased area, power, and complexity of the clock tree implementation.

With this technology, the $F_{clk}$ : $f_T$ ratio is already at its limit, so moving to a smaller node for a high-speed DDS is inevitable. A smaller node intrinsically has a lower propagation delay and switching time, which allows the accumulator, DAC, and TSC to operate faster. Owing to the smaller node and the smaller voltage headroom, the power consumption can be reduced as well. Another important benefit is the possibility of a larger DAC. Chapter 5 already presents an 8-bit partially segmented DAC design, which can be transferred to a faster technology. The phase accumulator output can then be truncated to 9 bits, which results in an SFDR of approximately 44 dBc for both the accumulator and the DAC. Note that for an ideal triangular input the TSC already has an SFDR in the range of 40 dBc, which can be improved further in a faster technology. Thus, a DDS output SFDR of 42-44 dBc is possible, compared to 38 dBc in this DDS. The current DDS has a low output power level ($\approx$ -20 dBm), which is insufficient for real applications. A trans-impedance stage can be inserted in the collector branch of the TSC to increase the gain, although this task is cumbersome because the TSC is bias-sensitive. Alternatively, a separate on-chip amplifier after the TSC or doubler can be used to generate a reasonable output power.

The RF test board design also plays a major role in the measured DDS results. For example, improper differential inputs introduced a problem in the clock transitions. In addition, proper calibration is necessary in future work to determine whether a reduction in DDS performance stems from the DDS circuit itself or from measurement artefacts.

The possibilities mentioned above concern the ROM-less TSC based DDS architecture only. One major drawback of this architecture is that DDS output frequencies from one quarter to one half of the clock frequency have poor SFDR because of the insufficient number of sampling points. Other architectures, such as NL-DAC based or logic based DDS, alleviate this problem because the sine wave is extracted from a matrix of sine-weighted current sources rather than from sampling points. Nevertheless, their power consumption must be considered carefully.

Another important future consideration is the application of the DDS in advanced modulation schemes. Both linear and nonlinear frequency ramps can be generated using this DDS, as verified by the DDS simulation in chapter 7. Such dynamic ramps have not been measured yet because the FPGA (field-programmable gate array) control board is still under development. However, the static relation between input FCW and DDS output is verified for both linear and nonlinear frequency ramp setups, in which the input FCW is changed manually and the corresponding frequency is recorded. The DDS can generate an FM chirp with a frequency resolution of $\approx 5$ MHz and 2048 frequency points, with a switching speed up to 3.33 GHz. In the future, the FPGA can be programmed to control the DDS digitally and to perform the available digital modulations. It can address the phase control unit and the frequency control unit simultaneously. The DDS can then produce phase modulation (PM), phase-shift keying (PSK), quadrature PSK (QPSK), continuous-phase modulation (CPM), and frequency-shift keying (FSK). Simple frequency modulation (FM) and continuous-phase FSK (CPFSK) might be problematic because the accumulator block omits pre-skewing registers. However, if the phase jump during frequency switching is calculated, this DDS can even be used in radar applications with FM and CPFSK modulation. Thus, this DDS can be adopted in compact FMCW (frequency-modulated continuous-wave) radar systems and also in MIMO (multiple-input multiple-output) radar and communication systems.
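
A stepped linear frequency ramp of the kind described above amounts to a sequence of FCW values. The Python sketch below shows one way a controller could compute such a sequence. It assumes the standard tuning relation $f_{out} = \mathrm{FCW} \cdot F_{clk} / 2^{N}$ and rounds each target frequency to the nearest FCW. The function name and the example frequencies are hypothetical and do not describe the FPGA firmware under development.

```python
def linear_ramp_fcws(f_start_hz, f_stop_hz, steps, f_clk_hz, acc_bits=12):
    """FCW sequence approximating a linear ramp from f_start_hz to f_stop_hz."""
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    scale = 2 ** acc_bits / f_clk_hz                  # FCW per hertz
    delta = (f_stop_hz - f_start_hz) / (steps - 1)    # frequency step per point
    fcws = [round((f_start_hz + i * delta) * scale) for i in range(steps)]
    # Assumption: only FCWs below half the accumulator range are usable.
    if min(fcws) < 0 or max(fcws) >= 2 ** (acc_bits - 1):
        raise ValueError("ramp leaves the usable range below half the clock")
    return fcws


if __name__ == "__main__":
    # Example: five-point ramp from 3 GHz to 4.5 GHz at a 12 GHz clock.
    print(linear_ramp_fcws(3e9, 4.5e9, 5, 12e9))
```

A nonlinear ramp can be obtained in the same way by replacing the linear interpolation with any other frequency-versus-step function before rounding to FCW values.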

\* \* \*

# Bibliography

- [1] T. Tired, J. Wernehag, W. Ahmad, I. ud Din, P. Sandrup, M. Toermaenen, and H. S. Sjoeland, "A 1.5 v 28 ghz beam steering sige pll for an 81-86 ghz e-band transmitter," *IEEE Microwave and Wireless Components Letters*, vol. 26, no. 10, pp. 843–845, 2016. i, 1
- [2] S. E. Turner, M. E. Stuenkel, G. M. Madison, J. A. Cartwright, R. L. Harwood, J. D. Cali, S. A. Chadwick, M. Oh, J. T. Matta, J. M. Meredith, J. M. Byrd, and L. J. Kushner, "Direct digital synthesizer with 14 gs/s sampling rate heterogeneously integrated in inp hbt and gan hemt on cmos," in 2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), pp. 115–118, 2019. ii, 15, 144
- [3] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with sine-weighted dac at 32-ghz clock frequency in inp dhbt technology," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 10, pp. 2284–2290, 2006. iii, 6, 8, 16, 62, 106, 145
- [4] S. E. Turner, R. T. Chan, and J. T. Feng, "Rom-based direct digital synthesizer at 24 ghz clock frequency in inp dhbt technology," *IEEE Microwave and Wireless Components Letters*, vol. 18, no. 8, pp. 566–568, 2008. iii, 6, 9, 145
- [5] B. Laemmle, C. Wagner, H. Knapp, H. Jaeger, L. Maurer, and R. Weigel, "A differential pair-based direct digital synthesizer mmic with 16.8-ghz clock and 488-mw power consumption," *IEEE Transactions on Microwave Theory and Techniques*, vol. 58, no. 5, pp. 1375–1383, 2010. iv, 6, 8, 9, 18, 63, 86, 94, 145, 146
- [6] Analog Devices, A Technical Tutorial on Digital Signal Synthesis. Analog Devices, Inc., Norwood, Massachusetts, 1 ed., 1999. Technical Note. x, 13, 14
- [7] J. Choi, D. Yoon, D. Jung, K. Seong, J. Han, W. Lee, and K. Baek, "Design and analysis of low power and high sfdr direct digital frequency synthesizer," *IEEE Access*, vol. 8, pp. 67581–67590, 2020. x, 15, 54
- [8] B. S. Jensen, M. M. Khafaji, T. K. Johansen, V. Krozer, and J. C. Scheytt, "Twelve-bit 20-ghz reduced size pipeline accumulator in 0.25 um sige:c technology for direct digital synthesiser applications," *IET Circuits, Devices Systems*, vol. 6, no. 1, pp. 19–27, 2012. x, 8, 9, 17, 58, 62
- [9] P. Michalik, D. Fernández, M. Wietstruck, M. Kaynak, and J. Madrenas, "Experiments on mems integration in 0.25 um cmos process," *Sensors*, vol. 18, p. 2111, Jun 2018. x, 21

- [10] N. Solati, "Biasing for high linearity base-station pre-driver," Master's thesis, Electronic Devices, Dept. of Electrical Engineering, Linkoeping University, 581–83 Linkoeping, Sweden, 8 2013. Master Thesis. x, 25, 26
- [11] S. E. Turner, High-speed digital and mixed-signal components for x- and ku-band direct digital synthesizers in Indium Phosphide DHBT technology. PhD thesis, University of Maine, 168 College Ave, Orono, ME 04469, United States, May 2006. x, 17, 26, 27, 49, 51
- [12] S. Voinigescu, *High-Frequency Integrated Circuits*. High-frequency Integrated Circuits, Cambridge University Press, 2013. x, 7, 23, 27, 28, 29, 30
- [13] F. Pera, Y. Savaria, and G. Bois, "Time delay measurement methods for integrated transmission lines and high speed cell characterization," in 1997 IEEE International Symposium on Circuits and Systems (ISCAS), vol. 1, pp. 293–296 vol.1, 1997. x, 31
- [14] T. Floyd, Digital Fundamentals. Prentice Hall, 2003. xi, 45, 46, 47
- [15] D. Johns and K. Martin, Analog Integrated Circuit Design. Wiley, 1997. xi, 64, 65
- [16] H. Khorramabadi, Analog-Digital Interface Integrated Circuits. Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, 1 ed., 1 2010. EE247 Lecture 14. xi, 66, 69
- [17] B. D. Smith, "Coding by feedback methods," *Proceedings of the IRE*, vol. 41, no. 8, pp. 1053–1058, 1953. xi, 67, 68
- [18] S. Thuries, E. Tournier, A. Cathelin, S. Godet, and J. Graffeuil, "A 6-ghz low-power bicmos sige: 0.25 um direct digital synthesizer," *IEEE Microwave and Wireless Components Letters*, vol. 18, pp. 46–48, 2008. xii, 17, 94, 95
- [19] G. M. Brooker, "Understanding millimetre wave fmcw radars," in 1 st International Conference on Sensing Technology, IEEE, New Zealand, 2005, pp. 152–157. xv, 134
- [20] J. F. Federici, D. Gary, R. Barat, and Z. Michalopoulou, "T-rays vs. terrorists," *IEEE Spectrum*, vol. 44, no. 7, pp. 47–52, 2007. 1
- [21] V. Radisic, X. B. Mei, W. R. Deal, W. Yoshida, P. H. Liu, J. Uyeda, M. Barsky, L. Samoska, A. Fung, T. Gaier, and R. Lai, "Demonstration of sub-millimeter wave fundamental oscillators using 35-nm inp hemt technology," *IEEE Microwave and Wireless Components Letters*, vol. 17, no. 3, pp. 223–225, 2007. 1
- [22] P. H. Siegel, "Terahertz technology," IEEE Transactions on Microwave Theory and Techniques, vol. 50, no. 3, pp. 910–928, 2002.
- [23] S. Kim, K. Choi, D. Park, J. Kim, S. Han, and S. Lee, "0.5 and 1.5 thz monolithic imagers in a 65 nm cmos adopting a vco-based signal processing," in 2017 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 149–152, 2017. 1

- [24] O. Momeni and E. Afshari, "High power terahertz and millimeter-wave oscillator design: A systematic approach," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 3, pp. 583–597, 2011.
- [25] S. Prasad, H. Schumacher, and A. Gopinath, *High-Speed Electronics and Optoelectronics: Devices and Circuits*. Cambridge University Press, 2009. 1, 23
- [26] W. Tan, G. Chen, and H. Zhang, "A 1-ghz lc voltage-controlled oscillator with high linearity and wide range," in 2008 IEEE International Conference on Electron Devices and Solid-State Circuits, pp. 1–4, 2008. 1
- [27] F. Gardner, *Phaselock Techniques*. Wiley, 2005. 1
- [28] Y. Mizunuma, T. Ohgihara, H. Nakano, T. Okamoto, and Y. Murakami, "A 13-ghz yig-film tuned oscillator for vsat applications," in 1988., IEEE MTT-S International Microwave Symposium Digest, pp. 1085–1088 vol.2, 1988.
- [29] D. Huang, T. R. LaRocca, L. Samoska, A. Fung, and M. F. Chang, "324ghz cmos frequency generator using linear superposition technique," in 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, pp. 476–629, 2008. 2
- [30] L. Xiaodong, S. Yanyan, and Li Shubo, "A mcu-based arbitrary waveform generator for slh power amplifier using dds technique," in 2007 8th International Conference on Electronic Measurement and Instruments, pp. 4–895–4–899, 2007. 2
- [31] W. Dong, Q. Liu, S. Peng, and H. Li, "Design and realization of arbitrary radar waveform generator based on dds and sopc technology," in 2009 9th International Conference on Electronic Measurement Instruments, pp. 1–534–1–537, 2009. 2
- [32] J. Vankka, M. Waltari, M. Kosunen, and K. A. I. Halonen, "A direct digital synthesizer with an on-chip d/a-converter," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 218–227, 1998. 2
- [33] Y. Mao, E. Shiju, K. Schmalz, J. Borngräber, and J. C. Scheytt, "A novel 245 ghz 4thindex push-push vco," in 2019 IEEE International Symposium on Radio-Frequency Integration Technology (RFIT), pp. 1–3, 2019. 2
- [34] X. Yi, Z. Liang, G. Feng, C. C. Boon, and F. Meng, "A 93.4-to-104.8 ghz 57 mw fractional-n cascaded sub-sampling pll with true in-phase injection-coupled qvco in 65 nm cmos," in 2016 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), pp. 122–125, 2016. 2
- [35] S. Shahramian, A. Hart, A. Chan Carusone, P. Garcia, P. Chevalier, and S. P. Voinigescu, "A d-band pll covering the 81–82 ghz, 86–92 ghz and 162–164 ghz bands," in 2010 IEEE Radio Frequency Integrated Circuits Symposium, pp. 53–56, 2010.
- [36] M. v. Delden, N. Pohl, K. Aufinger, and T. Musch, "A 32-48 ghz differential yig oscillator with low phase noise based on a sige mmic," in 2019 IEEE Radio and Wireless Symposium (RWS), pp. 1–3, 2019. 2

- [37] W. Stein, F. Huber, S. Bildik, M. Aigle, and M. Vossiek, "An improved ultra-low-phase noise tunable yig oscillator operating in the 6–12 ghz range," in 2017 47th European Microwave Conference (EuMC), pp. 767–770, 2017. 2
- [38] A. Gutierrez-Aitken, J. Matsui, E. N. Kaneshiro, B. K. Oyama, D. Sawdai, A. K. Oki, and D. C. Streit, "Ultrahigh-speed direct digital synthesizer using inp dhbt technology," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 9, pp. 1115–1119, 2002. 2
- [39] X. Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "An 11-bit 8.6 ghz direct digital synthesizer mmic with 10-bit segmented sine-weighted dac," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 2, pp. 300–313, 2010. 2, 14, 17, 18, 48, 72
- [40] T. Shivan, M. Hossain, R. Doerner, S. Schulz, T. Johansen, S. Boppel, W. Heinrich, and V. Krozer, "Highly linear 90-170 ghz spdt switch with high isolation for fully integrated inp transceivers," in 2019 IEEE MTT-S International Microwave Symposium (IMS), pp. 1011–1014, 2019. 2
- [41] H. Siethoff, J. Voelkl, D. Gerthsen, and H. G. Brion, "The lower yield point of inp and gaas," *Physica Status Solidi A*, *Applied Research*, vol. 101, no. 1, pp. K13–K18, May 1987. 2
- [42] J. Vankka and K. Halonen, "Direct digital synthesizers: Theory, design and applications," 2001. 4, 51, 53
- [43] X. Li, Y. Zhang, Z. Wang, Y. Zhang, M. Zhang, W. Cheng, and H. Gao, "A 14-ghz 8-bit direct digital synthesizer in inp dhbt technology," in 2019 IEEE International Symposium on Radio-Frequency Integration Technology (RFIT), pp. 1–3, 2019. 4, 6, 8, 9, 18, 106, 145
- [44] S. Ayhan, V. Vu-Duy, P. Pahl, S. Scherr, M. Hübner, J. Becker, and T. Zwick, "Fpga controlled dds based frequency sweep generation of high linearity for fmcw radar systems," in 2012 The 7th German Microwave Conference, pp. 1–4, 2012. 5, 10
- [45] C. Mincong, L. Ruiyu, and H. Tao, "Study of optimized design of dds based on fpga," in 2019 14th IEEE International Conference on Electronic Measurement Instruments (ICEMI), pp. 150–154, 2019. 5, 10
- [46] J. Moll, B. Hils, A. Shrestha, A. Ehlert, V. Krozer, K. Thurn, M. Vossiek, M. Hrobak, M. Hossain, W. Heinrich, M. Resch, and J. Bosse, "Panel design of a mimo imaging radar at w-band for space applications," in 2017 European Radar Conference (EURAD), pp. 126–129, 2017. 5
- [47] C. Talarico, G. D'Amato, G. Coviello, and G. Avitabile, "A high precision phase control unit for dds-based plls for 2.4-ghz ism band applications," in 2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1–4, 2015. 5

- [48] T. Finateu, F. Badets, Y. Deval, J. B. Begueret, and D. Belot, "A 65nm cmos 2.4 ghz phase shifter based direct digital synthesizer," in 2012 IEEE 11th International Conference on Solid-State and Integrated Circuit Technology, pp. 1–3, 2012. 5
- [49] B. Laemmle, C. Wagner, H. Knapp, L. Maurer, and R. Weigel, "High speed low power phase accumulators for dds applications in sige bipolar technology," in 2009 IEEE Bipolar/BiCMOS Circuits and Technology Meeting, pp. 162–165, 2009. 6, 17, 58
- [50] X. Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 5 ghz direct digital synthesizer mmic with direct modulation and spur randomization," in 2009 IEEE Radio Frequency Integrated Circuits Symposium, pp. 419–422, 2009. 6, 17, 18, 145
- [51] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 9-bit quadrature direct digital synthesizer implemented in 0.18-μm sige bicmos technology," *IEEE Transactions on Microwave Theory and Techniques*, vol. 56, no. 5, pp. 1257–1266, 2008. 6, 50, 62, 145
- [52] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 12 ghz 1.9 w direct digital synthesizer mmic implemented in 0.18 μm sige bicmos technology," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 6, pp. 1384–1393, 2008. 6, 34, 55, 62, 106, 145
- [53] J. S. Dunn, D. C. Ahlgren, D. D. Coolbaugh, N. B. Feilchenfeld, G. Freeman, D. R. Greenberg, R. A. Groves, F. J. Guarin, Y. Hammad, A. J. Joseph, L. D. Lanzerotti, S. A. St.Onge, B. A. Orner, J. . Rieh, K. J. Stein, S. H. Voldman, P. . Wang, M. J. Zierak, S. Subbanna, D. L. Harame, D. A. Herman, and B. S. Meyerson, "Foundation of rf cmos and sige bicmos technologies," *IBM Journal of Research and Development*, vol. 47, no. 2.3, pp. 101–138, 2003. 7, 88
- [54] A. Pawlikiewicz and D. Hess, "Mixed-signal design choosing rf cmos or sige bicmos in mixed-signal design," 2005. 7
- [55] O. Llopis and G. Cibiel, "Phase noise metrology and modeling of microwave transistor applications to the design of state-of-the-art dielectric resonator oscillators," in *Noise in Devices and Circuits* (M. J. Deen, Z. Celik-Butler, and M. E. Levinshtein, eds.), vol. 5113, pp. 179 – 191, International Society for Optics and Photonics, SPIE, 2003. 7
- [56] A. M. Alonso, X. Yuan, M. Miyahara, and A. Matsuzawa, "A 2 gs/s 118 mw digital-mapping direct digital frequency synthesizer in 65nm cmos," in 2017 12th European Microwave Integrated Circuits Conference (EuMIC), pp. 228–231, 2017. 8
- [57] N. Zerounian, F. Aniel, B. Barbalat, P. Chevalier, and A. Chantre, "500 ghz cutoff frequency sige hbts," *Electronics Letters*, vol. 43, no. 14, 2007. 8
- [58] H. Rücker and B. Heinemann, "High-performance SiGe HBTs for next generation BiCMOS technology," *Semiconductor Science and Technology*, vol. 33, p. 114003, oct 2018. 8, 93

- [59] A. Shrestha, J. Moll, A. Raemer, M. Hrobak, and V. Krozer, "20 ghz clock frequency rom-less direct digital synthesizer comprising unique phase control unit in 0.25 um sige technology," in 2018 13th European Microwave Integrated Circuits Conference (EuMIC), pp. 206–209, 2018. 9, 18, 50, 63
- [60] B. Goldberg, Digital Frequency Synthesis Demystified. Elsevier Science, 2000. 14, 45
- [61] J. E. Volder, "The cordic trigonometric computing technique," IRE Transactions on Electronic Computers, vol. EC-8, no. 3, pp. 330–334, 1959. 15
- [62] C. Y. Kang and E. Swartzlander, "Digit-pipelined direct digital frequency synthesis based on differential cordic," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 53, pp. 1035–1044, 2006. 15
- [63] L. Cordesses, "Direct digital synthesis: a tool for periodic wave generation (part 1)," IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 50–54, 2004. 15
- [64] Loke Kun Tan, E. W. Roth, G. E. Yee, and H. Samueli, "An 800 mhz quadrature digital synthesizer with ecl-compatible output drivers in 0.8 um cmos," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 12, pp. 1463–1473, 1995. 16
- [65] H. Chen, J. Chen, Y. Li, and Y. Wu, "A quarter-rom dds using phase accumulator with subtraction," in 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), pp. 1–2, 2018. 16
- [66] J. Gorski-Popiel, I. of Electrical, E. E. A. Board, and I. E. A. Board, Frequency Synthesis: Techniques and Applications. IEEE Press, 1975. 16
- [67] H. T. Nicholas and H. Samueli, "A 150-mhz direct digital frequency synthesizer in 1.25 um cmos with -90dbc spurious performance," in 1991 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, pp. 42–286, 1991. 16
- [68] D. A. Sunderland, R. A. Strauch, S. S. Wharfield, H. T. Peterson, and C. R. Cole, "Cmos/sos frequency synthesizer lsi circuit for spread spectrum communications," *IEEE Journal of Solid-State Circuits*, vol. 19, no. 4, pp. 497–506, 1984. 16
- [69] A. Bellaouar, M. Obrecht, A. Fahim, and M. I. Elmasry, "A low-power direct digital frequency synthesizer architecture for wireless communications," in *Proceedings of the IEEE 1999 Custom Integrated Circuits Conference (Cat. No.99CH36327)*, pp. 593–596, 1999. 16
- [70] M. M. El Said and M. L. Elmasry, "An improved rom compression technique for direct digital frequency synthesizers," in 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353), vol. 5, pp. V–V, 2002. 16
- [71] M. Ghosh, L. S. J. Chimakurthy, F. F. Dai, and R. C. Jaeger, "A novel dds architecture using nonlinear rom addressing with improved compression ratio and quantisation noise," in 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), vol. 2, pp. II–705, 2004. 16

- [72] L. S. J. Chimakurthy, M. Ghosh, F. F. Dai, and R. C. Jaeger, "A novel dds using nonlinear rom addressing with improved compression ratio and quantization noise," *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, vol. 53, no. 2, pp. 274–283, 2006. 16
- [73] J. M. P. Langlois and D. Al-Khalili, "Phase to sinusoid amplitude conversion techniques for direct digital frequency synthesis," *IEE Proceedings - Circuits, Devices and Systems*, vol. 151, no. 6, pp. 519–528, 2004. 16
- [74] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with rom-less architecture at 13-ghz clock frequency in inp dhbt technology," *IEEE Microwave and Wireless Components Letters*, vol. 16, no. 5, pp. 296–298, 2006. 16, 62
- [75] Jiandong Jiang and E. K. F. Lee, "A low-power segmented nonlinear dac-based direct digital frequency synthesizer," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 10, pp. 1326–1330, 2002. 17
- [76] J. Jiang and E. Lee, "A rom-less direct digital frequency synthesizer using segmented nonlinear digital-to-analog converter," *Proceedings of the IEEE 2001 Custom Integrated Circuits Conference (Cat. No.01CH37169)*, pp. 165–168, 2001. 17
- [77] S. Mortezapour and E. K. F. Lee, "Design of low-power rom-less direct digital frequency synthesizer using nonlinear digital-to-analog converter," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 10, pp. 1350–1359, 1999. 17
- [78] W. Wichmann, *IHP SG25H4 Process Specification*. Innovations for High Performance Microelectronics, Im Technologiepark 25, 15236 Frankfurt (Oder), 4 2017. page 22. 23, 31
- [79] G. Fischer, "Analysis and modeling of the long-term ageing rate of sige hbts under mixed-mode stress," in 2016 IEEE Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), pp. 106–109, 2016. 24
- [80] F. Gerhard, Analysis and Simulation of SiGe HBT Ageing. Innovations for High Performance Microelectronics, Im Technologiepark 25, 15236 Frankfurt (Oder), 1 ed., 11 2015. hbt. 24
- [81] M. Khafaji, H. Gustat, B. Sedighi, F. Ellinger, and J. C. Scheytt, "A 6-bit fully binary digital-to-analog converter in 0.25-μm sige bicmos for optical communications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 59, no. 9, pp. 2254–2264, 2011. 25, 54, 63, 71, 77, 91, 92
- [82] T. J. Farmer, A. M. Darwish, B. Huebschman, E. Viveiros, and M. E. Zaghloul, "High power density sige millimeter-wave power amplifiers," *International Journal of Microwave and Wireless Technologies*, vol. 3, pp. 615–620, 2011. 25
- [83] T. J. Farmer, A. Darwish, and M. Zaghloul, "A 2.4 ghz sige hbt high voltage/high power amplifier," *IEEE Microwave and Wireless Components Letters*, vol. 20, pp. 286–288, 2010. 25

- [84] F. Ellinger, Radio Frequency Integrated Circuits and Technologies. Springer, 2008. 26
- [85] P. Gray, P. Hurst, S. Lewis, and R. Meyer, Analysis and Design of Analog Integrated Circuits. Wiley, 2017. 26
- [86] H. Gustat, U. Jagdhold, F. Winkler, M. Appel, and G. Kell, "Differential ecl/cml synthesis for sige bicmos," in 2008 IEEE Compound Semiconductor Integrated Circuits Symposium, pp. 1–4, 2008. 29
- [87] E. Mavrek, I. Loncaric, I. Poljak, M. Koricic, and T. Suligoj, "Effect of parasitic rlc parameters in bias networks on ecl delay time," in *The 33rd International Convention MIPRO*, pp. 73–77, 2010. 29
- [88] A. Arbabian, S. Callender, S. Kang, B. Afshar, J. Chien, and A. M. Niknejad, "A 90 ghz hybrid switching pulsed-transmitter for medical imaging," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 12, pp. 2667–2681, 2010. 29
- [89] T. O. Dickson, R. Beerkens, and S. P. Voinigescu, "A 2.5-v 45-gb/s decision circuit using sige bicmos logic," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 4, pp. 994– 1003, 2005. 29
- [90] M.-q. Yao and L.-b. Zhang, "Emitter-couple logic circuit design based on the threshold-arithmetic algebraic system," *Journal of Zhejiang University SCIENCE C*, vol. 14, pp. 808–814, Oct 2013. 29
- [91] A. Sedra, D. Sedra, K. Smith, and P. Smith, *Microelectronic Circuits*. Oxford series in electrical and computer engineering, Oxford University Press, 1998. 30, 45
- [92] H. Rein and M. Moller, "Design considerations for very-high-speed si-bipolar ic's operating up to 50 gb/s," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 8, pp. 1076–1090, 1996. 31
- [93] W. Fang, "Accurate analytical delay expressions for ecl and cml circuits and their applications to optimizing high-speed bipolar circuits," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 2, pp. 572–583, 1990. 34
- [94] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, 1999. 35
- [95] H. Wang, L. Sun, L. Huang, and L. Cai, "Design of uwb circuits with inductive peaking technique," in 2012 International Conference on Microwave and Millimeter Wave Technology (ICMMT), vol. 1, pp. 1–4, 2012. 36
- [96] P. Payandehnia, H. Maghami, S. Sheikhaei, A. Abbasfar, B. Forouzandeh, and K. Nanbakhsh, "High speed cml latch using active inductor in 0.18um cmos technology," in 2011 19th Iranian Conference on Electrical Engineering, pp. 1–4, 2011. 36

- [97] Y. Takahashi, K. Eguchi, A. Itoh, and K. Ishii, "Analysis of propagation-delays in high-speed bipolar gates," 2015 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 327–330, 2015. 37
- [98] M. Usama and T. Kwasniewski, "New cml latch structure for high speed prescaler design," in *Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No.04CH37513)*, vol. 4, pp. 1915–1918 Vol.4, 2004. 39
- [99] M. Alioto, R. Mita, and G. Palumbo, "Analysis and comparison of low-voltage cml d-latch," in 9th International Conference on Electronics, Circuits and Systems, vol. 2, pp. 737–740 vol.2, 2002. 40
- [100] M. Usama, "mode logic latch and prescaler design optimization in 0.18um cmos technology," Master's thesis, Ottawa-Carleton Institute for Electrical Engineering Department of Electronics, Carleton University Ottawa, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada, 01 2005. Master thesis. 40
- [101] W. Gothmann, Digital Electronics: An Introduction to Theory and Practice. Prentice-Hall, 1982. 45
- [102] A. Gutub, M. Ibrahim, and M. A. Araman, "Super pipelined digit serial adders for multimedia and e-security," 2004. 48, 51
- [103] T. Mathew, S. Jaganathan, D. Scott, S. Krishnan, Y. Wei, M. Urteaga, M. Rodwell, and S. Long, "2-bit adder carry and sum logic circuits clocking at 19 ghz clock frequency in transferred substrate hbt technology," in *Conference Proceedings. 2001 International Conference on Indium Phosphide and Related Materials. 13th IPRM* (Cat. No.01CH37198), pp. 505–508, 2001. 49
- [104] A. M. Sodagar and G. R. Lahiji, "A pipelined rom-less architecture for sine-output direct digital frequency synthesizers using the second-order parabolic approximation," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 9, pp. 850–857, 2001. 52
- [105] P. O'Leary and F. Maloberti, "A direct-digital synthesizer with improved spectral performance," *IEEE Transactions on Communications*, vol. 39, no. 7, pp. 1046– 1048, 1991. 54
- [106] Fa Foster Dai, Weining Ni, Shi Yin, and R. C. Jaeger, "A direct digital frequency synthesizer with fourth-order phase domain ΔΣ noise shaper and 12-bit current-steering dac," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 4, pp. 839–850, 2006. 54
- [107] A. Ashrafi and R. Adhami, "Comments on "a 13-bit resolution rom-less direct digital frequency synthesizer based on a trigonometric quadruple angle formula"," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 9, pp. 1096–1098, 2005. 54
- [108] A. J. M. Behesti, "A 2-ghz rom-less direct digital frequency synthesizer based on an analog sine mapper circuit, 24th icee," 2016. 55

- [109] K. Han, A. B. Kahng, and J. Li, "Optimal generalized h-tree topology and buffering for high-performance and low-power clock distribution," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 39, no. 2, pp. 478– 491, 2020. 59
- [110] S. E. Turner, R. Elder, D. Jansen, and D. Kotecki, "4-bit adder-accumulator at 41ghz clock frequency in inp dhbt technology," *IEEE Microwave and Wireless Components Letters*, vol. 15, pp. 144–146, 2005. 62
- [111] F. F. Dai, L. S. J. Chimakurthy, D. Yang, J. Huang, and R. C. Jaeger, "A low power 5 ghz direct digital synthesizer designed in sige technology," in *Digest of Papers. 2004 Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems, 2004.*, pp. 21–24, 2004. 62
- [112] A. R. Bugeja, B. . Song, P. L. Rakers, and S. F. Gillig, "A 14-b, 100-ms/s cmos dac designed for spectral performance," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 12, pp. 1719–1732, 1999. 63, 87
- [113] N. M. Alaja Kumari Sadda, "A study of output impedance effects in current-steering digital-to-analog converters," Master's thesis, Linkoeping University, 581 83 Linkoeping, Sweden, 7 2013. Master Thesis. 63
- [114] A. Balteanu, P. Schvan, and S. P. Voinigescu, "A 6-bit segmented dac architecture with up to 56-ghz sampling clock and 6-v<sub>pp</sub> differential swing," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 3, pp. 881–891, 2016. 63, 91
- [115] S. Halder and H. Gustat, "A 30 gs/s 4-bit binary weighted dac in sige bicmos technology," in 2007 IEEE Bipolar/BiCMOS Circuits and Technology Meeting, pp. 46–49, 2007. 63, 91
- [116] P. Schvan, D. Pollex, and T. Bellingrath, "A 22gs/s 6b dac with integrated digital ramp generator," in ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005., pp. 122–588 Vol. 1, 2005. 63, 91
- [117] K. O. Andersson, "Modeling and implementation of current-steering digital-to-analog converters," 2005. 64
- [118] P. Karthika and T. Chelladurai, "Reduction of glitch energy in binary weighted current steering dac: Survey," 2016. 67
- [119] S. Manandhar, S. E. Turner, and D. E. Kotecki, "36-ghz, 16× 6-bit rom in inp dhbt technology suitable for dds application," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 2, pp. 451–456, 2007. 69
- [120] K. Wang, High Speed Circuits For Lightwave Communications, Selected Topics In Electronics And Systems, Vol 1. Selected Topics In Electronics And Systems, World Scientific Publishing Company, 1999. 72

- [121] M. Albiol, J. L. Gonzalez, and E. Alarcon, "Mismatch and dynamic modeling of current sources in current-steering cmos d/a converters: an extended design procedure," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 51, no. 1, pp. 159–169, 2004. 77
- [122] A. R. Bugeja and Bang-Sup Song, "A self-trimming 14-b 100-ms/s cmos dac," IEEE Journal of Solid-State Circuits, vol. 35, no. 12, pp. 1841–1852, 2000. 87
- [123] D. Baranauskas and D. Zelenin, "A 0.36w 6b up to 20gs/s dac for uwb wave formation," in 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers, pp. 2380–2389, 2006. 91
- [124] M. Khafaji, H. Gustat, and C. J. Scheytt, "A 6 bit linear binary rf dac in 0.25um sige bicmos for communication systems," 2010 IEEE MTT-S International Microwave Symposium, pp. 916–919, 2010. 91
- [125] Y. Fu, "Low power circuit topologies for digital-to-analog converters with a mm-wave sampling clock," 2015. 91
- [126] M. A. Kelly and C. E. Tyler, A Second-Generation ESCA Spectrometer. HEWLETT-PACKARD JOURNAL, hp, 1 ed., 7 1973. HP. 94
- [127] W. A. Evans and J. S. Williams, "The multi-tanh circuit as a triwave-to-sine converter," *IEE Journal on Electronic Circuits and Systems*, vol. 3, no. 3, pp. 90–92, 1979. 94
- [128] A. B. Grebene, "Monolithic waveform generation," *IEEE Spectrum*, vol. 9, no. 4, pp. 34–40, 1972. 94
- [129] C. Yang, J. Weng, and H. Chang, "A 5-ghz direct digital frequency synthesizer using an analog-sine-mapping technique in 0.35- μm sige bicmos," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 9, pp. 2064–2072, 2011. 94
- [130] R. G. Meyer, W. M. C. Sansen, and S. Peeters, "The differential pair as a triangle-sine wave converter," *IEEE Journal of Solid-State Circuits*, vol. 11, no. 3, pp. 418–420, 1976. 94, 95
- [131] U. Nukala, "Design of a temperature independent mosfet-only current reference," 2011. 106
- [132] X. Guo, M. Cai, and X. He, "A low power op-ampless bandgap reference with second-order compensation," in 2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC), pp. 1–2, 2017. 106
- [133] F. Serra-Graells and J. L. Huertas, "Sub-1-v cmos proportional-to-absolute temperature references," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 1, pp. 84–88, 2003. 106
- [134] B. Razavi, Design of Analog CMOS Integrated Circuits. IRWIN ELECTRONICS & COMPUTER E, McGraw-Hill Education, 2016. 106

- [135] H. Wang, F. Dai, and M. Hamilton, "A high order compensated op-amp-less bandgap reference with 39 ppm/c over -260 to 125 c temperature range and -50 db psrr," arXiv: Applied Physics, 2018. 106, 107
- [136] Barrie Gilbert, "Circuits for the precise synthesis of the sine function," *Electronics Letters*, vol. 13, no. 17, pp. 506–508, 1977. 107
- [137] Rogers Corporation, RO4000® Series High Frequency Circuit Materials, 2018. Rev. D. 127
- [138] A. Tessmann, S. Kudszus, T. Feltgen, M. Riessle, C. Sklarczyk, and W. H. Haydl, "A 94 ghz single-chip fmcw radar module for commercial sensor applications," in 2002 IEEE MTT-S International Microwave Symposium Digest (Cat. No.02CH37278), vol. 3, pp. 1851–1854 vol.3, 2002. 134
- [139] P. S. Girão, O. Postolache, G. Postolache, P. M. Ramos, and J. M. D. Pereira, "Microwave doppler radar in unobtrusive health monitoring," *Journal of Physics: Conference Series*, vol. 588, p. 012046, feb 2015. 134
- [140] S. Pisa, E. Pittella, and E. Piuzzi, "A survey of radar systems for medical applications," *IEEE Aerospace and Electronic Systems Magazine*, vol. 31, no. 11, pp. 64–81, 2016. 134
- [141] G. Wang, C. Gu, T. Inoue, and C. Li, "A hybrid fmcw-interferometry radar for indoor precise positioning and versatile life activity monitoring," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, pp. 2812–2822, Nov. 2014. 134
- [142] M. A. Hein, "Ultra-wideband radar sensors for biomedical diagnostics and imaging," in 2012 IEEE International Conference on Ultra-Wideband, pp. 486–490, 2012. 134
- [143] J. Rosen and L. Gothard, Encyclopedia of Physical Science. Facts on File Science Library, Facts On File, 2010. 134
- [144] C. Neipp, A. Hernández, J. Rodes, A. Márquez, T. Beléndez, and A. Beléndez, "An analysis of the classical doppler effect," *European Journal of Physics*, vol. 24, pp. 497–505, 2003. 135
- [145] G. Wang, J. Muñoz-Ferreras, C. Gu, C. Li, and R. Gómez-García, "Application of linear-frequency-modulated continuous-wave (lfmcw) radars for tracking of vital signs," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 6, pp. 1387–1399, 2014. 135
- [146] G. Wang, C. Gu, T. Inoue, and C. Li, "A hybrid fmcw-interferometry radar for indoor precise positioning and versatile life activity monitoring," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 11, pp. 2812–2822, 2014. 135
- [147] E. Mogi and T. Ohtsuki, "Heartbeat detection with doppler radar based on spectrogram," 2017 IEEE International Conference on Communications (ICC), pp. 1–6, 2017. 135

# Acknowledgements

I would like to thank Prof. Viktor Krozer for giving me the opportunity to work in his group and for kindly guiding me throughout my PhD years, both academically and personally. It was a pleasure to work with Dr Jochen Moll, especially during the RAMMS project.

I am thankful to Prof. Lars Hedrich, without whom obtaining the full Cadence environment would have been difficult, and to Prof. Hartmut Roskos for his suggestions during talks. I appreciate the countless help of Mr Bernd Hills, Ms Simone Sanchez, and Dr Giacomo Ulisse. I thank Dr Andreas Wentzel for helping me to finish my thesis on time. I would also like to remember my colleagues at the Goethe University, Hui, Marie, Qamar, Dovile, Robert, Daniel, Hai, and the rest of our colleagues, for the barbecues and New Year parties, hiking, picnics, hotpot, game evenings, summer schools, and a great time altogether.

Thanks to the German Aerospace Center (DLR) and the Federal Ministry for Economic Affairs and Energy (Germany) for funding the projects MMIRAW (50RA1326) and RAMMS (03SX422B), respectively. I worked on these projects to accomplish my PhD work.

I am grateful to my parents, sister, uncle, and Uru for everything.