# Design Automation and Analysis of Resonant Clocking Technologies

A Thesis

Submitted to the Faculty

of

Drexel University

by

Vinayak Honkote

in partial fulfillment of the

requirements for the degree

of

Doctor of Philosophy in Electrical and Computer Engineering

June 2010

© Copyright 2010 Vinayak Honkote. All Rights Reserved.

# Table of Contents

| Li | st of  | Tables   | 5          |                                              | vii |
|----|--------|----------|------------|----------------------------------------------|-----|
| Li | st of  | Figure   | es         |                                              | ix  |
| Ał | ostrac | :t       |            |                                              | xiv |
| 1. | Intro  | oduction | n          |                                              | 1   |
|    | 1.1    | Proble   | em Staten  | nent                                         | 2   |
|    | 1.2    | Contri   | butions of | f this Work                                  | 3   |
|    | 1.3    | Solutio  | on Metho   | dology                                       | 5   |
|    | 1.4    | Organ    | ization of | the Dissertation                             | 7   |
| 2. | Over   | rview of | f Resonar  | t Clocking Technologies                      | 9   |
|    | 2.1    | Rotary   | v Clockin  | g Technology                                 | 13  |
|    |        | 2.1.1    | Rotary     | Wave Generation and Operation (RWO)          | 13  |
|    |        |          | 2.1.1.1    | Crossover Points                             | 14  |
|    |        |          | 2.1.1.2    | Tapping Points                               | 14  |
|    |        | 2.1.2    | Rotary '   | Topology                                     | 15  |
|    |        | 2.1.3    | Properti   | ies of Rotary Clocking                       | 16  |
|    |        |          | 2.1.3.1    | Adiabaticity                                 | 17  |
|    |        |          | 2.1.3.2    | Power                                        | 17  |
|    |        |          | 2.1.3.3    | Rotary Timing                                | 18  |
|    |        |          | 2.1.3.4    | Capacitance Balancing                        | 20  |
|    | 2.2    | Mobiu    | s Standir  | ng Wave Technology                           | 21  |
|    |        | 2.2.1    | Mobius     | Standing Wave Generation and Operation (SWO) | 21  |
|    |        |          | 2.2.1.1    | Crossover Points                             | 22  |
|    |        |          | 2.2.1.2    | Connection Points                            | 23  |
|    |        | 2.2.2    | Mobius     | Standing Wave Oscillator Topology            | 23  |
|    |        | 2.2.3    | Properti   | ies of Mobius Standing Wave Oscillator       | 23  |
|    |        |          | 2.2.3.1    | Adiabaticity                                 | 23  |
|    |        |          | 2.2.3.2    | Power                                        | 24  |

|    |     |                 | 2.2.3.3 SWO Timing                                                               | 25 |
|----|-----|-----------------|----------------------------------------------------------------------------------|----|
|    |     |                 | 2.2.3.4 SWO Capacitance Balancing                                                | 25 |
|    | 2.3 | Litera          | ture Review and Delay Model                                                      | 26 |
|    |     | 2.3.1           | Topology Related Work                                                            | 26 |
|    |     | 2.3.2           | Timing and Physical Design Related Work                                          | 26 |
|    |     | 2.3.3           | Parasitic Analysis Related Work                                                  | 28 |
|    |     | 2.3.4           | Physical Implementation                                                          | 29 |
|    |     | 2.3.5           | Tapping Delay Model                                                              | 29 |
| 3. | Nov | el Topo         | logies for Rotary Clocking Technology                                            | 33 |
|    | 3.1 | CROA<br>Clocki  | A: A Novel Custom Rotary Oscillatory Array Topology for Rotary<br>ing Technology | 33 |
|    |     | 3.1.1           | Motivational Example                                                             | 34 |
|    |     | 3.1.2           | Algorithm for the Custom Ring Implementation                                     | 36 |
|    |     | 3.1.3           | Experimental Results                                                             | 40 |
|    |     | 3.1.4           | Summary                                                                          | 42 |
|    | 3.2 | ZeRO.<br>nology | A: Zero Clock Skew Synchronization with Rotary Clocking Tech-                    | 43 |
|    |     | 3.2.1           | Zero Clock Skew                                                                  | 44 |
|    |     | 3.2.2           | Motivational Example                                                             | 45 |
|    |     | 3.2.3           | ZCS: Proposed Methodology for Zero Clock Skew Synchronization                    | 49 |
|    |     | 3.2.4           | Experimental Results                                                             | 51 |
|    |     | 3.2.5           | Summary                                                                          | 54 |
|    | 3.3 | Review          | w of Rotary Topology Synchronization with Tree Subnetworks                       | 56 |
|    |     | 3.3.1           | Tree Subnetworks for Custom Rings                                                | 56 |
|    |     | 3.3.2           | Capacitance-aware Tree Subnetworks for ROA                                       | 57 |
|    |     | 3.3.3           | Summary                                                                          | 59 |
| 4. | Tim | ing Ana         | alysis and Optimization for Rotary Clocking Technology                           | 60 |
|    | 4.1 | Bound           | led Skew Constraint Methodology for Rotary Clocking                              | 60 |
|    |     | 4.1.1           | Timing Framework                                                                 | 61 |

|     | 4.1.2            | Skew A                  | nalysis                                                                        | 62  |
|-----|------------------|-------------------------|--------------------------------------------------------------------------------|-----|
|     | 4.1.3            | Bounde                  | d Skew Constraint Implementation                                               | 64  |
|     |                  | 4.1.3.1                 | Motivational Example                                                           | 64  |
|     |                  | 4.1.3.2                 | General Methodology                                                            | 65  |
|     | 4.1.4            | Experim                 | nental Results                                                                 | 66  |
|     | 4.1.5            | Summar                  | ry                                                                             | 68  |
| 4.2 | Analy<br>Oscilla | sis, Desig<br>atory Arr | n and Simulation of Capacitive Load Balanced Rotary<br>ay                      | 70  |
|     | 4.2.1            | Effects of              | of Unbalanced Capacitive Load                                                  | 71  |
|     | 4.2.2            | Capacit                 | ive Load Balancing On ROA                                                      | 73  |
|     |                  | 4.2.2.1                 | Problem OCLB: Optimal Capacitive Load Balancing                                | 73  |
|     |                  | 4.2.2.2                 | Experimental Results for OCLB                                                  | 75  |
|     | 4.2.3            | Minimiz                 | ing Wirelength Across Capacitance Balanced ROA                                 | 78  |
|     |                  | 4.2.3.1                 | SOCLB: Sub-optimal Capacitive Load Balancing for<br>Minimum Tapping Wirelength | 78  |
|     |                  | 4.2.3.2                 | Experimental Results for SOCLB                                                 | 79  |
|     | 4.2.4            | Summar                  | ry                                                                             | 81  |
| 4.3 | Skew-<br>Skew    | Aware C<br>Rotary O     | apacitive Load Balancing for Low-Power Zero Clock<br>scillatory Array          | 83  |
|     | 4.3.1            | Motivat                 | ion                                                                            | 84  |
|     | 4.3.2            | Propose                 | d Methodology                                                                  | 88  |
|     |                  | 4.3.2.1                 | SkCLB: Skew Aware Capacitive Load Balancing on ROA                             | 90  |
|     |                  | 4.3.2.2                 | <b>ZCSCLB</b> : Zero Clock Skew Synchronization with Capacitance Balanced ROA  | 91  |
|     |                  | 4.3.2.3                 | Power Analysis                                                                 | 92  |
|     | 4.3.3            | Experin                 | nental Results                                                                 | 93  |
|     |                  | 4.3.3.1                 | SkCLB Results                                                                  | 94  |
|     |                  | 4.3.3.2                 | ZCSCLB Results                                                                 | 97  |
|     |                  | 4.3.3.3                 | Power Analysis Results                                                         | 100 |
|     | 4.3.4            | Summar                  | ry                                                                             | 100 |

| 5. | Tim<br>Star | ing Ana<br>nding W | alysis and<br>/ave Oscil | Optimization for Mobius Implementation of Resonant<br>lator | .02 |
|----|-------------|--------------------|--------------------------|-------------------------------------------------------------|-----|
|    | 5.1         | Design             | n Automa                 | tion Scheme for SWO 1                                       | .02 |
|    |             | 5.1.1              | Propose                  | d Methodology1                                              | .02 |
|    |             | 5.1.2              | Tapping                  | Wirelength Comparison 1                                     | .04 |
|    |             | 5.1.3              | Summar                   | y1                                                          | .05 |
|    | 5.2         | Capac              | itive Load               | d Balancing for SWO 1                                       | .08 |
|    |             | 5.2.1              | Problem                  | Formulation 1                                               | .08 |
|    |             | 5.2.2              | Capaciti                 | ve Load Balancing Results for SWO 1                         | .09 |
|    |             | 5.2.3              | Summar                   | y1                                                          | 13  |
|    | 5.3         | Skew .             | Analysis f               | for SWO                                                     | .14 |
|    |             | 5.3.1              | Propose                  | d Methodology1                                              | .14 |
|    |             | 5.3.2              | Skew An                  | alysis Results1                                             | 16  |
|    |             | 5.3.3              | Summar                   | y1                                                          | .20 |
| 6. | Inte        | rconnec            | t Modelin                | ng and Parasitic Analysis for Rotary Clocking1              | .21 |
|    | 6.1         | PEEC               | Based Ir                 | terconnect Modeling and Parasitic Analysis                  | .21 |
|    |             | 6.1.1              | PEEC E                   | Based Parasitic Analysis 1                                  | 22  |
|    |             |                    | 6.1.1.1                  | Corners                                                     | .23 |
|    |             |                    | 6.1.1.2                  | Gap1                                                        | 25  |
|    |             |                    | 6.1.1.3                  | Custom Ring Topologies of CROA 1                            | .26 |
|    |             | 6.1.2              | Experim                  | ental Results 1                                             | .27 |
|    |             |                    | 6.1.2.1                  | PEEC Analysis 1                                             | .28 |
|    |             |                    | 6.1.2.2                  | Simulation Results with SPICE 1                             | .31 |
|    |             | 6.1.3              | Impact                   | on the Oscillation Frequency 1                              | .35 |
|    |             | 6.1.4              | Power A                  | nalysis1                                                    | .37 |
|    |             | 6.1.5              | Summar                   | y1                                                          | .40 |
|    | 6.2         | Parasi<br>conne    | tic Analy                | sis–Revisited: 3-D Parasitic Modeling for Rotary Inter-     | .41 |
|    |             | 6.2.1              | Modelin                  | g Interconnect Parasitics for 3-D Based Extraction 1        | .41 |

|         | 6.2.1.1                                                                                                                                                                                                                                         | Straight Segments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                              |
|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
|         | 6.2.1.2                                                                                                                                                                                                                                         | Corner Segments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                              |
|         | 6.2.1.3                                                                                                                                                                                                                                         | Crossover Segments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                              |
|         | 6.2.1.4                                                                                                                                                                                                                                         | Gap Segments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                              |
| 6.2.2   | Parasiti                                                                                                                                                                                                                                        | c Analysis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                              |
| 6.2.3   | Experim                                                                                                                                                                                                                                         | nental Results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                              |
|         | 6.2.3.1                                                                                                                                                                                                                                         | Results for Interconnect Segment Modeling                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                              |
| 6.2.4   | Power A                                                                                                                                                                                                                                         | nalysis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                              |
| 6.2.5   | Discussi                                                                                                                                                                                                                                        | on on Oscillation Frequency and Phase Velocity .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                              |
| 6.2.6   | Summar                                                                                                                                                                                                                                          | у                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 152                                          |
| clusion | and Futu                                                                                                                                                                                                                                        | re Directions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | 153                                          |
| Concl   | usion                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | 153                                          |
| 7.1.1   | Topolog                                                                                                                                                                                                                                         | y Related Work                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 153                                          |
| 7.1.2   | Timing                                                                                                                                                                                                                                          | Analysis and Optimization Related Work                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                                              |
| 7.1.3   | Parasiti                                                                                                                                                                                                                                        | c Analysis Related Work                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | 155                                          |
| Future  | e Directio                                                                                                                                                                                                                                      | ns                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 156                                          |
| 7.2.1   | Synchronization Between the Custom Rings in CROA | | 156 |
| 7.2.2   | Optimal Placement and Sizing of the Inverter Pairs | | 158 |
| 7.2.3   | Fabrication of Rotary Rings | | 158 |
| Bibliography | | | 159 |

# List of Tables

| No. | Title | Page |
|-----|-------|------|
| 3.1 | Tapping wirelength comparison for CROA vs. ROA | 41 |
| 3.2 | Tapping wirelengths for the rotary rings implemented with non-zero skew R1-R5 circuits | 53 |
| 3.3 | Tapping wirelength comparison | 53 |
| 3.4 | Change in tapping locations for non-zero skew registers compared with the zero skew registers | 54 |
| 4.1 | Skew (as a % of the clock period) and tapping wirelength results for the wire snaking based methodology compared with the traditional methodology | 68 |
| 4.2 | Results for OCLB formulation | 76 |
| 4.3 | Wirelength improvement for SOCLB methodology | 80 |
| 4.4 | Skew mismatch results with ZCS | 85 |
| 4.5 | Skew-aware capacitive load balancing results (SkCLB) | 94 |
| 4.6 | Normalized tapping wirelength and skew comparison using SkCLB and ZCSCLB | 97 |
| 4.7 | Power dissipation results | 101 |
| 5.1 | Tapping wirelength comparison for R1-R5 circuits with varying number of tapping points | 107 |
| 5.2 | Capacitive load balancing results | 111 |
| 5.3 | Capacitance variation for R1-R5 circuits | 111 |
| 6.1 | Comparison of $F_{th_1}$ (frequency without PEEC parasitics) and $F_{th_2}$ (frequency with PEEC parasitics), as approximated by (2.3) | |
| 6.2 | Comparison of frequency $F_{sim_1}$ without PEEC parasitics and frequency $F_{sim_2}$ with PEEC parasitics, as simulated in HSPICE | |
| 6.3 | Comparison of simulated frequency $F_{sim_2}$ (corner parasitics, SPICE) with the theoretical frequency $F_{th_2}$ (corner parasitics, PEEC) and with the theoretical frequency $F_{th_1}$ (PEEC) | |
| 6.4 | Different interconnect segment modeling topologies and analysis methods | 147 |
| 6.5 | Power dissipation on the ring with different segments used for simulation | 151 |

# List of Figures

| No.  | Title | Page |
|------|-------|------|
| 1.1  | Various trends of power and frequency for processors | 2 |
| 2.1  | A coupled $LC$ oscillator model | 9 |
| 2.2  | An idealized schematic chart of a distributed oscillator | 10 |
| 2.3  | A standing wave oscillator with three cross-coupled pairs | 11 |
| 2.4  | A traveling wave oscillator called rotary clocking | 12 |
| 2.5  | Operating principle of rotary clocking | 14 |
| 2.6  | The rotary traveling wave oscillator theory | 15 |
| 2.7  | Basic rotary clock architecture (ROA) | 16 |
| 2.8  | Cross-connected inverter pairs in rotary interconnects | 17 |
| 2.9  | ROA interconnect parameters | 19 |
| 2.10 | Generation and operating principle of the standing wave oscillator | 22 |
| 2.11 | Mobius implementation of standing wave on a grid structure | 24 |
| 2.12 | Register tapping onto the ring | 30 |
| 2.13 | Tapping delay model | 31 |
| 3.1  | Custom rotary clock architecture (CROA) | 34 |
| 3.2  | Standard and custom rotary ring topologies for a sample circuit with 45 registers, shown as $(X)$, on a $5 \times 5$ grid | 35 |
| 3.3  | Pseudo-code for the custom router, including an excerpt from the main function and the three functions FindSource, FormRings and Wirelength | 38 |
| 3.4  | "Tapping" the register $R_k$ onto the custom ring at the tapping point P11 | 39 |
| 3.5  | Clock skew in a clock network | 44 |
| 3.6  | Rotary clocking technology implemented on non-zero skew and zero skew circuits | 46 |
| 3.7  | Rotary ring on non-zero skew circuits | 47 |
| 3.8  | Tapping point selection for $R_k(x, y)$ | 50 |
| 3.9  | Zero skew requirement of an IP block on a System on Chip (SoC) | 52 |
| 3.10 | Custom rings synchronized with tree sub-networks | 57 |
| 3.11 | Capacitance-aware tree subnetworks synchronized with the ROA | 58 |
| 4.1  | Distribution of skew mismatch for R1-R5 circuits | 63 |
| 4.2  | Bounded skew constraint methodology to reduce the skew mismatch | 65 |
| 4.3  | Limited wire snaking for improved skew | 67 |
| 4.4  | Distribution of skew mismatch for R1-R5 circuits using a 3.5% bounded skew constraint | 69 |
| 4.5  | SPICE simulations for unbalanced capacitance distribution on the five (5) rings of the ROA, resulting in a frequency variation of 30.31% | 72 |
| 4.6  | ILP formulation for Problem OCLB | 74 |
| 4.7  | Capacitance distribution of R1 on a $5 \times 5$ grid with the proposed OCLB formulation. The total capacitance is 286.49 $pF$ | 76 |
| 4.8  | SPICE simulation results for the OCLB formulation. Frequency variation of $0.30\%$ for the capacitance imbalance of $k = 2.21$ for the R1 circuit | 77 |
| 4.9  | ILP formulation for SOCLB | 79 |
| 4.10 | Capacitance distribution of R1 on a $5 \times 5$ grid with the proposed SOCLB formulation. The total capacitance is 116.27 $pF$ | 80 |
| 4.11 | SPICE simulation results for SOCLB. Frequency variation of 2.40% for the capacitance imbalance of $k = 4.42$ for the R1 circuit | 82 |
| 4.12 | Capacitance distribution of R1 on a $5 \times 5$ ROA grid with zero clock skew synchronization (ZCS). The total capacitance is 238.480 $pF$ | 85 |
| 4.13 | SPICE simulation results for the ZCS methodology on the five (5) rings of the ROA, resulting in a frequency variation of $10.14\%$ for the capacitance imbalance of $k = 15.490 \ pF$ | 87 |
| 4.14 | SPICE simulation results for the ZCS methodology on the five (5) rings of the ROA, with dummy capacitances to balance the loads | 89 |
| 4.15 | MIP formulation for SkCLB | 90 |
| 4.16 | MIP formulation for ZCSCLB | 92 |
| 4.17 | Capacitance distribution of R1 on a $5 \times 5$ ROA grid with SkCLB. The total capacitance is 260.060 $pF$ | 95 |
| 4.18 | SPICE simulation results for the SkCLB formulation. Frequency variation is 2.12% for the capacitance imbalance of $k = 1.296 \ pF$ on the R1 circuit | 96 |
| 4.19 | Capacitance distribution of R1 on a $5 \times 5$ ROA grid with ZCSCLB. The total capacitance is 145.490 $pF$ | 98 |
| 4.20 | SPICE simulation results for the ZCSCLB formulation. Frequency variation is 3.62% for the capacitance imbalance of $k = 2.592 \ pF$ on the R1 circuit | 99 |
| 5.1  | Registers connecting to the rotary ring and the standing wave ring implemented on zero skew circuits | |
| 5.2  | ILP formulation for capacitive load balancing on SWO | 110 |
| 5.3  | Capacitance distribution of R1 on a $5 \times 5$ grid with the proposed ILP formulation | 112 |
| 5.4  | Capacitance distribution of R1 on a $5 \times 5$ grid without capacitive load balancing consideration | 113 |
| 5.5  | Distribution of skew mismatch for R1-R5 circuits without capacitive load balancing | |
| 5.6  | Distribution of skew mismatch for R1-R5 circuits after capacitive load balancing. Circled regions include the registers with non-zero skews for *SWO* | 119 |
| 6.1  | Mutual inductance computation | |
| 6.2  | Corners and gap in a custom ring | |
| 6.3  | Possible custom ring topologies with $P_r = 12$ grids | |
| 6.4  | Change in mutual inductance when corner segments $P$ and $Q$ are compared with a regular segment $R$ | |
| 6.5  | Mutual inductance for varying "gap" with a segment length of 1000 units | 129 |
| 6.6  | Overall increase in the mutual inductance of a custom ring with an additional corner pair compared with the overall mutual inductance of a regular ring. Note that the vertical axis is in % (e.g., $0.9\%$ for s = 25, w = 5 units) | 129 |
| 6.7  | A portion of the SPICE simulation schematic | |
| 6.8  | Clock signal simulated for a rotary ring with no parasitics at the corners | 133 |
| 6.9  | Clock signals obtained for CROA topologies with varying number of corner segments | |
| 6.10 | Percentage increase in power with varying number of corners | |
| 6.11 | Total power dissipation on the custom ring with varying number of corners compared with the regular ring | |
| 6.12 | Segments on the regular and custom rings | 142 |
| 6.13 | Different types of segments on a rotary ring | |
| 6.14 | Different types of interconnect modeling topologies | 146 |
| 6.15 | Mutual inductance for varying "gap" (1 unit = 1 $\mu m$) | |
| 6.16 | Clock waveforms with different rotary geometries | |
| 7.1  | Synchronization of regular rings in ROA using a 4-port network | |
| 7.2  | Synchronization of custom rings in CROA using a 3-port network | 157 |

# Abstract

Design Automation and Analysis of Resonant Clocking Technologies

Vinayak Honkote

Advisor: Baris Taskin, Ph.D.

The complex structure of clock distribution networks has an increasing impact on the timing and power budgets of modern integrated circuits. In particular, with the ongoing trend toward higher frequencies and lower power, globally distributing clock signals with high integrity becomes increasingly difficult. Resonant clocking is therefore an attractive alternative for satisfying the high-complexity timing requirements of high-performance VLSI circuits. The adiabatic switching property offers an appealing solution to the limitations of conventional clocking techniques by recycling energy back into the circuit instead of dissipating it. Resonant clocking technologies, which operate on adiabatic switching principles, can generate very high-frequency clock signals with very low power dissipation.

This dissertation focuses on design automation algorithms for, and the analysis of, the rotary and standing wave resonant clocking technologies. The following critical design aspects are addressed, demonstrating the advantages of these technologies when integrated into the mainstream integrated circuit (IC) design flow: i) topology design for clock generation and distribution in the rotary clocking technology, ii) synchronization of non-zero clock skew and zero clock skew circuits with the rotary clocking technology, iii) timing analysis and load balancing for the mobius implementation of the resonant standing wave technology, iv) interconnect modeling and parasitic analysis for the rotary clocking technology, and v) power analysis for the rotary clocking technology.

### 1. Introduction

Advances in deep-submicron (DSM) circuit design have led to impressive performance gains in modern digital VLSI circuits. However, the complex structure of clock distribution networks has an increasing impact on the timing and power budgets of modern integrated circuits. In particular, with the ongoing trend toward higher frequencies and lower power, globally distributing clock signals with high integrity becomes increasingly difficult. Fig. 1.1(a) shows the approximate breakdown of power dissipation among the components of two microprocessors [1]. In high-performance applications, 15% to 50% of the total power dissipation is attributed to the clock distribution network [1, 2]. As the clock frequency increases, so does the power dissipation, which adversely affects microprocessor performance [3]. Fig. 1.1(b) shows the trends in clock frequency, power, number of transistors and performance/clock for various families of Intel microprocessors over the years [3]. Note that in Fig. 1.1(b), the clock frequency of the processors saturates at around 4 *GHz* due to the prohibitive increase in power dissipation.
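The link between clock frequency and power can be made explicit with the standard first-order expression for dynamic switching power (a textbook relation, not a result of this dissertation). Over one clock cycle, charging the clock load capacitance $C_{clk}$ to the supply voltage $V_{DD}$ and discharging it again dissipates $C_{clk} V_{DD}^2$; since the clock net switches every cycle (activity factor $\alpha = 1$), the clock power grows linearly with the frequency $f$:

$$
E_{cycle} = \tfrac{1}{2} C_{clk} V_{DD}^2 + \tfrac{1}{2} C_{clk} V_{DD}^2 = C_{clk} V_{DD}^2,
\qquad
P_{clk} = \alpha \, C_{clk} V_{DD}^2 f \Big|_{\alpha = 1} = C_{clk} V_{DD}^2 f
$$

For instance, with the hypothetical values $C_{clk} = 1$ nF and $V_{DD} = 1$ V, raising $f$ from 2 GHz to 4 GHz doubles the clock power from 2 W to 4 W.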

The prevailing methodology for generating high-frequency clock signals is on-chip frequency multiplication with phase-locked loop (PLL) components [4–7]. On-chip PLL components occupy chip area and lead to problems with signal reflections, capacitive loading and power dissipation that effectively limit the maximum operating frequency [8]. Physical limitations on the frequency achievable with conventional clocking further complicate the clock distribution process due to increased jitter and skew.



Figure 1.1: Various trends of power and frequency for processors. (a) Approximate power breakdown in the 21064 and 21164 microprocessors [1]. (b) Clock frequency, power, transistors and performance/clock trends for Intel microprocessors [3].

#### 1.1 Problem Statement

The power and frequency limitations of conventional clocking techniques point toward a need for novel clocking methodologies. Resonant clocking technology, alongside RF-band clocking [9, 10] and optical clocking [11–13], is one of the alternative clocking technologies being investigated to satisfy the high-complexity timing requirements of high-performance nano-scale integrated circuits [14, 15]. The adiabatic switching [16] property offers an appealing solution to the limitations of conventional clocking techniques by recycling energy back into the circuit instead of dissipating it. Resonant clocking technologies, which operate on adiabatic switching principles, can generate very high-frequency clock signals with very low power dissipation. Furthermore, resonant clocking technologies [16–30] eliminate the need for a complex on-chip PLL component.
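The energy argument behind adiabatic switching can be sketched with a textbook first-order comparison (not a derivation from this dissertation). Charging a load capacitance $C$ to a voltage $V$ through a resistance $R$ with an abrupt step dissipates $\tfrac{1}{2} C V^2$ in $R$, independent of $R$. Charging it instead with a slow linear ramp of duration $T_{ramp} \gg RC$, i.e., with a constant current $I = CV/T_{ramp}$, dissipates only

$$
E_{step} = \tfrac{1}{2} C V^2, \qquad
E_{ramp} \approx I^2 R \, T_{ramp} = \left(\frac{C V}{T_{ramp}}\right)^2 R \, T_{ramp} = \frac{RC}{T_{ramp}} \, C V^2 .
$$

Because $E_{ramp}$ shrinks as $T_{ramp}/RC$ grows, most of the energy is not burned in the resistive path and can instead be exchanged back and forth with a resonant tank. The ramp result neglects the initial transient, which lasts on the order of $RC$.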

The rotary traveling wave [17] and mobius standing wave [19] resonant clocking technologies are investigated in this dissertation. Of the two, resonant rotary clocking is more promising because it provides a constant-magnitude clock signal with multi-phase, non-zero clock skew operation. Non-zero clock skew operation has been shown to enable higher operating frequencies and improved circuit performance [31–34]. However, integrating rotary clocking into the mainstream IC design flow requires extensive design automation and analysis, because the resonant oscillation depends on the circuit parasitics and because the clock is inherently multi-phase with non-zero clock skew. Accordingly, most of this dissertation focuses on design automation algorithms for, and the analysis of, the rotary clocking technology. The design automation and synchronization algorithms proposed for rotary clocking are also extended to the mobius implementation of standing wave oscillators.
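Why non-zero skew can raise the operating frequency can be illustrated with classic clock skew scheduling. The sketch below follows Fishburn's difference-constraint formulation of setup and hold, but solves it with Bellman-Ford plus a binary search on the clock period instead of an LP solver. It is a generic illustration, not the rotary-specific methodology of this dissertation; the function names, register indices and delays are hypothetical, and register setup/hold times are ignored.

```python
def schedule_skews(n, paths, period):
    """Return clock arrival times t[0..n-1] meeting setup/hold at `period`, or None.

    Each path (i, j, d_min, d_max) launches data at register i and captures it at j.
    Setup: t_i + d_max <= t_j + period  ->  t_i - t_j <= period - d_max
    Hold:  t_i + d_min >= t_j           ->  t_j - t_i <= d_min
    A difference constraint x_a - x_b <= w becomes an edge b -> a of weight w.
    All distances start at 0 (an implicit virtual source), so Bellman-Ford either
    converges to a feasible schedule or keeps relaxing because of a negative cycle.
    """
    edges = []
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max))  # setup constraint
        edges.append((i, j, d_min))           # hold constraint
    t = [0.0] * n
    for _ in range(n + 1):
        changed = False
        for u, v, w in edges:
            if t[u] + w < t[v]:
                t[v] = t[u] + w
                changed = True
        if not changed:
            return t
    return None  # negative cycle: no skew schedule works at this period


def min_period(n, paths, tol=1e-6):
    """Binary-search the smallest period for which a skew schedule exists."""
    # With d_min >= 0, zero skew always meets setup and hold at the longest delay.
    lo, hi = 0.0, max(d_max for _, _, _, d_max in paths)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if schedule_skews(n, paths, mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical two-register loop: R0 -> R1 takes 7-8 ns, R1 -> R0 takes 3-4 ns.
paths = [(0, 1, 7.0, 8.0), (1, 0, 3.0, 4.0)]
zero_skew_period = max(d_max for _, _, _, d_max in paths)
print(zero_skew_period, round(min_period(2, paths), 3))  # 8.0 6.0
```

For this loop, zero skew forces the period up to the longest path delay (8 ns), whereas delaying the clock of R1 by 2 ns relative to R0 lets the circuit run at a 6 ns period, the average delay around the loop.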

The main issues addressed in this dissertation are:

1. Topology design for clock generation and distribution in rotary clocking,
2. Synchronization of non-zero clock skew circuits and zero clock skew circuits with the rotary clocking technology,
3. Timing analysis and load balancing for the mobius implementation of resonant standing wave oscillators,
4. Interconnect modeling and parasitic analysis for the rotary clocking technology,
5. Power analysis for the rotary clocking technology.

#### 1.2 Contributions of this Work

To address the issues listed in Section 1.1, the following tasks are performed. The publication disseminating each task is indicated alongside it.

- Development of a novel methodology called Custom Rotary Oscillatory Array (CROA) for the generation and distribution of rotary clocking, published in [35].
- Implementation of a physical design flow for synchronizing "non-zero skew" components with the clock signals generated using CROA, published in [36].
- Implementation of a novel physical design methodology called zero clock skew synchronization (ZCS) for the synchronization of "zero clock skew" components with rotary oscillatory arrays, published in [37].
- Implementation of a bounded skew constraint methodology for rotary clocking, published in [38].
- Development of two novel capacitance balancing methodologies called optimal capacitive load balancing (OCLB) and sub-optimal capacitive load balancing (SOCLB) for the capacitive balance between the rings of the ROA grid, published in [39].
- Implementation of skew-aware capacitive load balancing for low-power zero clock skew rotary oscillatory array, submitted for review in [40].
- Extension of skew analysis and capacitive load balancing techniques to mobius implementation of standing wave oscillators, published in [41–43].
- Analysis of the effects of capacitance and mutual inductance on the oscillation frequency of the rotary-generated square waves using Partial Element Equivalent Circuit (PEEC) models, published in [44].
- Development of a 3-D finite-element-based full-wave electromagnetic analysis technique for the parasitic modeling of rotary interconnects, submitted for review in [45, 46].
- Development of SPICE-based simulation models to verify the operation of the proposed design methodologies for rotary topology, rotary timing and optimization, and rotary power analysis, published in [44] and submitted for review in [45, 46].

#### 1.3 Solution Methodology

To address the contemporary issues in the timing and design of the rotary clocking technology, concepts from combinatorial algorithms [47], optimization algorithms and operations research are applied. Routing techniques such as maze routing are used in a novel design methodology for clock generation using rotary rings [48]. Linear programming (LP) [49] and mixed integer programming (MIP) [50, 51] based methodologies are developed for timing synchronization and wirelength optimization. Mixed-signal design techniques [52–55] are incorporated for interconnect modeling and parasitic characterization for rotary clocking. SPICE circuit simulations [56, 57], 3-D finite element method (FEM) based simulations [58] and simulations implemented in the C++ high-level language [59] are used to verify the design methodologies and timing synchronization.
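Maze routing in its classic form is Lee's algorithm: a breadth-first wave expands from the source over free grid cells until it reaches the target, and the path is recovered by backtracing. The following minimal sketch illustrates only that generic technique on a hypothetical grid ('#' marks blocked cells); it is not the custom ring router of this dissertation, whose pseudo-code is given in Fig. 3.3, and the function name `lee_route` is illustrative.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Lee-style maze routing: BFS wave expansion, then backtrace.

    grid: list of equal-length strings, '#' = blocked cell, '.' = free cell.
    src, dst: (row, col) tuples. Returns the cells of a shortest rectilinear
    path from src to dst (both included), or None if dst is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}  # also serves as the "visited" set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    if dst not in parent:
        return None
    # Backtrace from the target to the source through the recorded parents.
    path = []
    cell = dst
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return path[::-1]


grid = [
    "....",
    ".##.",
    "....",
]
path = lee_route(grid, (1, 0), (1, 3))
print(len(path) - 1)  # 5 grid steps around the blockage
```

Because the wave expands one grid step at a time, the first time the target is reached it is reached along a shortest path, which is the property that makes maze routing attractive for wirelength-driven ring construction.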

Traditionally, rotary clocking is implemented using regular-shaped rings on an array (grid) topology called rotary oscillatory arrays (ROA). The synchronous components are connected to the *preplaced regular square* rings for timing closure. In this dissertation, a novel methodology for rotary clocking design called custom rotary oscillatory array (CROA) is proposed. CROA is designed using a maze router based design automation technique. A physical design flow for connecting non-zero skew registers onto the custom rings so as to satisfy the register skew requirements is described. An Elmore delay based tapping delay model is used to compute delays. A methodology for reducing tapping wirelength using combinatorial optimization algorithms is designed and implemented in the C++ high-level programming language. SPICE-based simulation techniques are used to verify the correct operation of the CROA design methodology.
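The tapping delay model itself is presented in Chapter 2 (Fig. 2.13). As a hedged illustration of the Elmore delay computation only, the sketch below models a tapping wire as a uniform RC ladder driving a register's input capacitance and sums each resistance times its downstream capacitance. The per-unit parameters, segment count and function names are hypothetical, and the ring-side driver resistance is ignored.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[k] is the series resistance feeding node k and capacitances[k]
    is the grounded capacitance at node k (node 0 is nearest the driver).
    The delay is the sum over k of R_k times all capacitance at or beyond node k.
    """
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c
    return delay


def tapping_delay(length_um, r_per_um, c_per_um, c_load, segments=50):
    """Hypothetical tapping wire from the ring to a register input.

    The wire is split into `segments` equal RC sections, and the register
    input capacitance c_load is lumped at the far end.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    resistances = [r_seg] * segments
    capacitances = [c_seg] * segments
    capacitances[-1] += c_load
    return elmore_delay(resistances, capacitances)


# Hypothetical values: 1000 um wire, 0.1 ohm/um, 0.2 fF/um, 5 fF register input.
delay = tapping_delay(1000, 0.1, 0.2e-15, 5e-15)
print(f"{delay * 1e12:.2f} ps")  # 10.70 ps
```

As the number of segments grows, the result approaches the closed form $r L (c L / 2 + C_L)$, here 10.50 ps, which is why a single lumped $\pi$-model is often used for quick estimates.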

Rotary clocking necessitates non-zero clock skew operation due to the "traveling" nature of the clock signal. However, the typical mainstream design flow calls for a zero clock skew implementation. Hence, to support zero clock skew synchronization, a systematic approach for zero clock skew synchronization with rotary clocking is presented. Further, to minimize wirelength, a tree-based clock routing technique is reviewed for rotary oscillatory arrays.
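A common building block of zero skew clock trees is Tsay's exact zero skew merge: given two subtrees, find the point on the connecting wire at which the Elmore delays to both subtrees are equal. The sketch below is a generic version of that textbook computation, not the specific tree construction used with the ROA in this dissertation; the subtree delays, capacitances, wire parameters and function names are hypothetical.

```python
def zero_skew_merge(t1, c1, t2, c2, length, r_per_um, c_per_um):
    """Tsay-style zero skew merging point for two subtrees.

    Subtree i has root delay t_i and total downstream capacitance c_i. The two
    roots are joined by a wire of `length`; the merge point sits at fraction x
    of that wire, measured from subtree 1. Returns x, or None when x falls
    outside [0, 1] (balancing would then require wire elongation, i.e. snaking).
    """
    R = r_per_um * length  # total resistance of the connecting wire
    C = c_per_um * length  # total capacitance of the connecting wire
    x = (t2 - t1 + R * (c2 + C / 2)) / (R * (C + c1 + c2))
    return x if 0.0 <= x <= 1.0 else None


def branch_delay(frac, t_sub, c_sub, length, r_per_um, c_per_um):
    """Elmore delay from the merge point through frac*length of wire into a subtree."""
    r = r_per_um * frac * length
    c_wire = c_per_um * frac * length
    return r * (c_wire / 2 + c_sub) + t_sub


# Hypothetical subtrees: (root delay, downstream capacitance) in SI units.
t1, c1 = 10e-12, 20e-15
t2, c2 = 5e-12, 10e-15
x = zero_skew_merge(t1, c1, t2, c2, 1000, 0.1, 0.2e-15)
d1 = branch_delay(x, t1, c1, 1000, 0.1, 0.2e-15)
d2 = branch_delay(1 - x, t2, c2, 1000, 0.1, 0.2e-15)
print(round(x, 4), d1, d2)  # both branch delays are equal (about 11.2 ps)
```

The slower subtree (here subtree 1) receives the shorter wire branch, so the merge point shifts toward it until both Elmore delays match.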

Timing closure, in terms of satisfying the timing requirements of each local data path and minimizing clock skew, is an important objective in the design of high-performance VLSI systems. A timing framework to analyze skew mismatch and a bounded skew constraint methodology to reduce the skew mismatch in rotary clock synchronization are presented.
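As a hedged illustration of how a bounded skew constraint trades skew accuracy against tapping wirelength, the sketch below expresses skew mismatch as a fraction of the clock period and, for a single register, picks the shortest tapping connection whose mismatch stays within the bound. The candidate delays and wirelengths are hypothetical, and the greedy per-register selection is a simplification, not the dissertation's formulation; the 3.5% bound is the value quoted in the caption of Fig. 4.4.

```python
def skew_mismatch(achieved, target, period):
    """Skew mismatch as a fraction of the clock period (0.035 means 3.5%)."""
    return abs(achieved - target) / period


def select_tap(candidates, target, period, bound):
    """Pick a tapping point for one register under a bounded skew constraint.

    candidates: list of (clock_delay, tapping_wirelength) pairs, one per
    candidate tapping point on the ring (hypothetical values here).
    Among candidates whose mismatch is within `bound`, return the one with the
    shortest tapping wire; if none qualifies, fall back to the candidate with
    the smallest mismatch.
    """
    within = [cand for cand in candidates
              if skew_mismatch(cand[0], target, period) <= bound]
    if within:
        return min(within, key=lambda cand: cand[1])
    return min(candidates, key=lambda cand: skew_mismatch(cand[0], target, period))


period = 1000.0  # ps, hypothetical clock period
target = 250.0   # ps, required clock delay of the register
candidates = [(240.0, 120.0), (270.0, 60.0), (300.0, 20.0)]  # (delay ps, wirelength um)
print(select_tap(candidates, target, period, bound=0.035))  # (270.0, 60.0)
```

Loosening the bound lets the selection accept a slightly worse skew in exchange for a shorter tapping wire; tightening it to 0.5% in this example forces the 120 um connection with the smallest mismatch.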

The design methodology for the rotary clocking technology requires the analysis of parasitics. Capacitive balance between the rings of the ROA grid is integral to the operation of rotary clocking, because the capacitive load on each ring affects the clock resonance. The oscillation frequency in CROA is sensitive to changes in mutual inductance introduced by the custom topologies. Mixed-signal design techniques are proposed to characterize the oscillation frequency, capacitive loading and inductive effects. To account for unbalanced capacitance, two novel capacitance balancing methodologies using the MIP technique are presented. To study the effects of parasitics on the custom ring topologies, a Partial Element Equivalent Circuit (PEEC) [60–62] based analysis is presented. Further, to characterize the rotary parasitics more accurately, 3-D full-wave electromagnetic mixed-signal analysis and simulations are incorporated. Rotary clocking is an ultra-low-power clocking technology; SPICE simulations are used to verify the generated clock signals and to estimate the power dissipation with the proposed design methodologies for rotary clocking.
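The sensitivity of the resonance to capacitive loading can be seen from the first-order rotary oscillation frequency commonly attributed to Wood et al., $f \approx 1/(2\sqrt{L_{total} C_{total}})$, where $L_{total}$ and $C_{total}$ are the total ring inductance and capacitance. The sketch below, with hypothetical values and function names, evaluates that expression for rings treated in isolation and reports the spread of the resulting frequencies as (max - min)/min, one possible metric. In a real coupled array the rings lock to a common frequency, so this only indicates the imbalance and is no substitute for the SPICE analysis used in this dissertation.

```python
import math


def rotary_frequency(l_total, c_total):
    """First-order rotary ring oscillation frequency, f = 1 / (2 * sqrt(L * C))."""
    return 1.0 / (2.0 * math.sqrt(l_total * c_total))


def frequency_variation(l_total, ring_caps):
    """Percent spread of ring frequencies if each ring oscillated on its own load.

    ring_caps: total capacitance of each ring (interconnect plus tapped loads).
    All rings are assumed to share the same inductance l_total.
    Returns (max - min) / min * 100.
    """
    freqs = [rotary_frequency(l_total, c) for c in ring_caps]
    return (max(freqs) - min(freqs)) / min(freqs) * 100.0


# Hypothetical: 1 nH ring inductance, two rings loaded with 10 pF and 12 pF.
print(rotary_frequency(1e-9, 10e-12) / 1e9)                    # about 5 GHz
print(round(frequency_variation(1e-9, [10e-12, 12e-12]), 2))  # 9.54
```

Because $f \propto C^{-1/2}$, a 20% capacitance imbalance between two rings translates into roughly a 9.5% frequency spread, which motivates explicit capacitive load balancing.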

#### 1.4 Organization of the Dissertation

This dissertation is organized as follows. In Chapter 2, the resonant rotary and mobius standing wave clocking technologies are reviewed, previous work on these technologies is summarized, and the delay model used for the rotary clocking implementation is presented. In Chapter 3, the rotary topology work is presented, describing the novel CROA and ZeROA topologies for non-regular ring generation and zero clock skew synchronization, respectively. In Chapter 4, timing analysis and optimization work is presented, describing novel bounded skew constraint and capacitive load balancing methodologies with wirelength optimization techniques for rotary clocking. In Chapter 5, the concepts explained in Chapter 4 are extended to the timing analysis and optimization of mobius standing wave oscillators. In Chapter 6, the interconnect modeling and parasitic analysis work is presented, describing PEEC and 3-D based modeling and parasitic analysis for rotary interconnects. Finally, Chapter 7 summarizes the dissertation and outlines future directions.

This dissertation can be read along multiple axes depending on the interest of the reader. The topics presented for the two resonant clocking technologies form cohesive subsets of analysis across the seven chapters, each serving a different reader standpoint: a non-zero clock skew integration standpoint, a zero clock skew integration standpoint, a system level design standpoint and an IC-timing standpoint.

From the non-zero clock skew integration standpoint, a complete design automation flow for non-zero clock skew synchronization with rotary clocking is presented in Chapters 3 and 4. In Chapter 3, a novel scheme (CROA) is presented for non-regular rotary ring generation. In Chapter 4, the timing and optimization concerns of non-zero clock skew synchronization are addressed with a bounded skew constraint methodology and the capacitive load balancing methodologies OCLB and SOCLB.

From the zero clock skew integration standpoint, a complete design automation flow for zero clock skew synchronization with rotary clocking is presented in Chapters 3 and 4. In Chapter 3, a novel scheme called ZeROA is presented for zero clock skew synchronization. The timing and optimization concerns for zero clock skew synchronization are addressed in Chapter 4 with SkCLB and ZCSCLB, two skew-aware capacitive load balancing methodologies.

From a system level design standpoint, readers will find direction in Chapters 3, 4 and 6. In particular, the generation of rotary rings is analyzed in Chapter 3, timing analysis and optimization in Chapter 4, and parasitic modeling, verification of oscillations (and operational frequency) and power analysis through simulations in Chapter 6.

From an IC-timing standpoint, Chapter 4 adopts a pedagogical point of view in presenting the timing analysis and optimization work for rotary clocking. First, a bounded skew constraint methodology is presented from a timing perspective. Next, the capacitive load balancing requirements for stable frequency and stable operation of rotary clock signals are identified. Finally, the bounded skew constraint and capacitive load balancing techniques are integrated towards robust operation of low-power rotary oscillators.

All the integration methodologies described for these different standpoints together constitute the design automation and analysis flow for the resonant clocking technologies. These building blocks of the proposed design flow are verified with SPICE based simulation models for rotary clocking.

## 2. Overview of Resonant Clocking Technologies

Resonant clocking technologies are categorized into four main types, based on their resonating components and the generated clock signal pattern:

1. Coupled LC oscillator [20–22],
2. Distributed oscillator [23, 26, 63–65],
3. Standing wave oscillator [16, 19, 24, 25, 66],
4. Traveling wave oscillator [17, 27, 67].



Figure 2.1: A coupled LC oscillator model.

In *coupled LC oscillator* based resonant clocking, the clock signal is generated by coupled inductive and capacitive elements at the internal nodes of a clock tree topology. The matched impedance of the coupled LC elements provides a clock signal with constant magnitude and constant phase. A sample coupled LC oscillator is shown in Fig. 2.1 [20]. A clock signal with constant magnitude and constant phase is similar to the clock signals delivered by conventional clock tree networks. The main advantage of coupled LC oscillator based clocking over other resonant clocking technologies is that it provides the desired clock signal without any change to the conventional design flows. Higher circuit performance is achievable solely by replacing the conventional clock distribution network with a coupled LC oscillator based distribution network.

Figure 2.2: An idealized schematic chart of distributed oscillator.

In distributed oscillator based resonant clocking, the clock signal is generated by coupled inductive and capacitive elements distributed at the leaves of the clock tree topology. Similar to coupled LC oscillators, distributed oscillators provide a clock signal with constant magnitude and identical phase. Different distributed oscillator configurations, such as distributed oscillators using differential amplifiers and distributed oscillators using differential inductors, are introduced in [65] and [64], respectively. A sample distributed oscillator using differential amplifiers is shown in Fig. 2.2 [65]. Distributed oscillators overcome the implementation disadvantages of coupled LC oscillators.



Figure 2.3: A standing wave oscillator with three cross coupled pairs.

In standing wave oscillator based resonant clocking, the clock signal is generated by sending an incident wave down a transmission line and reflecting it back with a lossless termination such as a short circuit. A standing wave oscillator provides a clock signal with varying amplitude and constant phase. A sample standing wave oscillator with three cross coupled pairs is shown in Fig. 2.3 [25]. These designs achieve low-jitter clocks and low power owing to the resonance between the clock wire inductance and the clock capacitance. Since the clock phase is constant, this technology does not require drastic modifications to the conventional design flows. However, the varying amplitude of the clock signal may introduce skew or complicate clock buffering.



Figure 2.4: A traveling wave oscillator called rotary clocking.

Traveling wave oscillator based resonant clocking, also called rotary clocking technology, is the resonant clocking technology of interest in this dissertation. Rotary clocking oscillators rely on the traveling wave principle of transmission lines to generate high frequency clock signals with constant magnitude and varying phase. These oscillators store energy in the inductors during the discharging stage and re-circulate this stored energy during the charging stage, thus minimizing the dynamic power consumption. A traveling wave oscillator implemented in [17] is shown in Fig. 2.4.

### 2.1 Rotary Clocking Technology

The details on the resonant rotary clocking technology are presented in this section. The resonant rotary clocking operation is explained in Section 2.1.1. The topology for rotary clocking operation is presented in Section 2.1.2. The properties of rotary clocking and the reported performance are presented in Sections 2.1.3 and 2.3.4, respectively.

#### 2.1.1 Rotary Wave Generation and Operation (RWO)

A rotary ring consists of a double loop of interconnects, as shown in Fig. 2.5. When the transmission line is excited at one or more points, a traveling wave is established on the cross-connected line. This voltage wave travels along the transmission lines formed by the parallel interconnects of the inner and outer loops. Distributed CMOS inverters are placed uniformly along the transmission lines in an anti-parallel configuration. These anti-parallel inverter pairs save power and sustain the rotation of the clock. The anti-parallel inverter pairs serve as transmission line amplifiers that regenerate the square wave. Each pair of anti-parallel inverters on the path of the traveling signal switches after some delay, stimulating the same process at the neighboring pair of anti-parallel inverters in the direction of the wave. Thus, the anti-parallel inverters reinforce the traveling wave in the stronger direction until a stable oscillation frequency is reached. The operation of an individual rotary oscillator [17] is illustrated in Fig. 2.6. Fig. 2.6(a) shows the open loop that conceptually occurs. Fig. 2.6(b) shows the closed loop in steady-state operation, where the overlap of the traveling waves causes signal negation.



Figure 2.5: Operating principle of rotary clocking.

##### 2.1.1.1 Crossover Points

The traveling wave is inverted at the crossover points, generating different phases of the square wave. In Fig. 2.5, a crossover point is shown as A. The phases at different points on the rotary ring, for a square wave generated with the crossover, are shown in Fig. 2.5. The duty cycles of the multiple clock phases are determined by the location of the crossovers on the ring.

##### 2.1.1.2 Tapping Points

The synchronous components of the VLSI circuit can be connected to the rotary rings at certain locations called *tapping points*. These tapping points are marked uniformly on the rotary ring. In Fig. 2.5, the tapping points are marked as TP1-TP16, and the different phases of the square wave at these tapping points are shown. A uniform capacitance distribution at these tapping points ensures a constant operational frequency of the rotary ring.

Figure 2.6: The rotary traveling wave oscillator theory.

#### 2.1.2 Rotary Topology

Rotary clocking technology is traditionally implemented with a regular array (grid) topology, as shown in Fig. 2.7. These rotary oscillatory arrays (ROAs) are generated on the cross-connected transmission lines formed by regular IC interconnects. In Fig. 2.7, the ROA topology is implemented on a 3x3 grid with five (5) rotary rings. An oscillation on these rings can start spontaneously from any noise event or be stimulated by a start-up circuit for controlled operation [17]. Oscillations on the ROA rings are locked in phase, minimizing the effects of jitter. A four-port network is formed at each junction of two rotary rings on the ROA, which is used to synchronize the oscillations between the respective rotary rings [17]. This synchronization capability supports both single-phase and multi-phase clocking schemes. A uniform capacitance distribution on each of the rotary rings ensures a constant operational frequency across the ROA. The frequency of the clock signal generated by the rotary clocking technology is limited only by the cutoff frequency of the integrated circuit technology used, and can be tuned by changing the length or adjusting the loading impedances of the rings in the ROA topology.

Figure 2.7: Basic rotary clock architecture (ROA).

#### 2.1.3 Properties of Rotary Clocking

The adiabaticity, power dissipation, timing and capacitive balancing properties of rotary clocking technology are described in Sections 2.1.3.1, 2.1.3.2, 2.1.3.3 and 2.1.3.4, respectively.



Figure 2.8: Cross-connected inverter pairs in rotary interconnects.

##### 2.1.3.1 Adiabaticity

The current paths along the cross-connected transmission lines are terminated on each other. The energy that goes into charging and discharging the MOS gate capacitances of the inverters (the switching energy) becomes transmission line energy, which in turn circulates in the closed electromagnetic path. This conservation of energy is enabled by adiabatic switching [68, 69], that is, by terminating the current path on the transmission line instead of on ground. Coherent switching occurs only in the direction of the traveling path. The inverter switching action is shown in Fig. 2.8. An equal amount of energy is launched in the reverse direction; however, the latches in this direction have already switched, so this energy simply reinforces the previous switching events on these registers.

##### 2.1.3.2 Power

Once the traveling wave is established in a rotary ring, little power is needed to sustain it, owing to the adiabaticity explained in Section 2.1.3.1. The capacitive loads do not contribute to the power dissipation, as they themselves are clocked elements. Thus, the power dissipated on the ring is mainly resistive, given by the  $I^2R$  expression instead of the conventional  $CV^2f$  expression for dynamic power dissipation. These resistive losses are small compared to the capacitive losses of a conventional clock network, and hence the total power dissipated by the rotary clocking technology is very low.

##### 2.1.3.3 Rotary Timing

For the rotary clocking implementation, the phase and frequency information are critical. For the phase information, an arbitrary point on the ring is chosen as the reference point, with a clock signal delay $t = 0$ and phase $\theta = 0^{\circ}$. The clock signal travels along the ring and returns to the reference point with a phase $\theta = 360^{\circ}$. A phase of 360° is defined for notational convenience and corresponds to a clock delay equal to the clock period. For example, 90° of phase corresponds to $T/4$ units of delay, where $T$ is the clock period. At any point on the ring, the clock signal delay $t$ and the clock signal phase $\theta$ are related through:

$$\frac{\theta}{360} = \frac{t}{T}.\tag{2.1}$$
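As a quick illustration of (2.1), the following minimal Python sketch converts between ring phase and clock delay. The function names and the 1 GHz example period are assumptions made for illustration only.

```python
def phase_to_delay(theta_deg: float, period: float) -> float:
    """Clock delay t for a phase theta in degrees, from theta/360 = t/T (Eq. 2.1)."""
    return (theta_deg / 360.0) * period


def delay_to_phase(t: float, period: float) -> float:
    """Phase in degrees for a clock delay t, from the same relation (Eq. 2.1)."""
    return (t / period) * 360.0


# Example with an assumed 1 GHz clock (T = 1 ns, expressed in picoseconds).
T_ps = 1000.0
print(phase_to_delay(90.0, T_ps))    # 250.0, i.e. T/4
print(delay_to_phase(500.0, T_ps))   # 180.0
```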

For the frequency information of the rotary clock signal, the capacitive and inductive properties of the rotary rings need to be identified. A simplification is offered in [17] in modeling the rotary ring as an LC circuit where the total inductance and capacitance of the rotary ring are lumped into  $L_T$  and  $C_T$ , respectively. Assuming a uniform distribution of inductance and capacitance along the ring for simplicity [17], the phase velocity  $v_p$  of the wave is calculated using the per-unit-length differential inductance  $L_l$  and capacitance  $C_l$  as:

$$v_p = \frac{1}{\sqrt{L_l C_l}},\tag{2.2}$$

where  $L_l$  and  $C_l$  are computed from the lumped LC model of the entire rotary ring, i.e., $L_l = L_T/l$ and $C_l = C_T/l$, with $l$ the length of the ring.



Figure 2.9: ROA interconnect parameters.

Since the traveling wave requires two rotations to complete a clock period, the oscillation frequency is approximated as:

$$f_{osc} \approx \frac{v_p}{2l} = \frac{1}{2\sqrt{L_T C_T}},\tag{2.3}$$

where $l$ is the length of the rotary ring [17]. Note that a distributed LC model can be used to compute the local phase velocity for improved accuracy. In this work, the simplified lumped LC model and the uniform phase distribution are adopted for ease of presentation. Extending the current model to the distributed LC model is straightforward.
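The sketch below checks numerically that the two forms of (2.3) agree, assuming uniform per-unit-length values $L_l = L_T/l$ and $C_l = C_T/l$. The ring values are hypothetical and the helper names are illustrative, not taken from [17].

```python
import math


def osc_frequency_lumped(L_T: float, C_T: float) -> float:
    """f_osc ~ 1 / (2 sqrt(L_T C_T)), right-hand form of Eq. (2.3)."""
    return 1.0 / (2.0 * math.sqrt(L_T * C_T))


def osc_frequency_from_velocity(L_T: float, C_T: float, length: float) -> float:
    """f_osc ~ v_p / (2 l), left-hand form of Eq. (2.3).

    Assumes uniform per-unit-length values L_l = L_T / l and C_l = C_T / l,
    so that v_p = 1 / sqrt(L_l C_l) as in Eq. (2.2).
    """
    L_l = L_T / length
    C_l = C_T / length
    v_p = 1.0 / math.sqrt(L_l * C_l)
    # Two laps of the ring are needed for one full clock period.
    return v_p / (2.0 * length)


# Hypothetical ring: L_T = 2 nH, C_T = 5 pF, l = 4 mm (illustrative values only).
L_T, C_T, l = 2e-9, 5e-12, 4e-3
print(osc_frequency_lumped(L_T, C_T) / 1e9)             # about 5.0 GHz
print(osc_frequency_from_velocity(L_T, C_T, l) / 1e9)   # same value
```

The ring length cancels out, which is why (2.3) can be written purely in terms of the lumped $L_T$ and $C_T$.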

In (2.3), the total inductance  $L_T$  is estimated as [17]:

$$L_T \approx \frac{P\mu_0}{\pi} \log\left[\left(\frac{\pi s}{w+t}\right) + 1\right],\tag{2.4}$$

where P, s, w, t, and  $\mu_0$  are the perimeter of the ring, wire separation, wire width, wire thickness and permeability in vacuum, respectively. The total capacitance  $C_T$  is estimated by:

$$C_T \approx \sum C_{reg} + \sum C_{inv} + \sum C_{ring} + \sum C_{wire}, \qquad (2.5)$$

where  $C_{reg}$ ,  $C_{inv}$ ,  $C_{ring}$  and  $C_{wire}$  are the capacitances contributed by the registers, the inverters between the transmission lines, the ring transmission line interconnects and the register tapping wires, respectively. The capacitances  $C_{reg}$  and  $C_{inv}$  are defined by the types and sizes of the register and inverter components, respectively. The ring capacitance  $C_{ring}$  includes the self and coupling capacitances of the transmission line interconnects. The tapping wire capacitance  $C_{wire}$  depends on the distance between the rotary ring and the registers (clock sinks). The ring interconnects used in rotary clocking technology are wide enough that their resistance is negligible.
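To make (2.3)–(2.5) concrete, the following sketch chains the inductance and capacitance estimates into an oscillation frequency. All geometry and load values are hypothetical, and `log` in (2.4) is interpreted as the natural logarithm, which is an assumption.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space (H/m)


def ring_inductance(P: float, s: float, w: float, t: float) -> float:
    """Total ring inductance L_T per Eq. (2.4).

    P: ring perimeter, s: wire separation, w: wire width, t: wire thickness (m).
    'log' is taken here as the natural logarithm (assumption).
    """
    return (P * MU_0 / math.pi) * math.log(math.pi * s / (w + t) + 1.0)


def ring_capacitance(c_reg, c_inv, c_ring, c_wire) -> float:
    """Total ring capacitance C_T per Eq. (2.5): register, inverter, ring and tapping-wire terms."""
    return sum(c_reg) + sum(c_inv) + sum(c_ring) + sum(c_wire)


# Hypothetical geometry and loads (SI units), chosen only for illustration.
L_T = ring_inductance(P=4e-3, s=10e-6, w=5e-6, t=1e-6)
C_T = ring_capacitance(
    c_reg=[5e-15] * 40,    # 40 registers at 5 fF each
    c_inv=[20e-15] * 16,   # 16 anti-parallel inverter pairs at 20 fF each
    c_ring=[1.0e-12],      # self + coupling capacitance of the ring wires
    c_wire=[2e-15] * 40,   # one tapping wire per register
)
f_osc = 1.0 / (2.0 * math.sqrt(L_T * C_T))  # Eq. (2.3)
print(f"L_T = {L_T * 1e9:.2f} nH, C_T = {C_T * 1e12:.2f} pF, f_osc = {f_osc / 1e9:.2f} GHz")
```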

##### 2.1.3.4 Capacitance Balancing

The operating frequency of a rotary oscillatory array (ROA) depends on the total estimated inductance  $(L_T)$  and capacitance  $(C_T)$  of the system, as shown in (2.3). Since all the rotary rings on the ROA have the same perimeter and structural properties, the inductance of each rotary ring on the ROA is identical. However, the total capacitance, which is composed of the four (4) components given in (2.5), is not necessarily identical, because of  $C_{reg}$  and  $C_{wire}$ .  $C_{ring}$  and  $C_{inv}$ , which depend on the perimeter and structural properties, are identical across the rings of the ROA.  $C_{reg}$  and  $C_{wire}$  depend on the number of registers connected to each ring as well as their physical proximity to the ring (which affects the tapping wirelength and hence  $C_{wire}$ ). This potential variation of the total capacitance across the rings of the ROA affects the stability and robustness of the rotary operation.
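The effect of unequal register loading can be sketched by evaluating (2.3) for each ring with a common $L_T$ and ring-specific $C_{reg}$ and $C_{wire}$ terms. The two-ring example and its capacitance values are hypothetical; the function is only an illustration, not one of the balancing methods proposed in this work.

```python
import math


def ring_frequencies(L_T, base_cap, reg_caps_per_ring, wire_caps_per_ring):
    """Per-ring oscillation frequency from Eq. (2.3).

    base_cap stands for sum(C_ring) + sum(C_inv), assumed identical for every ring;
    reg_caps_per_ring / wire_caps_per_ring hold each ring's C_reg and C_wire lists.
    """
    freqs = []
    for regs, wires in zip(reg_caps_per_ring, wire_caps_per_ring):
        C_T = base_cap + sum(regs) + sum(wires)
        freqs.append(1.0 / (2.0 * math.sqrt(L_T * C_T)))
    return freqs


# Hypothetical 2-ring example: identical ring geometry, unequal register loading.
L_T = 3e-9
base = 1.3e-12
freqs = ring_frequencies(
    L_T, base,
    reg_caps_per_ring=[[5e-15] * 10, [5e-15] * 60],
    wire_caps_per_ring=[[2e-15] * 10, [4e-15] * 60],
)
spread = (max(freqs) - min(freqs)) / max(freqs)
print([f"{f / 1e9:.2f} GHz" for f in freqs], f"relative mismatch = {spread:.1%}")
```

The relative spread printed here is the kind of ring-to-ring variation that, as stated above, affects the stability and robustness of the rotary operation.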

### 2.2 Mobius Standing Wave Technology

The details of the mobius implementation of the resonant standing wave clocking technology are presented in this section. The mobius standing wave operation is explained in Section 2.2.1. The topology for the mobius standing wave operation is presented in Section 2.2.2. The properties of the mobius standing wave technology are presented in Section 2.2.3.

#### 2.2.1 Mobius Standing Wave Generation and Operation (SWO)

Standing wave technology is a resonant clocking technology formed by the superposition of two traveling waves of identical magnitude and frequency traveling in opposite directions. A simple way to generate a standing wave is shown in Fig. 2.10(a). A voltage wave is sent down the differential transmission line and reflected back with a lossless termination (short). In practice, the generated standing wave has amplitude and phase mismatches due to wire losses [25]. However, wire losses can be minimized by using a shorter wire. A sample  $\frac{\lambda}{4}$  mode standing wave oscillator design using tapered transmission lines is described in [66]. As shown in Fig. 2.10(a), differential transmission lines are used with one end shorted and the other end connected to a cross coupled inverter pair [66]. The energy injected by the cross coupled inverter pair propagates in the forward traveling waves and is reflected back at the short circuit termination in the reverse traveling waves. The forward and reverse traveling waves superimpose to form the standing wave.

The standing wave technology provides constant phase clock signals with varying amplitude. Since the phase of the standing wave is constant, this technology is ideally a zero skew technology and hence does not need drastic modifications to the conventional design flow. In this work, a new methodology for the standing wave technology, called the mobius implementation of the standing wave oscillator [19], is considered.



Figure 2.10: Generation and operating principle of standing wave oscillator.

A mobius standing wave oscillator (SWO) was first implemented in [19], using a cross coupled wiring pair. The schematic is shown in Fig. 2.10(b). In this implementation, a mobius termination is used instead of a short circuit termination. This implementation uses adiabatic clocking to minimize the power loss. The mobius standing wave implementation combines the energy recycling feature of the traveling wave oscillator (rotary clocking [17]) with the constant phase feature of the standing wave oscillator [25, 66]. A clock recovery circuit is used to recover the clock signal, which has a varying amplitude in the standing wave implementation.

##### 2.2.1.1 Crossover Points

Similar to the traveling wave, the mobius standing wave is inverted at the crossover points. However, since there is only one inverter pair, the clock signal is dual phased. In Fig. 2.10(b), a crossover point is shown as A. Due to this dual phased mobius connection, the clock recovery circuits on the top side of Fig. 2.10(b) have the opposite polarity to the clock recovery circuits on the bottom side of Fig. 2.10(b).

##### 2.2.1.2 Connection Points

On the mobius ring, the points at which the clock recovery circuits recover the clock signal are used to connect the registers to the ring. In Fig. 2.10(b), 24 such points are marked uniformly. These points are called *connection points* and are the potential locations from which the registers can derive the clock signal. Note that these connection points have identical phase and amplitude (amplitude at the clock recovery circuit output) owing to the properties of the standing wave.

#### 2.2.2 Mobius Standing Wave Oscillator Topology

Similar to the rotary oscillatory arrays (ROA) in [17], mobius rings in the standing wave technology can also be implemented on a grid structure as shown in Fig. 2.11.

#### 2.2.3 Properties of Mobius Standing Wave Oscillator

The adiabaticity, power dissipation, timing and capacitive balancing properties of mobius standing wave technology are described in Sections 2.2.3.1, 2.2.3.2, 2.2.3.3 and 2.2.3.4, respectively.

##### 2.2.3.1 Adiabaticity

The adiabaticity of the mobius standing wave oscillator is identical to that of the rotary traveling wave oscillator described in Section 2.1.3.1.



Figure 2.11: Mobius implementation of standing wave on a grid structure.

##### 2.2.3.2 Power

Once the standing wave is established in a mobius structure, little power is needed to sustain it. The power consumption of mobius standing wave oscillators consists of two major components. The first is the resistive power loss, similar to the power dissipated on a rotary oscillator as explained in Section 2.1.3.2. However, the resistive losses are higher in the mobius standing wave oscillator, as the clock signal is not replenished as frequently as in rotary clocking. The second, apart from the resistive losses on the ring, is the power dissipated by the clock recovery circuits, which are necessary for faithful square wave generation. Hence, the power dissipated on a mobius standing wave oscillator is higher than the power dissipated on a resonant rotary oscillator.

##### 2.2.3.3 SWO Timing

The oscillation frequency of the free running standing wave (with no wire losses) can be modeled in the same way as the rotary oscillation frequency described in [17]:

$$f_{osc} = \frac{1}{2\sqrt{L_T C_T}},\tag{2.6}$$

where  $L_T$  and  $C_T$  are the total inductance and total capacitance, respectively, along the path of the clock signal on the mobius ring. The inductance of the mobius ring depends primarily on the interconnect geometry and is identical for each ring. The capacitance of the mobius ring is composed of four (4) different components, as for the rotary oscillator capacitance estimated by (2.5).

##### 2.2.3.4 SWO Capacitance Balancing

As given by (2.6), the stability of the operational frequency is partially characterized by the inductance and capacitance distribution on each mobius ring of the standing wave technology implemented on a grid structure (Fig. 2.11). Similar to the rotary oscillator, the inductance,  $C_{ring}$  and  $C_{inv}$  are identical for different rings across the mobius standing wave grid. However, the capacitance varies across the rings depending on the number of registers connected to each ring ( $\sum C_{reg}$ ) as well as their physical proximity to the ring (which affects the tapping wirelength and hence  $\sum C_{wire}$ ).

### 2.3 Literature Review and Delay Model

In this section, the previous work on rotary clocking and the mobius standing wave oscillator is presented. The previous work on these resonant clocking technologies is presented from the perspectives of topology, timing, parasitic analysis and physical implementation in Sections 2.3.1, 2.3.2, 2.3.3 and 2.3.4, respectively. The delay model for the ring based resonant clocking implementation is summarized in Section 2.3.5.

#### 2.3.1 Topology Related Work

The rotary clocking technology was introduced in [17], with an array of regular rotary rings for clock generation and distribution. In all the previous work focusing on topology and design flow for rotary clocking technology [18, 70–74], as well as in the pioneering work on rotary clocking [17], the rotary ring considered is *regular* and is placed at the geographical center of the chip area using the ROA topology. In register synchronization with the ROA, the registers are moved either closer to or farther from the pre-placed ring such that, when connected to the tapping points on the ring, the register timing constraints are satisfied with minimal wirelength. In this dissertation, it is proposed that the oscillations can be sustained for *non-regular* structures as well, and a methodology for the design of custom rings, called CROA, is proposed.

A mobius standing wave oscillator (SWO) technology was first implemented in [19]. Other than [19], there is no published work on the mobius implementation of the resonant standing wave oscillator.

#### 2.3.2 Timing and Physical Design Related Work

Rotary clocking is conventionally envisioned as a multi-phase non-zero clock skew technology due to the traveling nature of the generated clock signals [17]. Previous studies in [70–73] discuss slightly improved design methodologies for timing synchronization with the rotary clocking technology, while keeping the conventional ROA topology intact. All of these timing-related studies address the "non-zero skew" requirement of rotary clocking; some exploit this property to improve circuit operation [72, 73], whereas others build the physical design methodology to counteract the effects of non-zero clock skew [70, 71]. Non-zero clock skew has the added advantage of 30% higher clock frequencies on average [75].

In order to *counteract* the effects of non-zero clock skew, a physical design flow with circuit partitioning and register placement is presented in [70]. The common conception of the non-zero skew requirement for rotary clocking is used as part of the proposed design flow. Towards this end, the registers are pre-placed underneath the rotary ring such that a fixed number of registers are available for any given clock phase in order to efficiently implement a non-zero skew circuit. In [71], an incremental placement and skew optimization algorithm is presented, where the non-zero skew registers are placed at near-optimal locations with respect to a regular rotary ring. A min-cost network flow model is devised to tap all the registers onto the different rotary rings of the ROA such that the total tapping cost is minimized. In [72], the design of rotary based circuits is proposed using retiming and padding concepts to satisfy the non-zero clock skew timing requirements of register components. In order to *utilize* the traveling nature of the rotary clock signal for performance improvement, a methodology to deliver the optimal clock skew schedule to the synchronous components of a circuit is described in [73]. In [73], a sub-optimal clock skew scheduling application is devised, where registers are tapped to the available clock phases in a geometrically partitioned (e.g., into rings) circuit. In [74], geometric programming (GP) compatible models are used to implement a single rotary clocking ring with geometric parameters for low power and operation at a desired frequency. In [18], regular rotary ring structures are analyzed in detail with the objective of power minimization; various cases of mutual inductance are considered in choosing efficient design parameters to minimize power dissipation.

In all the previous work with timing considerations [70–73], rotary clocking is envisioned as a non-zero clock skew technology. In this work, it is demonstrated that the rotary clocking technology can be used to synchronize zero clock skew circuits as well. A complete design automation and analysis flow for zero clock skew synchronization with rotary clocking is presented. A bounded skew constraint methodology is presented for rotary clocking. Further, in this work, it is shown for the first time that maintaining the total capacitive load balance between the rings of the ROA is an important design goal for robust operation (e.g., resonance).

There is no published design automation work for the mobius standing wave oscillator technology. The physical design and automation methodologies proposed for rotary integration are extended to the mobius implementation of the standing wave oscillator as well, towards integration with the mainstream IC design flow.

#### 2.3.3 Parasitic Analysis Related Work

In the previous work related to parasitics and power analysis [18], regular rotary ring structures are analyzed in detail with the objective of power minimization, and only the mutual inductance based on the regular rotary geometry is considered. To address this, interconnect modeling techniques are presented in this work for accurate characterization of the parasitics of both custom and regular rotary rings. A PEEC (partial element equivalent circuit) based technique and a 3-D electromagnetic based parasitic extraction technique are proposed to account for the various interconnect geometries of rotary rings. Based on the extracted parasitics, SPICE simulation models are proposed for clock generation and power analysis on rotary clocking.

#### 2.3.4 Physical Implementation

The square waves generated by rotary oscillators exhibit low jitter and controllable skew and phase properties. Rotary traveling waves with frequencies as high as 3.4 GHz and 18 GHz are implemented in [17] and [76], respectively, and power savings of up to 80% are reported in [17, 77]. The transmission line impedance is on the order of  $10\Omega$, and the differential on-resistance of the anti-parallel connected inverters is in the  $100\Omega - 1k\Omega$  range for a  $0.25\mu m$  technology [17]. Further, a rotary clock based finite impulse response (FIR) filter design is presented in [78], demonstrating a 34.6% saving in clock power and a 12.8% saving in overall circuit power.

To date, no physical implementation of mobius standing wave oscillators has been reported.

#### 2.3.5 Tapping Delay Model

In Fig. 2.12, a sample tapping wire, connecting a register  $R_j$  at location $(x,y)$ to the tapping point TP7 at location  $(x_7,y_7)$  on the rotary ring, is shown. Consider the register  $R_j$  at location $(x, y)$. Let the tapping locations TP1 to TP8 have coordinates  $(x_1,y_1)$  to  $(x_8,y_8)$, respectively. For simplicity, only a section of the custom ring is shown in Fig. 2.12.

In the simplest case, the register  $R_j$  is connected to the closest tapping point that satisfies its phase requirement. In Fig. 2.12, assume that the tapping point *TP7* satisfies the phase requirement of the register  $R_j$ . Hence,  $R_j$  is connected to *TP7*, and the tapping wirelength for  $R_j$  is  $|x - x_7| + |y - y_7|$ . Similarly, each register is connected to its corresponding tapping point, and the total tapping wirelength is computed as the sum of the individual tapping wirelengths of all registers.
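The simplest tapping rule above (connect each register to the closest phase-feasible tapping point and sum the rectilinear wirelengths) can be sketched as follows. The phase tolerance `tol_deg`, the `TappingPoint` type and the 8-point ring section are hypothetical; the wire delay itself is ignored here, as in the simplest case.

```python
from dataclasses import dataclass


@dataclass
class TappingPoint:
    name: str
    x: float
    y: float
    phase_deg: float  # clock phase of the ring at this point


def manhattan(x1: float, y1: float, x2: float, y2: float) -> float:
    """Tapping wirelength |x - x_i| + |y - y_i| (rectilinear routing)."""
    return abs(x1 - x2) + abs(y1 - y2)


def tap_register(x, y, required_phase, points, tol_deg=0.0):
    """Return (point, wirelength) for the closest tapping point whose ring phase
    matches required_phase within tol_deg degrees (tol_deg is hypothetical)."""
    def phase_err(p):
        d = abs(p.phase_deg - required_phase) % 360.0
        return min(d, 360.0 - d)

    feasible = [p for p in points if phase_err(p) <= tol_deg]
    if not feasible:
        raise ValueError("no tapping point satisfies the phase requirement")
    best = min(feasible, key=lambda p: manhattan(x, y, p.x, p.y))
    return best, manhattan(x, y, best.x, best.y)


# Hypothetical 8-point ring section on a unit grid, phases 0, 45, ..., 315 degrees.
coords = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
ring = [TappingPoint(f"TP{k + 1}", x, y, 45.0 * k) for k, (x, y) in enumerate(coords)]
registers = [(-1, 3, 270.0), (3, 0.5, 90.0)]  # (x, y, required phase)
total = 0.0
for x, y, phase in registers:
    tp, wl = tap_register(x, y, phase, ring)
    total += wl
print(total)  # 3.5
```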



Figure 2.12: Register tapping onto the ring.

For a more accurate delay computation, the Elmore delay model [79] can be used to compute the delays along the tapping wires. In practical applications, a higher-order delay model can be used for increased accuracy.

In Fig. 2.13(a), the tapping wire connecting the register to the tapping point TP1 is shown. The tapping wirelength is calculated as the sum of the horizontal and vertical (rectilinear) distances. In Fig. 2.13(b), the RC equivalent circuit of the tapping wire is shown, where  $R_w$  and  $C_w$  are the resistance and capacitance contributed by the tapping wire, respectively, and  $C_{R_j}$  is the input capacitance of the register  $R_j$ . If $r$ and $c$ are the per-unit-length resistance and capacitance of the tapping wire, respectively, and $l$ is the tapping wirelength, then

$$R_w = r\,l, \qquad C_w = c\,l.$$



(a) Register tapping onto the ring.



(b) Distributed wire model for tapping wire delay computation.

Figure 2.13: Tapping wire delay model.

Using the Elmore delay model, the tapping wire delay $t$ is computed as:

$$\begin{aligned}
t &= R_w \left( \frac{C_w}{2} + C_{R_j} \right) \\
  &= \frac{R_w C_w}{2} + C_{R_j} R_w \\
  &= \frac{(rl)(cl)}{2} + rl\,C_{R_j} \\
  &= \frac{1}{2} rcl^2 + rlC_{R_j}.
\end{aligned}\tag{2.7}$$

Thus, using the Elmore delay model, the delay $t$ of the tapping wire is given by:

$$t = \frac{1}{2}rcl^2 + rlC_{R_j}.\tag{2.8}$$
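Equation (2.8) translates directly into a one-line function. The minimal sketch below assumes SI units throughout; the wire parameters in the example are hypothetical values chosen only to show the magnitudes involved.

```python
def elmore_tapping_delay(r: float, c: float, length: float, c_reg: float) -> float:
    """Elmore delay of a tapping wire driving a register, Eq. (2.8):
    t = 0.5 * r * c * l**2 + r * l * C_Rj

    r, c: per-unit-length resistance (ohm/m) and capacitance (F/m) of the wire;
    length: tapping wirelength l (m); c_reg: register input capacitance C_Rj (F).
    """
    return 0.5 * r * c * length ** 2 + r * length * c_reg


# Hypothetical wire: r = 0.1 ohm/um, c = 0.2 fF/um, l = 500 um, C_Rj = 5 fF.
t = elmore_tapping_delay(r=1e5, c=2e-10, length=5e-4, c_reg=5e-15)
print(f"{t * 1e12:.2f} ps")  # about 2.75 ps
```

The quadratic term dominates for long tapping wires, which is one reason the tapping wirelength is minimized when registers are connected to the ring.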

Note that the clock delay at the register sink is a function of both the delay induced by the tapping wire and the clock phase (in degrees) of the tapping point on the ring. On each rotary ring, every register must be connected to a tapping point such that its phase requirement is satisfied with the minimum tapping wirelength.

The total phase  $\Theta_i(x, y)$  for a register at (x, y) from each tapping point *i* is computed as:

$$\Theta_i(x,y) = \theta_i + \phi \left[ l_i(x,y) \right], \qquad (2.9)$$

where  $\theta_i$  is the phase of tapping point *i* on the ring and  $\phi[l_i(x, y)]$  is the phase contributed by the tapping wire. The length of the tapping wire  $l_i$  is a function of *x* and *y*, given by:

$$l_i(x,y) = |x - x_i| + |y - y_i|.\tag{2.10}$$

The phase of the tapping wire  $\phi [l_i(x, y)]$  is computed as:

$$\phi \left[ l_i \left( x, y \right) \right] = \left\{ \frac{t_i \left[ l_i \left( x, y \right) \right]}{T} \right\} 360^\circ, \tag{2.11}$$

where $T$ is the clock period and $t_i[l_i(x, y)]$ is the delay of an interconnect of length $l_i$. Using the Elmore delay model in (2.8), the total phases at the register sinks are computed as:

$$\Theta_i(x,y) = \theta_i + \left\{ \frac{\frac{1}{2} r c \left[ l_i(x,y) \right]^2 + r\, l_i(x,y)\, C_{R_j}}{T} \right\} 360^\circ. \tag{2.12}$$
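A minimal C++ sketch of Eqs. (2.8), (2.10) and (2.12) follows. It assumes the caller supplies $r$, $c$, $C_{R_j}$ and $T$ in mutually consistent units; the function names are illustrative and are not taken from the original router. The returned total phase is deliberately left unwrapped, so values above 360° can occur.

```cpp
#include <cassert>
#include <cmath>

// Illustrative sketch of Eqs. (2.8), (2.10) and (2.12); the names are not
// from the original router. r, c: per-unit-length resistance/capacitance,
// cReg: register input capacitance C_Rj, T: clock period (consistent units).

// Eq. (2.10): rectilinear length of a tapping wire from tapping point (xi, yi).
double tappingLength(double x, double y, double xi, double yi) {
    return std::fabs(x - xi) + std::fabs(y - yi);
}

// Eq. (2.8): Elmore delay of a tapping wire of length l driving the register.
double elmoreDelay(double r, double c, double l, double cReg) {
    return 0.5 * r * c * l * l + r * l * cReg;
}

// Eq. (2.12): total phase in degrees delivered through tapping point i whose
// ring phase is thetaI. The result is not wrapped to [0, 360).
double totalPhase(double thetaI, double r, double c, double l, double cReg,
                  double T) {
    return thetaI + elmoreDelay(r, c, l, cReg) / T * 360.0;
}
```

For example, with $r = c = C_{R_j} = 1$, $l = 2$ and $T = 36$, the Elmore delay is $2 + 2 = 4$, and a tapping point at 45° delivers $45^\circ + (4/36)\,360^\circ = 85^\circ$.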

#### 3. Novel Topologies for Rotary Clocking Technology

In this chapter, novel topologies are proposed for rotary implementation. First, a topology based on classic maze-router is proposed in Section 3.1 for minimized tapping wirelength. In Section 3.2, a novel design scheme for zero clock skew synchronization with rotary clocking is presented. In Section 3.3, methodologies to synchronize rotary topology with tree topology are reviewed.

# 3.1 CROA: A Novel Custom Rotary Oscillatory Array Topology for Rotary Clocking Technology

Rotary clocking technology is traditionally implemented with a regular array (grid) topology called *rotary oscillatory arrays (ROAs)*, as shown in Fig. 2.7. A number of studies have been performed on traditional rotary clocking technology with regard to the physical design flow and design automation [70–72, 74]. In all of the previous research, the focus is on devising a design automation scheme for rotary clocking by placing the synchronous components with respect to a grid of *preplaced regular square* rings of the ROA topology. In this dissertation, a custom rotary oscillatory array (CROA) topology is proposed for rotary clocking. The CROA topology is based on the following premises:

1. The rotary rings are permitted to have non-regular, custom shapes.
2. The rotary rings are not fixed at the geographical center of each partition; instead, they are drawn to cover the areas of high register density, reducing the tapping wirelengths.
3. The synchronous circuit is built as a non-zero clock skew system so that it can be synchronized by the varying phase of the rotary ring.



Figure 3.1: Custom rotary clock architecture (CROA).

Similar to the conventional ROA topology, the CROA topology is implemented on a grid-based scheme, as shown in Fig. 3.1. Each partition in the CROA topology is termed a *major grid*. The size of each major grid in the CROA topology is determined based on the perimeter $P_r$ and the placement information for each register. Based on equations (2.3), (2.4) and (2.5), the ring perimeter is fixed to $P_r$, corresponding to a traveling wave clock frequency $f_r$. For every rotary ring, the perimeter $P_r$ is kept constant in order to maintain the same frequency on each ring and to minimize jitter.

#### 3.1.1 Motivational Example

Consider a sample major grid consisting of 45 registers preplaced over a $5 \times 5$ square grid structure, as depicted in Fig. 3.2. Let the operating frequency of the circuit be $f$ GHz, which requires a perimeter $P$ of eight (8) grid units [this follows from equations (2.3) and (2.4)].

(a) Standard rotary ring.

(b) Custom rotary ring.

Figure 3.2: Standard and custom rotary ring topologies for a sample circuit with 45 registers shown as (X) on a grid size of 5x5.

The sample rotary rings synchronized with the standard ring topology and the custom ring topology are shown in Fig. 3.2(a) and Fig. 3.2(b), respectively. The registers in Fig. 3.2(a) have the same placement and phase requirements as the registers in Fig. 3.2(b). In order to aid in capacitive load balancing, all the preplaced synchronous components are permitted to connect (tap) onto the ring at pre-selected nodes called *tapping points*, as explained in Section 2.1.1.2 of Chapter 2. In Fig. 3.2, each tapping point has two tapping locations, one on the inner differential line and one on the outer differential line, separated by 180°. For simplicity, only the inner differential lines are shown. In Fig. 3.2(a), the eight (8) tapping points on the regular ring, *TP9* through *TP16*, provide clock phases between $0^{\circ}$ and $360^{\circ}$ with a $360^{\circ}/(8 \times 2) = 22.5^{\circ}$ interval. In Fig. 3.2(b), unlike the square ring in Fig. 3.2(a), the custom ring is drawn closer to the grids with high register density. The eight (8) tapping points on the custom ring, *TP1* through *TP8*, provide clock phases between $70^{\circ}$ and $430^{\circ}$, also with a $360^{\circ}/(8 \times 2) = 22.5^{\circ}$ interval. The interval of the clock phases between the tapping points is identical; however, the reference phase is shifted to better accommodate the clock phase (delay) requirements of the register sinks and thus to minimize the total tapping wirelength. For instance, the tapping wirelength with the standard rotary ring implementation is 32 grid units, whereas the tapping wirelength with the custom rotary ring approach is 24 grid units, a 25% improvement over Fig. 3.2(a). The improvement in the tapping wirelength is obtained by choosing non-regular ring shapes and placing the custom rings based on the register placement data.
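The phase interval and the wirelength saving quoted in this example follow from short calculations. Eight tapping points, each with an inner and an outer location, give sixteen evenly spaced phases, and the saving is measured relative to the standard ring. Writing the custom-ring phases as offsets from a 70° reference is an interpretation of the stated 70° to 430° range:

$$
\Delta\theta = \frac{360^\circ}{8 \times 2} = 22.5^\circ, \qquad
\theta_k = 70^\circ + k\,\Delta\theta, \quad k = 0, 1, \ldots, 15,
$$

$$
\text{saving} = \frac{32 - 24}{32} \times 100\% = 25\%.
$$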

## 3.1.2 Algorithm for the Custom Ring Implementation

The custom ring implementation is inspired by the popular maze router algorithm [48] used in IC design. Based on the perimeter $P_r$ and the placement information for each synchronous component (e.g., register), the circuit is partitioned into major square grids. The CROA topology is implemented on the major square grids, where each major grid holds one custom rotary ring. The objective of the original maze router algorithm is to find the shortest path from a source node to a target node on a gridded plane. A novel custom router algorithm is developed in order to find a closed path (ring) of a given path length on a gridded plane. The proposed custom router algorithm has similar mechanics and the same complexity as the original maze router ($O(mn)$ for a grid with $m$ rows and $n$ columns); however, its objectives are significantly different.

The well-known maze router algorithm [48] comprises three stages. The first stage, *wave propagation*, consists of expanding a wave from the source node to the target node. The second stage, *backtrace*, consists of tracing back a path from the target node to the source node. The third stage, *clean up*, consists of removing all the cells that are not part of the path found in the *backtrace* stage. In order to facilitate the maze-router-based implementation, each major grid is further divided into minor square grids. The granularity of the minor grid plane is determined by the number of tapping points on the custom ring. The minimum number of minor grids is set by the number of tapping points, such that each minor grid holds exactly one tapping point, permitting the application of a maze-router-like strategy. A higher number of minor grids can be used in order to increase the quality of the final result.

The proposed custom router algorithm comprises three stages. The three stages of the algorithm, implemented as shown in the pseudo-code in Fig. 3.3, perform the following functions:

1. **FindSource**: identifying the source grid;
2. **FormRings**: generating all possible rings;
3. **Wirelength**: computing the wirelengths to find the best possible custom ring.

In the **FindSource** stage, a source grid $S(x, y)$ is identified among the minor grids in each major grid. In the original maze router algorithm [48], the waves are grown from the source node towards the target node. To follow a similar style, a source cell is identified heuristically in the custom router. In each major grid (i.e., each custom rotary ring), for the given register placement vector $\langle x, y \rangle$ and the phase information $b[phase][j]$, the minor grid with the largest number of registers having the same phase requirement is selected as the source cell $S(x, y)$.
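One way to realize the **FindSource** heuristic is sketched below in C++. It assumes that each register has already been mapped to a minor-grid cell and that phases are grouped into bins of a caller-chosen width (for instance, the 22.5° tapping-point spacing). The `Reg` record, the bin width and the tie-breaking rule are illustrative assumptions, not details of the original implementation.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical register record: minor-grid cell and required phase (degrees).
struct Reg {
    int row, col;
    double phase;
};

// FindSource sketch: count registers per (cell, phase bin), i.e. the
// b[phase][j] bins of Fig. 3.3, and return the cell of the fullest bin.
// Ties keep the first bin in (row, col, bin) order; {-1, -1} if no registers.
std::pair<int, int> findSource(const std::vector<Reg>& regs, double binWidth) {
    std::map<std::tuple<int, int, int>, int> bins;
    for (const Reg& r : regs) {
        double p = std::fmod(r.phase, 360.0);
        if (p < 0) p += 360.0;  // wrap negative phases into [0, 360)
        int bin = static_cast<int>(p / binWidth);
        ++bins[std::make_tuple(r.row, r.col, bin)];
    }
    std::pair<int, int> source{-1, -1};
    int bestCount = 0;
    for (const auto& entry : bins) {
        if (entry.second > bestCount) {
            bestCount = entry.second;
            source = {std::get<0>(entry.first), std::get<1>(entry.first)};
        }
    }
    return source;
}
```

Binning by phase, rather than requiring exactly equal phases, lets registers with nearly identical requirements count toward the same source cell.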

The **FormRings** stage of the proposed custom router algorithm is similar to the wave propagation stage of maze routing. Waves from the source grid $S(x, y)$ are propagated exhaustively over the entire minor grid structure in each major grid until the source grid is reached again, thus forming a closed path. The entire solution space, consisting of all possible custom rings with perimeter $p$, is consolidated in `vector<Rings>`.
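The exhaustive **FormRings** step can be sketched as a depth-first enumeration of closed paths of exactly $p$ minor-grid cells that start and end at the source. This is a plain backtracking search, swapped in here for illustration; it is not the wave-propagation code of the original router. Its worst-case cost grows exponentially with $p$, so it is only meant for small minor grids, and each geometric ring is reported once per traversal direction.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;   // (row, column) of a minor grid
using Ring = std::vector<Cell>;     // closed path, source cell first

// Backtracking helper: grow `path` one neighbouring cell at a time and keep it
// once it holds p cells and its last cell is adjacent to the source.
static void extendPath(int m, int n, int p, std::vector<std::vector<bool>>& used,
                       Ring& path, std::vector<Ring>& rings) {
    const Cell src = path.front();
    const Cell cur = path.back();
    if (static_cast<int>(path.size()) == p) {
        int d = std::abs(cur.first - src.first) + std::abs(cur.second - src.second);
        if (d == 1) rings.push_back(path);  // closing edge back to the source
        return;
    }
    const int dr[4] = {1, -1, 0, 0};
    const int dc[4] = {0, 0, 1, -1};
    for (int k = 0; k < 4; ++k) {
        int r = cur.first + dr[k];
        int c = cur.second + dc[k];
        if (r < 0 || r >= m || c < 0 || c >= n || used[r][c]) continue;
        used[r][c] = true;
        path.push_back({r, c});
        extendPath(m, n, p, used, path, rings);
        path.pop_back();
        used[r][c] = false;
    }
}

// FormRings sketch: all closed paths of exactly p cells through src on an
// m x n minor grid. Grid graphs are bipartite, so odd p yields no rings.
std::vector<Ring> formRings(Cell src, int m, int n, int p) {
    std::vector<Ring> rings;
    if (p < 4 || p % 2 != 0) return rings;
    std::vector<std::vector<bool>> used(m, std::vector<bool>(n, false));
    used[src.first][src.second] = true;
    Ring path{src};
    extendPath(m, n, p, used, path, rings);
    return rings;
}
```

On a $2 \times 2$ minor grid, for example, the only ring of perimeter 4 is found twice, once in each direction.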

    S(x, y) FindSource(minor grids, vector<x, y>) {
        for (j = 1; j <= minor grids; j++)
            b[phase][j] ← form b phase bins;
        S(x, y) ← max over all phase, j of { b[phase][j] };
    }

    vector<Rings> FormRings(S(x, y), minor grids, perimeter p) {
        create wavefronts starting from source S(x, y);
        form closed paths of length p by tracing the source S(x, y) back;
        vector<Rings> ← consolidate all the rings with perimeter p;
    }

    struct(WL_min, Ring_i) Wirelength(vector<Rings>, minor grids) {
        for (k = 1; k <= vector<Rings>.size(); k++) {
            determine tapping points;
            WL_k ← compute total tapping wirelength;
        }
        WL_min ← min over all k of { WL_k };
        Ring_i ← ring with the min tapping wirelength WL_min;
    }

    int main(int argc, char* argv[]) {
        ...
        for (i = 1; i <= major grids; i++) {
            S(x, y) = FindSource(minor grids, vector<x, y>);
            vector<Rings> = FormRings(S(x, y), minor grids, perimeter p);
            WL_min = Wirelength(vector<Rings>, minor grids);
        }
        ...
    }

Figure 3.3: Pseudo-code for the custom router, including an excerpt from the main function and the three functions FindSource, FormRings and Wirelength.

The **Wirelength** stage of the custom router algorithm is similar to the *clean up* stage of the maze routing technique. The objective of this stage is to search the exhaustive solution space of all possible rings to find the best ring $Ring_i$. In order to identify the best ring, the registers are first connected to each possible ring in the solution space at a fixed number of points marked as tapping points. Then, the total register tapping wirelength is computed for every possible ring. The best ring $Ring_i$ is defined as the ring that requires the minimum total register tapping wirelength $WL_{min}$.
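A simplified C++ sketch of the **Wirelength** stage follows. It assumes that `numTaps` tapping points are spread evenly along each candidate ring, with tapping point $k$ carrying phase $360^\circ k/\text{numTaps}$ measured from the ring's first cell. Each register is connected to the tapping point whose phase is closest to its requirement. To keep this stage separate from per-register phase matching, the sketch deliberately ignores the phase contributed by the tapping wire. `Sink`, `ringWirelength` and `bestRing` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;
using Ring = std::vector<Cell>;                 // non-empty closed path
struct Sink { int x, y; double phase; };        // hypothetical register record

// Distance between two phases on the 360-degree circle.
double phaseGap(double a, double b) {
    double d = std::fmod(std::fabs(a - b), 360.0);
    return std::min(d, 360.0 - d);
}

// Total tapping wirelength of one ring (numTaps > 0). Tap k sits at cell
// ring[k * size / numTaps] with phase 360*k/numTaps; each sink uses the tap
// closest in phase, near-ties going to the shorter (Manhattan) wire.
// Simplification: the tapping wire's own phase is ignored here.
long ringWirelength(const Ring& ring, int numTaps, const std::vector<Sink>& sinks) {
    long total = 0;
    for (const Sink& s : sinks) {
        double bestGap = std::numeric_limits<double>::max();
        long bestLen = 0;
        for (int k = 0; k < numTaps; ++k) {
            const Cell& tp = ring[k * ring.size() / numTaps];
            double gap = phaseGap(s.phase, 360.0 * k / numTaps);
            long len = std::abs(s.x - tp.first) + std::abs(s.y - tp.second);
            if (gap < bestGap - 1e-9 ||
                (std::fabs(gap - bestGap) <= 1e-9 && len < bestLen)) {
                bestGap = gap;
                bestLen = len;
            }
        }
        total += bestLen;
    }
    return total;
}

// Wirelength stage: return (WL_min, index of the best ring Ring_i).
std::pair<long, int> bestRing(const std::vector<Ring>& rings, int numTaps,
                              const std::vector<Sink>& sinks) {
    long wlMin = std::numeric_limits<long>::max();
    int best = -1;
    for (int i = 0; i < static_cast<int>(rings.size()); ++i) {
        long wl = ringWirelength(rings[i], numTaps, sinks);
        if (wl < wlMin) { wlMin = wl; best = i; }
    }
    return {wlMin, best};
}
```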



Figure 3.4: "Tapping" the register  $R_k$  on to the custom ring at the tapping point P11.

The three stages of the custom ring implementation algorithm are designed to provide the best possible clock phases to the synchronous components. The proposed CROA design methodology involves non-zero clock skew system design, where the required clock delay can differ from one synchronous component to another. The required clock phases are provided by "tapping" the synchronous components onto the ring according to the phase of the clock signal around the ring and the phase contributed by the tapping wire. A zero clock skew system can be synchronized similarly, although longer tapping wires may be necessary.

On the custom ring of the CROA topology, a fixed number of points are marked as potential tapping points. Defining tapping points simplifies the physical design in terms of timing and of load balancing during oscillation. The phases of these tapping locations are uniformly distributed over one clock period $T$, i.e., between 0° and 360°. Consider a register $R_k$ at $(x_k, y_k)$ with a phase requirement of $\psi_k$, as shown in Fig. 3.4. This register can be connected to any of the tapping points marked P1-P12 on the rotary ring.

Two constraints govern the connection of this register to a tapping point: the phase requirement of the register and the tapping wirelength. The total phase $\Theta_i(x_k, y_k)$ from each tapping point $P_i$ to the register $R_k$ placed at location $(x_k, y_k)$ is computed using (2.9), and the phase of the tapping wire $\phi[l_i(x_k, y_k)]$ is given by (2.11). Using (2.9) and (2.11), the register $R_k$ is connected to the tapping point with phase $\theta_{11} = \psi_k - \phi[l_{11}(x_k, y_k)]$, as the remaining phase $\phi[l_{11}(x_k, y_k)]$ is provided by the tapping wire. In Fig. 3.4, the register $R_k$ is therefore connected to $P_{11}$. A similar procedure is employed to connect every register to its best tapping location. The possibility of connecting every register to each of the potential rings in a major grid is evaluated, and the tapping wirelengths are computed for all of these rings. The ring that results in the minimum tapping wirelength is chosen as the custom ring of the corresponding major grid. In this way, the best custom ring is selected for every major grid to form the CROA.
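Per-register phase matching with (2.9) to (2.11) can be sketched as follows. The sketch assumes the Elmore wire model of (2.8) and a rule that chooses the tapping point whose total phase $\Theta_i$ is closest (on the 360° circle) to $\psi_k$, with ties going to the shorter wire. `TapPoint`, `WireModel` and `selectTap` are illustrative names; the original implementation may weigh phase mismatch against wirelength differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct TapPoint { double x, y, theta; };     // location and ring phase (deg)
struct WireModel { double r, c, cReg, T; };  // per-unit r, c; C_Rj; period

// Phase (degrees) added by a tapping wire of length l, Eqs. (2.8) and (2.11).
double wirePhase(const WireModel& w, double l) {
    double t = 0.5 * w.r * w.c * l * l + w.r * l * w.cReg;
    return t / w.T * 360.0;
}

// Distance between two phases on the 360-degree circle.
double circGap(double a, double b) {
    double d = std::fmod(std::fabs(a - b), 360.0);
    return d < 360.0 - d ? d : 360.0 - d;
}

// Index of the tapping point whose total phase Theta_i (Eq. 2.9) is closest
// to psi; near-ties go to the shorter tapping wire. Returns -1 if taps is empty.
int selectTap(const std::vector<TapPoint>& taps, double x, double y,
              double psi, const WireModel& w) {
    int best = -1;
    double bestGap = 0.0, bestLen = 0.0;
    for (int i = 0; i < static_cast<int>(taps.size()); ++i) {
        double l = std::fabs(x - taps[i].x) + std::fabs(y - taps[i].y);  // (2.10)
        double gap = circGap(taps[i].theta + wirePhase(w, l), psi);
        bool better = best < 0 || gap < bestGap - 1e-9 ||
                      (std::fabs(gap - bestGap) <= 1e-9 && l < bestLen);
        if (better) { best = i; bestGap = gap; bestLen = l; }
    }
    return best;
}
```

With $c = 0$, $C_{R_j} = 1$, $r = 1$ and $T = 360$, the wire phase equals its length in degrees. In that case, a register at (10, 20) requiring 65° picks the 45° tapping point at (10, 0), whose 20-unit wire supplies the remaining 20°.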

#### 3.1.3 Experimental Results

The CROA topology design is tested on the IBM R1-R5 benchmark circuits, which have between 267 and 3101 clock sinks. The R1-R5 circuits include only the placement information for each synchronous component, not the phase information. Phase values ranging from 0° to 360° are therefore randomly generated for the register sinks in the benchmark circuit files. Note that the R1-R5 benchmark circuits use a generic unit of length; the same generic unit is used to represent the physical dimensions of wires and transmission lines.

Note that the presented setup applies only to the experiments on the R1-R5 benchmark circuits. In general, the physical design and timing data of any circuit can be used as an input to the proposed CROA design methodology. This includes non-zero skew circuits, for which a standard delay format (SDF) file, generated by running a clock skew scheduler on the design, provides required clock phases $\theta_i$ that vary significantly. A zero clock skew circuit generated by a mainstream physical design flow would instead provide clock phases $\theta_i$ that are relatively similar to each other.

| Benchmark | Grid           | ROA tapping wirelength | CROA tapping wirelength | Improvement |
|-----------|----------------|------------------------|-------------------------|-------------|
| R1        | $5 \times 5$   | 2,773,150              | 1,567,220               | 43.49%      |
| R2        | $6 \times 6$   | 6,552,330              | 4,059,290               | 38.05%      |
| R3        | $7 \times 7$   | 8,827,920              | 5,308,910               | 39.86%      |
| R4        | $9 \times 9$   | 19,962,400             | 12,281,800              | 38.58%      |
| R5        | $10 \times 10$ | 33,035,200             | 21,054,800              | 36.27%      |
| Average   |                |                        |                         | 39.25%      |

Table 3.1: Tapping wirelength comparison for CROA vs. ROA (generic length units).

For each custom ring in the CROA topology, the perimeter $P_r$ is selected so as to maintain a constant frequency of oscillation $f_r$. The first-order delay model is used for delay computation. The grid size of the ROA topology is determined based on the perimeter $P_r$ and the placement information for each register. For example, the IBM benchmark circuit R1 is partitioned into $5 \times 5$ major ROA grids, considering the size of the circuit and the perimeter computed for a simulated frequency of $f_r = 4.7$ GHz. To facilitate the custom ring router, each partition is further divided into $12 \times 12$ minor grids. For the same frequency, benchmark circuit R5 is partitioned into $10 \times 10$ major ROA grids, and each partition is further divided into a $12 \times 12$ minor grid structure.
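The partitioning step can be sketched as a mapping from a register's coordinates to its major-grid and minor-grid indices. The sketch assumes a rectangular $W \times H$ placement area with its origin at the lower-left corner, uniform $G \times G$ major grids and $M \times M$ minor grids per major grid (e.g. $G = 5$, $M = 12$ for R1). `GridIndex` and `locate` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct GridIndex { int majorX, majorY, minorX, minorY; };

// Maps a register at (x, y), with 0 <= x <= W and 0 <= y <= H, onto a G x G
// array of major grids, each split into M x M minor grids.
GridIndex locate(double x, double y, double W, double H, int G, int M) {
    // Index on the combined (G*M) x (G*M) fine grid; clamp so that points on
    // the far edge of the placement area stay in the last cell.
    int fx = std::min(static_cast<int>(x / W * G * M), G * M - 1);
    int fy = std::min(static_cast<int>(y / H * G * M), G * M - 1);
    return {fx / M, fy / M, fx % M, fy % M};
}
```

For a $600 \times 600$ area with $G = 5$ and $M = 12$, each minor cell is 10 units wide, so a register at (125, 7) falls in major grid (1, 0) and in minor cell (0, 0) of that major grid.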

The custom rings are drawn and the register tapping wirelengths are computed based on the algorithm presented in Section 3.1.2. For comparison, regular rings are drawn at the geographical center of each partition to form the traditional ROA, and the register tapping wirelength for each ring in the ROA is computed. In Table 3.1, the register tapping wirelengths for ROA and CROA are compared. With CROA, 39.25% of the tapping wirelength is saved on average compared to the regular ROA. This saving is a direct result of the custom topology design, which draws the rings closer to the areas of high register density. The tapping wirelength improvement is relatively constant (36.27% to 43.49%) across circuits of different sizes. A 39.25% average reduction in tapping wirelength is significant for reducing the overall power dissipation and the routing congestion of the integrated circuit.

#### 3.1.4 Summary

In this section, a novel design methodology called the custom rotary oscillatory array (CROA) is presented. Unlike previous work on rotary clocking [18, 70–72, 74], the proposed algorithm takes non-regular ring topologies into account. With the CROA methodology, a tapping wirelength saving of 39.25% over the traditional ROA is demonstrated.

# 3.2 ZeROA: Zero Clock Skew Synchronization with Rotary Clocking Technology

The square wave generated by the rotary operation is a continuously traveling wave, which provides different phases of the clock signal along the rotary ring. It is commonly assumed that these multiple phases on the rotary rings necessitate non-zero clock skew operation. The majority of the previous work on rotary clocking deals with the unique timing properties of this technology [36, 70–73]. All of these previous studies address the non-zero skew requirement of rotary clocking; some exploit this property to improve circuit operation [72, 73], whereas most others build physical design methodologies to counteract the effects of non-zero clock skew [36, 70, 71].

While it is true that the clock phases differ on the rotary ring, the delay/phase contributed by the tapping wire is often unused. In this chapter, it is shown that such a requirement to provide non-zero clock skew operation is not necessary for rotary clock synchronization. The rotary clock design methodology is investigated to demonstrate that zero clock skew circuits can be built without any change in the physical design stages of placement and routing. As discussed in Section 2.3.5, the clock phase at a register is given by (2.9). The tapping wire delay at the tapping locations can be manipulated (*e.g.* fitted) in order to provide zero clock skew synchronization.

In Section 3.2.1, the concept of zero clock skew is introduced. In Section 3.2.2, a motivational example for zero clock skew synchronization is presented. In Section 3.2.3, the methodology for zero clock skew synchronization with rotary clocking technology is proposed. The experimental results are presented in Section 3.2.4, and a summary is presented in Section 3.2.5.

#### 3.2.1 Zero Clock Skew

Conventional synchronous circuit design is simplified by assuming that the clock signal arrives at all synchronous elements at the same time, with minimal skew. Clock skew is defined as the difference in clock arrival times between two points in a clock network. An example of clock skew is shown in Fig. 3.5, where the clock signal at c1 leads the clock signal at c2.

Figure 3.5: Clock skew in a clock network.

Clock skew can be zero, negative or positive, depending on the clock arrival times at different points in the clock network. In Fig. 3.5, if $t_{c1}$ and $t_{c2}$ are the delays of clock signals c1 and c2 from a common clock source, then $t_{skew} = t_{c1} - t_{c2}$. Since c1 leads c2 ($t_{c1} < t_{c2}$), the clock skew observed in Fig. 3.5 is negative. Negative clock skew can improve the minimum clock period of the circuit [32, 80]. On the other hand, positive clock skew limits the maximum operating frequency of a synchronous circuit [32, 80]. Clock skew is generally caused by device/interconnect mismatch or process variation.
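For a rotary-clocked circuit, the same definition can be applied to delivered phases by converting each phase into an arrival time within one period. The C++ sketch below assumes that both phases are measured from the same reference and are not wrapped across a period boundary. `phaseToTime`, `clockSkew` and `skewSign` are illustrative names.

```cpp
#include <cassert>
#include <string>

// Arrival time (same unit as T) of a clock edge delivered at phaseDeg degrees.
double phaseToTime(double phaseDeg, double T) { return phaseDeg / 360.0 * T; }

// t_skew = t_c1 - t_c2 for two sinks driven from a common source.
double clockSkew(double tc1, double tc2) { return tc1 - tc2; }

// Sign of the skew as used in the text; |skew| <= tol counts as zero.
std::string skewSign(double skew, double tol = 1e-12) {
    if (skew > tol) return "positive";
    if (skew < -tol) return "negative";
    return "zero";
}
```

At $T = 250$ ps, sinks receiving 90° and 135° see arrival times of 62.5 ps and 93.75 ps. The resulting skew is −31.25 ps, so the first sink leads the second, as c1 leads c2 in Fig. 3.5.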

The clock network design of high-performance VLSI systems focuses on minimizing or nullifying the clock skew. Consequently, most design automation tool flows are also optimized towards a zero clock skew implementation. It has previously been assumed that rotary-clock-synchronized implementations cannot utilize these zero-or-minimal clock skew driven design automation flows, and the inherent non-zero clock skew operation has been deemed a major disadvantage of rotary clock synchronization. In this chapter, it is demonstrated that rotary clocking technology can be used effectively to synchronize zero clock skew circuits as well. It is shown that the tapping locations can be selected *optimally* in order to deliver the clock signal with identical phases to all the synchronous components. Towards this end, a design automation methodology is developed that entails the placement of rotary rings on a given circuit and the computation of the tapping locations on the ring for a zero skew clock network. It is shown that such a zero clock skew network design does not lead to major degradations in the operating scheme (primarily in terms of wirelength, which affects the frequency and power profiles). The results are significant in demonstrating that rotary clocking technology can be used to synchronize zero clock skew circuits. They also demonstrate that all legacy designs with a zero clock skew scheme can be easily redesigned for rotary clock synchronization, without modifying their physical floor-planning, placement and routing.

#### 3.2.2 Motivational Example

Consider a rotary ring drawn on a sample circuit of 25 registers pre-placed over a square grid, as depicted in Fig. 3.6. Let the operating frequency of the circuit be $f_r$ GHz, which requires a perimeter $P_r$ [this follows from (2.3) and (2.4)]. All the pre-placed synchronous components are permitted to connect (tap) onto the rotary ring at pre-selected nodes called *tapping points* (explained in Section 2.1.1.2). The varying clock phases at the tapping points on the rotary ring are shown in Fig. 3.6(a) and Fig. 3.6(b).

(a) Non-zero skew registers tapping onto the rotary ring.

(b) Zero skew registers tapping onto the rotary ring.

Figure 3.6: Rotary clocking technology implemented on non-zero skew and zero skew circuits.

The tapping points are distributed evenly on the ring; thus, the interval of the clock phases between the tapping points is identical. Consider that the register components in Fig. 3.6(a) and Fig. 3.6(b) have non-zero and zero clock skew requirements, respectively. The placement and routing of circuit components for the non-zero and zero skew implementations are identical; only the tapping interconnects to the rings change in order to satisfy the timing constraints.

A typical flow for the non-zero skew implementation of a rotary ring is shown in Fig. 3.7. The design and implementation of the rotary rings are performed independently of the register timing and placement. On one side, the ring parameters are identified to generate the desired frequency given by (2.3). On the other side, the register timing constraints are determined during the clock skew scheduling stage, and the registers are placed using a placement tool. Thus, the timing requirements of the registers are independent of the phases available on the rotary ring. The previous study in [71] proposes an iterative methodology that partially addresses this gap. However, the flow in Fig. 3.7 remains accurate when a design automation tool flow is adopted. Such an adoption also enables, and for practical reasons requires, zero clock skew synchronization.

Figure 3.7: Rotary ring on non-zero skew circuits.

Consider the non-zero skew design shown in Fig. 3.6(a). Each register $R_i$ in the circuit has a phase requirement $\psi_i$. For instance, consider the registers marked A and B, which have phase requirements of 65° and 225°, respectively. To satisfy the timing requirements, registers A and B are connected to the tapping points marked 45° and 135°, respectively, because the tapping wire (the dotted line) that connects each register to its tapping point also contributes to the phase delay. Hence, register A is connected to the tapping point at $45^{\circ}$, with the wire contributing the remaining $20^{\circ}$ of delay, and register B is connected to the tapping point at $135^{\circ}$, with the wire contributing the remaining $90^{\circ}$. The tapping wires for registers A and B are 25 units and 35 units long, respectively.

Consider the zero skew design shown in Fig. 3.6(b). Each register $R_i$ of the circuit has the same *constant* phase requirement $\psi$. For illustration purposes, consider that all registers have the phase requirement $\psi = 90^{\circ}$. If the tapping wires were very short, registers A and B [and all the registers in Fig. 3.6(b)] would be connected to the tapping point providing the 90° phase. However, according to (2.9), the phase contributed by the tapping wire changes with the $(x, y)$ location of the register, so not all registers are connected to the 90° point. In the sample circuit, registers A and B are each connected to a tapping point whose phase, together with the phase contributed by the tapping wire, delivers the required 90°. Note that, incidentally, the tapping wires for registers A and B have the same length, 30 units, in this example.

In this example, the total tapping wirelength for the non-zero skew synchronization is 500 units. The total tapping wirelength for the zero skew synchronization is 525 units. It is seen that the rotary clocking technology can be used to synchronize the zero clock skew implementation as well, with minimal (or zero) compromise in the total register tapping wirelength. The change in the total register tapping wirelength in this demonstrative example is 5%.

## 3.2.3 ZCS: Proposed Methodology for Zero Clock Skew Synchronization

The proposed methodology for zero clock skew synchronization is based on the principle that the tapping wires connecting each register to the ROA network have intrinsic delays that affect the phase delivered to each register. The design and implementation of the rotary rings in the ROA topology are performed independently of the register timing requirements. The clock phase (i.e., insertion delay) at each register is the sum of the phase of the tapping point on the ROA and the phase contributed by the tapping wire. Some previous works, such as [70, 71], propose either to place the registers close to the ring or, once the ring is placed, to move the registers closer to it. These methodologies require significant changes to the physical design flow in order to keep the tapping wires relatively short. In terms of timing, the delivered clock phases are then governed mostly by the tapping location on the ROA, owing to the relatively short tapping wires. While excessively long wires are detrimental to the overall power dissipation and cause routing congestion, some wirelength is necessary to maintain the oscillation on the ROA [17]. In simplest terms, without altering the physical design flow results, only the tapping location on the ROA is changed for each register so that identical clock phases are delivered at every register. The delivered clock phases in this case are governed by the tapping location on the ROA as well as by the delay induced by the tapping wire.

The ROA topology is implemented to draw rotary rings for a given frequency  $f_r$ . Based on the equations (2.3), (2.4) and (2.5), the ring perimeter is fixed to  $P_r$  corresponding to the frequency  $f_r$ . For each rotary ring, the perimeter  $P_r$  is kept constant so as to maintain the frequency. The grid size of the ROA topology is determined based on the perimeter  $P_r$  and the floor-plan of the circuit.



Figure 3.8: Tapping point selection for  $R_k(x, y)$ .

On every rotary ring corresponding to a square ROA grid, tapping points are marked. As explained in Chapter 2, an arbitrary point on the rotary ring is marked as the reference point with phase 0°. Starting from the reference point, tapping locations are marked on the ring at uniform distances, together with their corresponding phase values. For every register in a partition, a suitable tapping point is selected based on the phase requirement of the register and the tapping wire phase.

The case in Fig. 3.8 illustrates how the optimal tapping point is selected. Consider the register $R_k$ at location $(x, y)$, which has a phase requirement of $\psi$. This register can be connected to any of the sixteen tapping points on the rotary ring. Assume these tapping locations provide clock phases $\theta_1 = 0^\circ$ through $\theta_{16} = 337.5^\circ$, respectively, with a 22.5° interval, as shown in Fig. 3.8. The total phase $\Theta_i(x, y)$ delivered to the register $R_k(x, y)$ from tapping point $i$ is computed using (2.9), and the phase of the tapping wire $\phi[l_i(x, y)]$ is computed using (2.11). Thus, by (2.9), the register $R_k$ with phase requirement $\psi$ is connected to the tapping point with phase $\theta_i = \psi - \phi[l_i(x, y)]$, as the remaining phase $\phi[l_i(x, y)]$ is provided by the tapping wire. For zero clock skew operation, the tapping points for all registers are selected in the same manner. Note that in non-zero skew circuits the phase requirements of different registers may differ, whereas in zero skew circuits the phase requirement is identical for all registers. For a practical bounded skew application, tapping locations are selected such that the required skew is delivered within an upper bound of phase mismatch, choosing the tapping location that results in the shortest tapping wirelength. Finally, the total tapping wirelength for each ring is computed by adding the tapping wirelengths of the individual registers, which completes the ZCS solution.
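The bounded-skew selection rule and the ZCS wirelength total can be sketched in C++ as follows. The wire phase model is passed in as a function of wirelength (for example, the Elmore-based phase of (2.11)), and the mismatch is measured on the 360° circle. A register that no tapping point can serve within the bound is reported by returning −1. `Tap`, `Site`, `shortestTapWithin` and `zeroSkewWirelength` are hypothetical names, and the tie and failure handling are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

struct Tap { double x, y, theta; };   // location and ring phase (degrees)
struct Site { double x, y; };         // register location

// Distance between two phases on the 360-degree circle.
double wrapGap(double a, double b) {
    double d = std::fmod(std::fabs(a - b), 360.0);
    return d < 360.0 - d ? d : 360.0 - d;
}

// Bounded-skew tap choice: among taps whose total phase (Eq. 2.9) lies within
// `bound` degrees of psi, return the one with the shortest tapping wire,
// or -1 if no tap qualifies.
int shortestTapWithin(const std::vector<Tap>& taps, const Site& s, double psi,
                      double bound, const std::function<double(double)>& wirePhase) {
    int best = -1;
    double bestLen = 0.0;
    for (int i = 0; i < static_cast<int>(taps.size()); ++i) {
        double l = std::fabs(s.x - taps[i].x) + std::fabs(s.y - taps[i].y);
        if (wrapGap(taps[i].theta + wirePhase(l), psi) > bound) continue;
        if (best < 0 || l < bestLen) { best = i; bestLen = l; }
    }
    return best;
}

// Zero skew (ZCS) total: every register gets the same phase psi. Returns the
// total tapping wirelength, or -1 if some register cannot be served.
double zeroSkewWirelength(const std::vector<Tap>& taps, const std::vector<Site>& regs,
                          double psi, double bound,
                          const std::function<double(double)>& wirePhase) {
    double total = 0.0;
    for (const Site& s : regs) {
        int i = shortestTapWithin(taps, s, psi, bound, wirePhase);
        if (i < 0) return -1.0;
        total += std::fabs(s.x - taps[i].x) + std::fabs(s.y - taps[i].y);
    }
    return total;
}
```

Consider tapping points at (0, 0), (10, 0) and (20, 0) carrying 0°, 45° and 90°, a wire phase of one degree per unit length, $\psi = 90^\circ$ and a 5° bound. Registers at (20, 0), (10, 45) and (0, 90) are then served by wires of 0, 45 and 90 units, for a total of 135 units.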

#### 3.2.4 Experimental Results

The rotary clock router algorithm is implemented in C++ and tested on a 2 GHz x86 processor with 1 GB of RAM. The ring perimeter is fixed to $P_r$ based on the frequency $f_r$, which is selected as 3.4 GHz (the simulated frequency in [17]). The test data are the IBM R1-R5 benchmark circuits. The input capacitance of the synchronous components and the per-unit-length resistance and capacitance of the tapping wires are obtained from the R1-R5 benchmark circuits.

The R1-R5 circuits are considered zero skew circuits and hence include only the placement information for each synchronous component, not the phase information. To facilitate the *non-zero* skew implementation, phase values ranging from 0° to 360° are randomly generated for the register sinks in the benchmark circuit files. The random generation of the phase values does not bias the solutions in any particular direction, and to improve generality, averages over multiple random runs are reported.
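One run of random phase requirements could be generated as in the C++ sketch below. It draws each phase uniformly from [0°, 360°) with a seeded Mersenne Twister, so every run is reproducible. Using one seed per run is an assumption; the original does not state how its random values were produced.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws n required clock phases (degrees) uniformly from [0, 360) for one run.
// Using a distinct seed per run yields independent, reproducible runs.
std::vector<double> randomPhases(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 360.0);
    std::vector<double> phases(n);
    for (double& p : phases) p = dist(gen);
    return phases;
}
```

Six runs, as reported in Table 3.2, would then correspond to six different seeds.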



Figure 3.9: Zero skew requirement of IP block on a System On Chip (SoC).

To illustrate the zero skew implementations, each register in the R1-R5 benchmark files is assigned the identical clock phase of $0^{\circ}$. Note that, if all the circuit components are synchronized with rotary clocking, any phase value can be selected to generate the zero clock skew scheme, because zero skew is defined by the differences of the phase values (i.e., clock delays). However, if rotary clocking is used only on an IP block in a system-on-chip (SoC) circuit, as shown in Fig. 3.9, the tapping registers on the I/O ports of the IP block should adhere to the overall timing specification of the circuit. A popular method for such SoC block timing is to set the clock skew between the I/O ports of the IPs to zero, such that the zero skew scheme is satisfied between the IP blocks. The clock delays at the I/O ports can similarly take any value, as long as the zero clock skew scheme is satisfied. For convenience, the delays at the I/O ports are set to a constant multiple of the clock period $T$, which translates to a clock phase of $0^{\circ}$ in rotary clocking. Hence, the clock phase at each synchronous component in the IP block is set to $0^{\circ}$. In general, any constant phase $\theta$ can be used for all synchronous components on the SoC to demonstrate zero skew operation.
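The mapping from a clock delay to a rotary phase used in this argument can be written as a short C++ function. The delay is reduced modulo the period $T$ and scaled to degrees, so any integer multiple of $T$ maps to 0°. `delayToPhase` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Maps a clock delay t onto the rotary phase it corresponds to, in [0, 360).
// Any integer multiple of the period T maps to 0 degrees.
double delayToPhase(double t, double T) {
    double frac = std::fmod(t, T) / T;
    if (frac < 0) frac += 1.0;  // fmod keeps the sign of t
    return frac * 360.0;
}
```

With $T = 250$ ps, delays of 500 ps and 62.5 ps map to 0° and 90°, respectively.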

| Benchmark | Grid           | Run 1      | Run 2      | Run 3      | Run 4      | Run 5      | Run 6      |
|-----------|----------------|------------|------------|------------|------------|------------|------------|
| R1        | $5 \times 5$   | 2,773,150  | 2,869,830  | 2,791,590  | 2,803,170  | 2,805,200  | 2,772,280  |
| R2        | $6 \times 6$   | 6,552,330  | 6,415,460  | 6,455,780  | 6,619,290  | 6,403,660  | 6,493,790  |
| R3        | $7 \times 7$   | 8,827,920  | 8,921,040  | 9,011,010  | 9,065,960  | 8,933,650  | 9,107,250  |
| R4        | $9 \times 9$   | 19,962,400 | 19,965,700 | 20,044,900 | 20,115,300 | 19,992,500 | 20,076,900 |
| R5        | $10 \times 10$ | 32,878,400 | 32,440,100 | 32,805,400 | 32,843,700 | 32,603,900 | 32,819,300 |

Table 3.2: Tapping wirelengths (in microns) for the rotary rings implemented with non-zero skew R1-R5 circuits.

Table 3.3: Tapping wirelength comparison (in microns) between the zero skew and non-zero skew implementations. A negative change indicates that the zero skew implementation is shorter.

| Benchmark | Grid size | Zero skew wirelength | Non-zero skew wirelength (average of six runs) | Change |
|-----------|----------------|----------------------|--------------------------|--------|
| R1 | $5 \times 5$ | 2,780,090 | 2,802,536 | -0.80% |
| R2 | $6 \times 6$ | 6,512,780 | 6,490,051 | 0.35% |
| R3 | $7 \times 7$ | 8,969,600 | 8,977,805 | -0.09% |
| R4 | $9 \times 9$ | 20,017,400 | 20,026,283 | -0.04% |
| R5 | $10 \times 10$ | 33,208,700 | 32,731,800 | 1.44% |

Experiments are carried out to compare the tapping wirelength of the rotary clocking implementation on the zero skew circuits with the tapping wirelength on the non-zero skew circuits. The register tapping wirelengths obtained for the non-zero skew R1-R5 circuits are shown in Table 3.2. Depending on the placement dimensions and the frequency of the rotary ring, the benchmark circuits R1, R2, R3, R4 and R5 are partitioned into grids of size $(5 \times 5)$, $(6 \times 6)$, $(7 \times 7)$, $(9 \times 9)$ and $(10 \times 10)$, respectively. The average tapping wirelength for the non-zero skew circuits is computed over six (6) runs, each with a different randomly generated set of register phase requirements.
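The averaging over runs and the percentage change reported in Table 3.3 can be reproduced with a few lines of arithmetic. This is a sketch using the R1-R4 rows of Tables 3.2 and 3.3. The sign convention (change = (zero − non-zero)/non-zero, so a negative value means the zero skew implementation is shorter) is an assumption, chosen because it reproduces those four entries.

```python
from statistics import mean

# Non-zero skew tapping wirelengths (microns) for six random phase runs (Table 3.2).
NONZERO_RUNS = {
    "R1": [2773150, 2869830, 2791590, 2803170, 2805200, 2772280],
    "R2": [6552330, 6415460, 6455780, 6619290, 6403660, 6493790],
    "R3": [8827920, 8921040, 9011010, 9065960, 8933650, 9107250],
    "R4": [19962400, 19965700, 20044900, 20115300, 19992500, 20076900],
}
# Zero skew tapping wirelengths (microns) from Table 3.3.
ZERO = {"R1": 2780090, "R2": 6512780, "R3": 8969600, "R4": 20017400}

def percent_change(zero_wl, nonzero_wl):
    """Change of the zero skew wirelength relative to the non-zero skew one (%)."""
    return 100.0 * (zero_wl - nonzero_wl) / nonzero_wl

for name, runs in NONZERO_RUNS.items():
    avg = mean(runs)  # average over the six runs
    print(f"{name}: avg = {avg:,.1f} um, change = {percent_change(ZERO[name], avg):+.2f}%")
```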

Similarly, experiments are carried out to compute the tapping wirelength values for the zero skew R1-R5 benchmark circuits. In Table 3.3, the tapping wirelengths of the non-zero skew and zero skew circuits are compared. For the R1, R3 and R4 circuits, the zero skew tapping wirelengths are marginally shorter than the non-zero skew ones, by 0.80%, 0.09% and 0.04%, respectively. However, for the R2 and R5 circuits, the zero skew tapping wirelengths are marginally longer, by 0.35% and 1.44%, respectively. Thus, the tapping wirelength of the zero clock skew implementations stays within approximately $\pm 1.5\%$ of the non-zero skew implementations. The tapping wirelength results for zero skew circuits are therefore very close to those for non-zero skew circuits, indicating minimal degradation in total register tapping wirelength.

| Benchmark | Grid | Registers | Run1 | Run2 | Run3 | Run4 | Run5 | Run6 | Avg | Changed (%) |
|-----------|----------------|------|------|------|------|------|------|------|------|--------|
| R1 | $5 \times 5$ | 267 | 229 | 225 | 226 | 227 | 231 | 229 | 228 | 85.22% |
| R2 | $6 \times 6$ | 598 | 495 | 510 | 511 | 499 | 508 | 515 | 506 | 84.67% |
| R3 | $7 \times 7$ | 862 | 735 | 749 | 747 | 744 | 736 | 723 | 739 | 85.73% |
| R4 | $9 \times 9$ | 1903 | 1646 | 1620 | 1594 | 1606 | 1625 | 1598 | 1615 | 84.86% |
| R5 | $10 \times 10$ | 3101 | 2651 | 2637 | 2670 | 2687 | 2655 | 2646 | 2658 | 85.70% |

Table 3.4: Number of registers whose tapping locations change from the non-zero skew to the zero skew implementation (Run1-Run6 and average), and the average percentage of registers that change.

To identify the consequences of zero vs. non-zero clock skew synchronization for each register tapping wire, the registers that change their tapping locations when moving from the non-zero skew implementation to the zero skew implementation are identified. The results are reported in Table 3.4. For the R1-R5 benchmark circuits, around 85% of the registers change their tapping points going from a non-zero skew implementation to a zero skew implementation. Even with such a high percentage of registers changing their tapping points, the change in the total register tapping wirelength remains below 1.5%, which is marginal. This result can be attributed to the independence of the rotary placement and the circuit placement routines, as shown in Fig. 3.7.
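The counting behind Table 3.4 can be sketched as below. The representation is an assumption: each assignment is taken to be a list of hypothetical tapping point identifiers indexed by register. The second helper reproduces the averaged percentages from the per-run counts.

```python
def changed_taps(nonzero_taps, zero_taps):
    """Count registers whose tapping point differs between two assignments.

    Both arguments are per-register tapping point identifiers (hypothetical
    representation), indexed by register. Returns (count, percent of registers).
    """
    if len(nonzero_taps) != len(zero_taps):
        raise ValueError("both assignments must cover the same registers")
    changed = sum(1 for a, b in zip(nonzero_taps, zero_taps) if a != b)
    return changed, 100.0 * changed / len(zero_taps)

def average_change(counts_per_run, num_registers):
    """Average changed-register count over runs and its share of all registers (%)."""
    avg = sum(counts_per_run) / len(counts_per_run)
    return avg, 100.0 * avg / num_registers

# R3 row of Table 3.4: 862 registers, six runs.
print(average_change([735, 749, 747, 744, 736, 723], 862))
```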

#### 3.2.5 Summary

In this section, the synchronization of zero clock skew circuits with rotary clocking technology is demonstrated. In all the previous research on rotary clocking [36, 70–72], rotary clocking is envisioned exclusively as a non-zero clock skew synchronization technology. With the proposed methodology, the feasibility of using rotary clocking as a zero clock skew synchronization technology is investigated. In experiments on the IBM R1-R5 benchmark circuits, the tapping wirelength results for zero skew circuits are found to be very close to those for non-zero skew circuits. In particular, a tapping wirelength variation within $\pm 1.5\%$ is observed, demonstrating minimal degradation in register tapping wirelength. These results support the feasibility of using industrial tool flows (placement and routing) that target zero clock skew implementations in rotary-clock-synchronized circuits.

### 3.3 Review of Rotary Topology Synchronization with Tree Subnetworks

Connecting registers to the rotary ring is a tedious process. The conventional approach is to connect each register individually to a tapping point. Some studies investigate moving the registers closer to the rings in order to shorten the tapping interconnects [70, 72]. An appealing alternative is to create tree subnetworks that connect a number of registers to the same tapping point on the ROA. Such synchronization of the rotary topology with tree subnetworks is investigated in [30] and [81].

#### 3.3.1 Tree Subnetworks for Custom Rings

In [30], a two-step process is proposed for synchronizing custom rotary rings with tree-based subnetworks. First, in the *tree generation* step, tree subnetworks are built for the register components of a circuit such that the root node of each tree requires a specific clock delay (skew). Next, in the *ring generation* step, custom rotary rings are generated to satisfy the delay requirements of the root nodes of the tree subnetworks. A custom ring synchronized with the tree subnetworks is shown in Fig. 3.10.

In the *tree generation* step, the register sinks are partitioned into multiple clusters using the multilevel hypergraph partitioning tool hMETIS [82]. A variation of the Bounded-Skew Tree/Deferred-Merge Embedding (BST/DME) [83] algorithm, proposed in [84], is used to generate a binary clock tree. The algorithm of [84] for the Minimum Wirelength Prescribed Skew Routing Tree Problem uses the clustering technique described in [85] to generate clock trees with the required (non-zero) clock skew at each sink and minimal total wirelength. In [30], this step is applied iteratively to find the tree subnetwork with the minimal total wirelength.
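The core operation of DME-style tree construction is merging two subtrees so that the delays to their sinks meet a target relation. The sketch below implements the classic Elmore-delay zero-skew merge (Tsay's exact zero-skew formula), generalized with a prescribed delay offset between the two subtrees. It illustrates the merging step only. It is not the BST/DME variant of [84] or the iterative procedure of [30]. The per-unit-length resistance `r` and capacitance `c`, the units, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Subtree:
    delay: float  # Elmore delay from the subtree root to its sinks (s)
    cap: float    # total capacitance seen at the subtree root (F)

def wire_delay(r, c, length, load):
    """Elmore delay of a uniform wire segment driving a lumped load."""
    return r * length * (c * length / 2.0 + load)

def merge(a, b, length, r, c, skew=0.0):
    """Merge subtrees a and b with a wire of `length` joining their roots.

    The merge point sits at fraction x of the wire (measured from a) so that
    delay(merge -> a's sinks) - delay(merge -> b's sinks) == skew.
    skew=0.0 reduces to the exact zero-skew merge.
    """
    rl = r * length
    x = (b.delay - a.delay + skew + rl * (b.cap + c * length / 2.0)) / (
        rl * (a.cap + b.cap + c * length))
    if not 0.0 <= x <= 1.0:
        # Full DME implementations elongate (snake) the wire in this case.
        raise ValueError("prescribed skew needs wire elongation; not handled here")
    # merged.delay is measured to a's sinks; b's sinks see merged.delay - skew.
    merged = Subtree(
        delay=a.delay + wire_delay(r, c, x * length, a.cap),
        cap=a.cap + b.cap + c * length,
    )
    return x, merged
```

For two identical subtrees with `skew=0.0`, the merge point falls at the midpoint of the wire ($x = 0.5$).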

In the *ring generation* step, the roots of the trees generated in the *tree generation* step are evaluated by the skew required at each root. A greedy *collision detection* and avoidance heuristic is employed to draw the custom ring such that the skew delivered to the roots of the tree subnetworks is within a user-defined threshold of the required skew. The experimental results in [30] demonstrate a 38.6% average reduction in wirelength compared to conventional tree topologies.

Figure 3.10: Custom rings synchronized with tree subnetworks.

#### 3.3.2 Capacitance-aware Tree Subnetworks for ROA

In [81], a three-step process is proposed for synchronizing rotary rings with tree-based subnetworks. First, in the *ROA generation* step, the rotary rings are drawn. Next, in the *Tree generation* step, the tree subnetworks are built for the register components of the circuit. Finally, in the *Routing* step, the tree subnetworks are connected to the ROA such that the capacitive load on each tapping point is balanced. A balanced tree subnetwork synchronized with the ROA is shown in Fig. 3.11.



Figure 3.11: Capacitance-aware tree subnetworks synchronized with the ROA.

In the *ROA generation* step, the rotary rings are generated using the methodology presented in Section 3.2, and the tapping points to which the tree subnetworks will be connected are identified on the rings.

In the *Tree generation* step, a modified version of BST/DME [83] is employed to obtain tree subnetworks with balanced capacitive loads and minimum tree length. A bottom-up clustering algorithm is developed to accomplish the tree subnetwork design task.

In the *Routing* step, the tree subnetworks are connected to the ROA tapping points such that the capacitive load at each tapping point is balanced. A *Balanced Tapping Points Assignment Algorithm* is developed [86–89] to obtain a one-to-one mapping between the tapping points identified in the *ROA generation* step and the tree subnetworks generated in the *Tree generation* step. In addition to balancing the capacitive load, the skew requirements at the tapping points are satisfied with limited tapping wirelength.

The experiments in [81] demonstrate an 82.1% reduction in tapping wirelength compared to the best known previous work [39]. Further, the simulated clock waveforms show minimal frequency variation (0.1%) and an average power saving of 15.8% due to the tree-based rotary clock routing.

#### 3.3.3 Summary

In this section, the synchronization of the rotary topology with tree subnetworks investigated in [30] and [81] is reviewed. The methodologies presented here provide a blueprint for rotary tree routing; however, they can be improved for better operation and performance. In the remainder of this dissertation, other important issues concerning rotary clocking, such as timing analysis and optimization, interconnect modeling and parasitic extraction, and power analysis, are investigated in detail to enable easy integration of this emerging technology with the mainstream IC flow.

## 4. Timing Analysis and Optimization for Rotary Clocking Technology

*Timing closure*, in terms of satisfying the timing requirements of each local data path and minimizing clock skew, is an important objective in the design of high-performance VLSI systems. As an emerging technology for high-frequency and low-power clock signals, rotary clocking requires further research into design automation, timing analysis, and verification methodologies before it can be integrated into the mainstream IC design flow. To this end, a bounded skew constraint methodology is first presented in Section 4.1. Next, in Section 4.2, the negative effects of unbalanced capacitive loading on the ROA are demonstrated, and methodologies to achieve balanced capacitive loading are presented. Finally, in Section 4.3, the bounded skew constraint and capacitive load balancing are integrated to achieve a skew-aware, load-balanced zero clock skew rotary oscillatory array.

### 4.1 Bounded Skew Constraint Methodology for Rotary Clocking

The square wave generated by the rotary operation with adiabatic switching is a continuously traveling wave, which provides multiple phases of the clock signal along the rotary ring. A limited amount of research has been done on rotary clocking since it was introduced in [17]. The design automation of this multi-phase rotary clocking has been investigated in [36, 71, 72], albeit with major simplifications of the phase assignment for scalability. In all the previous work related to timing (presented in Section 2.3.2), the common simplification strategy is to identify a finite number of tapping points on the ring for register tapping. In this section, the effects of this simplification on skew are analyzed with a timing framework and a skew analysis. Further, the results of the skew analysis are used to devise a bounded skew constraint technique that reduces the overall skew mismatch.

In Section 4.1.1, a timing framework is developed for skew analysis. In Section 4.1.2, skew analysis is presented for non-zero clock skew and zero clock skew synchronization with rotary clocking, showing that the current design automation methods may lead to a non-negligible skew mismatch. Consequently, a bounded skew constraint methodology for rotary clock synchronization is presented in Section 4.1.3. The experimental results are presented in Section 4.1.4, and the section is summarized in Section 4.1.5.

#### 4.1.1 Timing Framework

Consider a register $R_j$ located at $(x, y)$. The register $R_j$ taps onto the rotary ring at a tapping point $TP_i$ that satisfies the phase requirement of the register. The selection of the tapping point for $R_j(x, y)$ depends on:

- 1.  $\Theta_{R_j}$  the phase requirement of the register  $R_j$ ,
- 2.  $\Theta_{TP_i}$  the phase available at the tapping point  $TP_i$ ,
- 3.  $\Theta_{l_i}$  the phase attributed to the tapping wire  $l_i$ .

The tapping point for the register  $R_j$  is chosen such that the following relation is satisfied:

$$\Theta_{R_j} = \Theta_{TP_i} + \Theta_{l_i}. \tag{4.1}$$

The phase requirement $\Theta_{R_j}$ of each register $R_j$ is obtained from a clock skew scheduling tool in the design automation flow, or it can be selected to be identical for all registers in a zero clock skew implementation. Due to the traveling nature of the rotary clock, as explained in Section 2.1.3.3, the phases $\Theta_{TP_i}$ at the various tapping points $TP_i$ are distributed between 0° and 360°. The phase $\Theta_{l_i}$ contributed by the tapping wire depends on its length $l_i$. The delay computation function $t(l_i)$ computes the delay contributed by a tapping wire of length $l_i$, which corresponds to a phase of $360^{\circ} \cdot t(l_i)/T$ for a clock period $T$. Depending on the desired level of accuracy, a delay computation function of any model order can be chosen.

In the ideal case, the register $R_j$ is connected to a tapping point $TP_i$ that satisfies (4.1) exactly. This is only possible if the rotary ring has infinitely many tapping points. However, for scalability, a finite number of points spread uniformly throughout the ring are identified as potential tapping points, and the registers are connected to these tapping points such that the skew mismatch is minimal. The skew mismatch incurred by connecting the register $R_j$ to the tapping point $TP_i$ is denoted $S_{j,i}$ and computed as:

$$S_{j,i} = \Theta_{R_j} - (\Theta_{TP_i} + \Theta_{l_i}). \tag{4.2}$$

The worst-case (largest magnitude) $S_{j,i}$ over all registers $R_j$ and their selected tapping points $TP_i$ defines the overall skew (i.e., the global skew).
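The selection rule of (4.1)-(4.2) can be sketched as a nearest-phase search over a finite set of tapping points. This sketch makes several assumptions: Manhattan tapping lengths, a caller-supplied delay model `delay_fn` standing in for $t(l_i)$, wrapping of $S_{j,i}$ to $[-180^{\circ}, 180^{\circ})$ to reflect the periodic clock, and hypothetical names throughout.

```python
def skew_mismatch(theta_reg, theta_tp, theta_wire):
    """S_{j,i} of (4.2) in degrees, wrapped to [-180, 180) (wrapping is assumed)."""
    s = theta_reg - (theta_tp + theta_wire)
    return (s + 180.0) % 360.0 - 180.0

def select_taps(registers, taps, period, delay_fn):
    """Pick, for each register, the tapping point with the smallest |S_{j,i}|.

    registers: list of (x, y, theta_reg); taps: list of (x, y, theta_tp).
    delay_fn maps a tapping length to a delay (hypothetical stand-in for t(l)).
    Returns (assignment, global_skew), with global_skew = max |S| in degrees.
    """
    assignment, worst = [], 0.0
    for rx, ry, th_r in registers:
        best = None  # (tap index, mismatch)
        for i, (tx, ty, th_tp) in enumerate(taps):
            length = abs(rx - tx) + abs(ry - ty)        # Manhattan tapping length
            th_l = 360.0 * delay_fn(length) / period    # wire delay as a phase
            s = skew_mismatch(th_r, th_tp, th_l)
            if best is None or abs(s) < abs(best[1]):
                best = (i, s)
        assignment.append(best[0])
        worst = max(worst, abs(best[1]))
    return assignment, worst
```

In this sketch, the returned `global_skew` is the quantity whose distribution over all registers is analyzed in the skew analysis that follows.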

#### 4.1.2 Skew Analysis

The skew values are analyzed for the ROA topology described in Section 2.1.2. The ROA topology is implemented to draw rotary rings for a given frequency $f_r$ on the IBM R1 to R5 benchmark circuits. Based on (2.3), the ring perimeter is fixed to $P_r$, corresponding to the frequency $f_r$, and the perimeter $P_r$ of each rotary ring is kept constant to maintain a constant frequency. The grid size of the ROA topology is determined by the perimeter $P_r$ and the placement information $(x, y)$ of each register. For example, the IBM benchmark circuit R1 is partitioned into a $5 \times 5$ grid, based on the size of the circuit and the perimeter computed for a frequency of $f_r = 3.4\ \text{GHz}$ of a rotary-clock-synchronized circuit.

Depending on the register phase requirement $\Theta_{R_j}$, the phase available at the tapping point $\Theta_{TP_i}$ and the phase generated by the tapping wire $\Theta_{l_i}$, the tapping points are chosen for the registers such that the skew mismatch $S_{j,i}$ computed by (4.2) is minimal. Previous works in [36, 72] also adopt this procedure. In Fig. 4.1, the skew distribution obtained with the methodology described in [36, 72] is shown for the IBM R1 through R5 benchmark circuits. Note that static timing analysis is governed by the worst-case skew mismatch, that is, by the tail ends of the distributions in Fig. 4.1. For the IBM R1-R5 benchmark circuits, a worst-case skew mismatch of up to 5.56% of the clock period (5.28% on average over the five circuits, see Table 4.1) is observed. This observation motivates the bounded skew constraint methodology described in Section 4.1.3.

Figure 4.1: Distribution of skew mismatch for R1-R5 circuits.

#### 4.1.3 Bounded Skew Constraint Implementation

The bounded skew constraint methodology is devised to reduce the skew mismatch based on the skew distribution obtained in Section 4.1.2. A motivational example demonstrating the method is presented in Section 4.1.3.1. The methodology is formally presented in Section 4.1.3.2.

##### 4.1.3.1 Motivational Example

Consider a phase requirement of $\Theta_{R_j} = 135^{\circ}$ for the register $R_j$ at $(x, y)$ in Fig. 4.2. This register can be connected to any of the available tapping points on the rotary ring. For simplicity, only a section of the ring is shown in Fig. 4.2, and only the tapping points TP4 and TP5 are considered. The phases available at tapping points TP4 and TP5 are 90° and 120°, respectively. If the register is connected to tapping point TP4, the phase contributed by the tapping wire is 20°, and the skew mismatch is $S_{j,4} = 135^{\circ} - 90^{\circ} - 20^{\circ} = 25^{\circ}$. If the register is connected to tapping point TP5, the phase contributed by the tapping wire is 25°, and the skew mismatch is $S_{j,5} = 135^{\circ} - 120^{\circ} - 25^{\circ} = -10^{\circ}$. Based on the analysis in Section 4.1.2, the register is connected to the tapping point TP5, since its skew mismatch has the smallest magnitude. Note that, if the skew mismatch is positive, the tapping wirelength can be artificially increased (wire snaking) to nullify the skew mismatch.

For instance, the skew mismatch at the tapping point TP4 is positive. Hence, a wirelength corresponding to $25^{\circ}$ is added to achieve a perfect skew match, and the total wirelength in this case is 50 units. On the other hand, if the skew mismatch is negative, the double loop design of the rotary ring (Section 2.1.1) allows the register to be connected to the other differential line instead. The connection is made at $TP'_i$, the complementary pair of $TP_i$, whose phase is $\Theta_{TP'_i} = \Theta_{TP_i} + 180^{\circ}$. A similar wire snaking procedure can then be used to nullify the skew mismatch. In Fig. 4.2, the skew mismatch at the tapping point TP5 is negative. Hence, a wirelength corresponding to a phase of $(-10^{\circ} + 180^{\circ} = 170^{\circ})$ is added to achieve a perfect skew match, and the total wirelength in this case is 100 units.

Figure 4.2: Bounded skew constraint methodology to reduce the skew mismatch.

Since the new total tapping wirelength for TP4 is less than that for TP5, the register $R_j$ is connected to TP4. In this case, the tapping wirelength (50 units) is 100% greater than the original tapping wirelength of 25 units obtained with the method described in Section 4.1.2; however, a perfect skew match is achieved. Often, a less than perfect but acceptable skew can be achieved with far less than the worst-case 100% wirelength increase, which would be prohibitive in most high-performance designs. For instance, at the tapping point TP4, for a practical skew upper bound of 4% of the total clock period, the required wirelength is 35 units, an increase of approximately 40% over the original wirelength.

##### 4.1.3.2 General Methodology

Consider a rotary clocking implementation with tapping points $TP_i$ distributed uniformly along the rotary ring, similar to the simplification adopted in [72] and [36]. According to the analysis in Section 4.1.2, the skew mismatch for connecting the register $R_j$ to a tapping point $TP_i$ is $S_{j,i}$. To keep the skew within a practical bound, an upper bound $S_{UB}$ is chosen such that $|S_{j,i}| \leq S_{UB}$. For a perfect skew balance, $S_{UB} = 0$. Let the minimum tapping wirelength required to connect a register $R_j$ to a tapping point $TP_i$ be $WL_i$.

For each tapping point  $TP_i$ , the amount of wire snaking required to keep the skew within an upper bound  $S_{UB}$  is computed as  $WS_i$ . Let the tapping wirelength for each tapping point with wire snaking be  $WL_i^*$ . Then:

$$WL_i^* = WL_i + WS_i. \tag{4.3}$$

Let the total register tapping wirelengths without and with wire snaking be $\Sigma WL_i$ and $\Sigma WL_i^*$, respectively. The proposed design methodology reduces the skew mismatch and, for an upper bound of $S_{UB} = 0$, achieves a perfect skew match. However, as explained in Section 4.1.3.1, $\Sigma WL_i^*$ can be considerably larger than $\Sigma WL_i$, depending on the skew mismatch. A more practical approach to wire snaking improves the clock skew sub-optimally, with a nonzero upper bound $S_{UB}$, in exchange for a moderate wirelength increase.
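The snaking rule and the selection by (4.3) can be sketched as follows. Several assumptions apply:

- phases are in degrees;
- the complementary tapping point $TP'_i$ lies at the same location as $TP_i$, so $WL_i$ is unchanged when switching lines;
- snaking phase converts to wirelength through a hypothetical linear factor `length_per_degree`.

The wirelength units therefore do not match those of Fig. 4.2. The function names are illustrative.

```python
def snaking_phase(s, s_ub):
    """Wire snaking phase WS (degrees) needed so that |S| <= s_ub.

    Snaking only adds delay, so it can only lower S. A mismatch below -s_ub is
    first moved to the complementary tapping point (+180 degrees on the other
    differential line), which turns it into s + 180 modulo 360.
    """
    if abs(s) <= s_ub:
        return 0.0
    if s < 0:
        s += 180.0
    return max(0.0, s - s_ub)

def choose_tap(candidates, s_ub, length_per_degree):
    """Return (tap_id, WL*) minimizing WL_i* = WL_i + WS_i, as in (4.3).

    candidates: list of (tap_id, WL_i, S_i); length_per_degree is a hypothetical
    linear conversion from snaking phase to wirelength.
    """
    best = None
    for tap_id, wl, s in candidates:
        wl_star = wl + length_per_degree * snaking_phase(s, s_ub)
        if best is None or wl_star < best[1]:
            best = (tap_id, wl_star)
    return best

# Phases of the motivational example; wirelengths are illustrative only.
cands = [("TP4", 20.0, 25.0), ("TP5", 25.0, -10.0)]
print(choose_tap(cands, s_ub=0.0, length_per_degree=1.0))           # perfect skew
print(choose_tap(cands, s_ub=0.035 * 360.0, length_per_degree=1.0))  # 3.5% bound
```

With $S_{UB} = 0$, TP4 needs 25° of snaking and TP5 needs 170°, matching Section 4.1.3.1. With a 3.5% bound (12.6°), TP5 already satisfies the bound and needs no snaking.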

#### 4.1.4 Experimental Results

The proposed methodology for reducing the skew mismatch is tested on the IBM R1-R5 benchmark circuits. The skew upper bound $S_{UB}$ is varied from 5% of the total clock period down to 0% (perfect skew), and for each case the additional tapping wirelength required to achieve the bound $S_{UB}$ is computed. The resulting wirelengths for the IBM R1-R5 benchmark circuits are plotted in Fig. 4.3. The register tapping wirelength increases as the permitted skew decreases.



Figure 4.3: Limited wire snaking for improved skew.

The skew distribution in Fig. 4.1 and the wire snaking wirelengths in Fig. 4.3 are further analyzed, given that the worst-case skew mismatch is the bottleneck in timing analysis. Fig. 4.1 shows that the majority of the registers have a low skew mismatch and only a small number of registers at the tail ends of the distributions cause the worst-case skew. Thus, a practical application of the proposed wire snaking approach is devised to primarily target these registers. In this practical application, the skew mismatch is significantly improved while the wirelength increase due to wire snaking is kept small. Note the flat portion of the curves on the left-hand side of Fig. 4.3: the wirelength increase is minimal until the skew bound reaches 3.5% of the clock period, as marked by the vertical line in Fig. 4.3. Below 3.5%, the wirelength increase is very high (for example, a 135.11% increase for perfect skew balance on the R1 benchmark circuit). Hence, for practical purposes, a skew bound of 3.5% is chosen, with a minimal increase in wirelength (around 1.25% on average) compared to the wirelength computed using the methodology presented in [36, 72].

The skew distributions for the IBM R1-R5 benchmark circuits under the bounded skew constraint implementation with $S_{UB} = 3.5\%$ (of the total clock period) are plotted in Fig. 4.4. In Table 4.1, the wirelength increases required for the practical skew and the perfect skew are compared with the skew and wirelength computed using the methodology in [36, 72]. The average worst-case skew mismatch observed with the methodology presented in [36, 72] is 5.28% of the clock period. For a perfect skew implementation, an average wirelength increase of 133.75% is observed compared to the wirelength computed with the methodology presented in [36, 72]. This increase in total wirelength is very high; however, it makes a perfect skew (0.00% of the clock period) implementation possible. For a more practical implementation, a skew bound of 3.50% is chosen, for which the wirelength increase is minimal, approximately 1.25% on average. Hence, by using the bounded skew constraint methodology, the skew mismatch can be reduced from 5.28% to 3.50% with a minimal increase in the total tapping wirelength (around 1.25%).

| Benchmark circuit | Traditional [36, 72] skew | Perfect skew: $Skew_{zero}$ | Perfect skew: WL increase | Practical skew: $Skew_{prac}$ | Practical skew: WL increase |
|-----------|--------|-------|---------|-------|-------|
| R1 | 5.00% | 0.00% | 135.11% | 3.50% | 1.37% |
| R2 | 5.00% | 0.00% | 127.68% | 3.50% | 1.23% |
| R3 | 5.56% | 0.00% | 138.08% | 3.50% | 1.49% |
| R4 | 5.28% | 0.00% | 134.19% | 3.50% | 0.96% |
| R5 | 5.56% | 0.00% | 133.71% | 3.50% | 1.22% |
| Average | 5.28% | 0.00% | 133.75% | 3.50% | 1.25% |

Table 4.1: Skew (as a % of the total clock period) and tapping wirelength results for the wire snaking based methodology compared with the traditional methodology.

#### 4.1.5 Summary

In this section, the effects of design simplification in devising an automation scheme for resonant rotary clocking are analyzed. The results of the skew analysis are used to design a bounded skew constraint methodology that reduces the skew mismatch in rotary clocking. Experiments on the IBM R1-R5 benchmark circuits show that, with a practical skew bound, the skew mismatch can be reduced from 5.28% to 3.5% of the clock period with a minimal increase of 1.25% in the total tapping wirelength.

Figure 4.4: Distribution of skew mismatch for R1-R5 circuits using a 3.5% bounded skew constraint.

### 4.2 Analysis, Design and Simulation of Capacitive Load Balanced Rotary Oscillatory Array

The rotary clocking technology provides constant magnitude clock signals with varying phase. Due to the traveling (varying phase) nature of the clock signal, the distribution of the rotary clock on the Rotary Oscillatory Array (ROA) network features non-zero clock skew operation. Non-zero clock skew operation offers the added advantage of 30% higher clock frequencies on average [75]. However, tapping registers onto the ROA network to satisfy the non-zero clock skew requirements might degrade the capacitive balance of the rotary rings. The capacitive load distribution is integral to the operation of rotary clocking technology due to its implications on clock resonance. The majority of previous work on rotary clocking deals with the unique timing properties of this technology [36, 70–73]; however, there is no published design automation work on capacitance analysis and balancing for rotary clocking technology. Towards this end:

- 1. The effects of unbalanced capacitance distribution on the clock frequencies of the rings of ROA are analyzed using SPICE simulations,
- 2. A novel scheme called OCLB (optimal capacitive load balancing) is proposed for the rotary rings of the ROA,
- 3. A practical scheme called SOCLB (suboptimal capacitive load balancing) is proposed with the objective of reducing the overall wirelength of OCLB,
- 4. SPICE simulations are performed for both OCLB and SOCLB, and the resultant clock waveforms are presented.

The effect of unbalanced capacitive load on the clock frequencies is demonstrated using SPICE simulations in Section 4.2.1. The proposed OCLB methodology with the experimental results is presented in Section 4.2.2. In Section 4.2.3, SOCLB methodology is presented. The section is summarized in Section 4.2.4.

#### 4.2.1 Effects of Unbalanced Capacitive Load

The stability of the rotary operating frequency $f_{osc}$ depends in part on the capacitive load distribution on each ring of the rotary oscillatory array. The operating frequency of a rotary ring depends on the device parameters, as shown in (2.3). The inductance of the rotary ring depends primarily on the interconnect geometry and is identical for each ROA ring. The capacitance of the rotary ring is composed of four (4) different components, as explained in (2.5). Of these, $C_{inv}$ and $C_{ring}$ are identical for each ROA ring, whereas $C_{reg}$ and $C_{wire}$ depend on the number of registers connected to each ring and on their physical proximity to the ring. SPICE simulations are performed to observe the effects of an unbalanced capacitance distribution on the frequency of the rotary rings of the ROA.

Alternative SPICE models for rotary clocking that accurately capture the transmission line behavior have been proposed in [18, 19]. However, the variation in the capacitive load across the rings of the ROA, and the effects of such a variation on the frequency of the rings, are not addressed by the previous work in [17–19, 72]. Towards this end, an accurate SPICE simulation model for rotary clocking is developed. The existing HSPICE U-element is used to model the lossy transmission line [56], and the SPICE netlist is modified to incorporate the capacitive load variation across the rotary rings due to the changing $C_{wire}$.

The circuit is set up in SPICE to model five (5) rings of an ROA operating at a relatively low frequency for rotary clocking. In this demonstrative setup, the total capacitance on each of the five (5) rings is varied to simulate an unbalanced capacitive load distribution. For the five (5) selected rotary rings Ring1, Ring2, Ring3, Ring4 and Ring5, total capacitive loads of 10 pF, 20 pF, 30 pF, 40 pF and 50 pF are modeled, respectively. The total capacitance of each ring is uniformly distributed to the tapping points within the corresponding ring. The clock waveforms observed for this setup are shown in Fig. 4.5. Across the different rings of the ROA, a maximum frequency variation of 30.31% is observed, ranging from 1.281 GHz to 1.838 GHz. Note that, in addition to the unmatched frequencies, the oscillations are not very stable due to the high capacitance imbalance across the rings of the ROA. Similar variations might occur when the synchronous components are connected to different rotary rings and different tapping points on the rotary ring in order to satisfy the skew requirements.

Figure 4.5: SPICE simulations for unbalanced capacitance distribution on the five (5) rings of the ROA resulting in a frequency variation of 30.31%.

#### 4.2.2 Capacitive Load Balancing On ROA

As shown in Section 4.2.1, the unbalanced capacitance distribution across the rotary rings causes very poor oscillation characteristics, which renders rotary clocking impractical for high frequency implementations. Hence, to obtain a stable operating frequency of the rotary ring, the capacitive load should be balanced across the ROA. Towards this end, **Problem OCLB** for the optimal capacitive load balancing is formulated in Section 4.2.2.1, and the experimental results for Problem OCLB are presented in Section 4.2.2.2.

#### 4.2.2.1 Problem OCLB: Optimal Capacitive Load Balancing

Consider M registers synchronized by N rotary rings on the ROA. For a capacitance-balanced implementation, each register must be assigned to a rotary ring in the ROA according to its *capacitive cost*. The capacitive cost of each tapping wire reflects the capacitive load of that tapping, computed from the register input capacitance $C_{reg}$ and the tapping wire capacitance $C_{wire}$. For each register i, the capacitive cost of connecting to a ring j is computed by identifying the tapping location on ring j that satisfies the skew mismatch minimization objective. As each register is connected to exactly one ring, one of its N possible capacitive costs must be selected so as to maintain the total capacitive load balance among the rings of the ROA.

$$\begin{array}{ll} \text{Minimize} & k\\ \text{Subject to} & \sum_{i} c_{i,j} x_{i,j} = p_j \quad \forall\, j \in \{1, \ldots, N\}\\ & \sum_{j} x_{i,j} = 1 \quad \forall\, i \in \{1, \ldots, M\}\\ & |p_{j_1} - p_{j_2}| \leq k \quad \forall\, j_1, j_2 \in \{1, \ldots, N\}\\ & x_{i,j} \in \{0, 1\} \quad \forall\, i, j \end{array}$$

Figure 4.6: ILP formulation for Problem OCLB.

Problem OCLB is formulated as an integer linear programming (ILP) problem, as shown in Fig. 4.6. The objective of Problem OCLB is to minimize k, the maximum difference in capacitive loading between any two rotary rings on the ROA. The cost $c_{i,j}$ is the tapping cost of connecting register i to ring j, and the binary variable $x_{i,j}$ equals 1 if register i is connected to ring j. The first set of constraints is defined for each ring j, where $p_j$ is the total capacitive cost on ring j. The second set of constraints is defined for each register; the summation guarantees that each register is connected to exactly one ring. The third set of constraints is defined for each pair of rings $j_1$ and $j_2$, where k bounds the difference in their capacitive costs and is minimized by the objective function. In a circuit with M registers and N rings, there are MN binary variables and $M + N + \binom{N}{2}$ constraints.
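To make the OCLB objective and constraints concrete, the Python sketch below solves a toy instance by exhaustive enumeration of all $N^M$ assignments. This is a swapped-in technique for illustration only: the experiments solve the ILP with CPLEX, and brute force is infeasible beyond a handful of registers. The cost matrix `COST` is hypothetical.

```python
from itertools import product


def oclb_bruteforce(cost):
    """Exhaustively solve OCLB on a tiny instance.

    cost[i][j] is c_{i,j}, the capacitive cost (pF) of tapping register i to ring j.
    Each register goes to exactly one ring (sum_j x_{i,j} = 1), so an assignment is
    a tuple whose i-th entry is the ring chosen for register i.
    Returns (k, assignment) with k = max_j p_j - min_j p_j minimized.
    """
    m, n = len(cost), len(cost[0])
    best_k, best_assign = float("inf"), None
    for assign in product(range(n), repeat=m):  # all N**M candidate assignments
        loads = [0.0] * n  # p_j: total capacitive cost on ring j
        for i, j in enumerate(assign):
            loads[j] += cost[i][j]
        k = max(loads) - min(loads)  # tightest k with |p_j1 - p_j2| <= k for all pairs
        if k < best_k:
            best_k, best_assign = k, assign
    return best_k, best_assign


# Hypothetical instance: 4 registers (rows) and 2 rings (columns), costs in pF.
COST = [[1.0, 4.0], [1.0, 4.0], [3.0, 1.0], [2.0, 2.5]]
print(oclb_bruteforce(COST))  # (1.5, (0, 0, 1, 1))
```

Ties are broken by enumeration order; the ILP imposes no preference among assignments with equal k, which is why OCLB alone can return solutions with a large total load.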

#### 4.2.2.2 Experimental Results for OCLB

The ring perimeter is fixed to $P_r$ based on the frequency $f_r$, which is selected as 1.8 GHz. The test data are the IBM R1-R5 benchmark circuits. Based on $f_r$ and the floorplan, the benchmark circuits R1, R2, R3, R4 and R5 are partitioned into ROA grids of size (5×5), (6×6), (7×7), (9×9) and (10×10), respectively. Note that the R1-R5 benchmark circuits use a generic distance unit, so the same generic unit is used here to represent wirelengths. The distances are, however, scaled down appropriately to incorporate the physical dimensions of the wires and transmission lines in the SPICE simulation models for the target frequencies. The input capacitance of the synchronous components and the per-unit capacitance of the tapping wires are obtained from the R1-R5 benchmark circuits. To facilitate a *non-zero* skew implementation, phase values ranging from 0° to 360° are randomly generated for the register sinks in the benchmark circuit files. The integer linear programming problem formulated for Problem OCLB is solved using the commercial mixed integer programming solver *CPLEX* [51].

For OCLB, experiments are performed to demonstrate the maximally capacitance-balanced implementation. The results are listed in Table 4.2, which reports the ROA grid size, the number of integer variables in the formulation, the run time, the capacitance variation for the unbalanced capacitance distribution and for the optimal capacitance balance in the non-zero clock skew implementation, and the resulting improvement in capacitance balance. The capacitance variation across the ROAs of the R1-R5 benchmark circuits ranges from 5.70 pF to 17.13 pF. The optimal capacitance balance k (between any two ROA rings) is between 2.21 pF and 5.76 pF. Thus, OCLB achieves an average improvement of 3.73X in capacitance balance. The optimal capacitive balance is demonstrated in Fig. 4.7 for the 25 (5 × 5) rings of the ROA on the benchmark circuit R1, where the maximum difference between rings is $k = 2.21 \ pF$. The drawback of OCLB is the excessive wirelength needed to adhere to the optimal capacitive load balance, since some registers are tapped to distant tapping points and distant rings. Because of this potentially excessive wirelength, the total capacitive load is also high (286.49 pF).

| Benchmark | Grid           | # of integer variables | Run time (s) | k w/o balancing (pF) | k with OCLB (pF) | Improvement in k |
|-----------|----------------|------------------------|--------------|----------------------|------------------|------------------|
| R1        | $5 \times 5$   | 6675                   | 1            | 5.70                 | 2.21             | 2.58X            |
| R2        | $6 \times 6$   | 21528                  | 3            | 14.59                | 2.29             | 6.37X            |
| R3        | $7 \times 7$   | 42238                  | 11           | 10.35                | 3.46             | 2.99X            |
| R4        | $9 \times 9$   | 154143                 | 87           | 14.79                | 3.96             | 3.74X            |
| R5        | $10 \times 10$ | 310100                 | 198          | 17.13                | 5.76             | 2.97X            |

Table 4.2: Results for OCLB formulation.
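The "# of integer variables" column follows directly from the formulation: with M registers and $N = g \times g$ rings there are MN binary variables $x_{i,j}$. The short Python sketch below reproduces that column from the register counts listed in Table 4.4. For the constraint count it assumes one balance constraint per unordered ring pair $(j_1, j_2)$, counting each $|p_{j_1} - p_{j_2}| \leq k$ as a single constraint; the name `formulation_size` is hypothetical.

```python
from math import comb

# Register counts (Table 4.4) and ROA grid dimension g for a g x g grid (Table 4.2).
BENCHMARKS = {"R1": (267, 5), "R2": (598, 6), "R3": (862, 7),
              "R4": (1903, 9), "R5": (3101, 10)}


def formulation_size(m_registers, grid):
    """Return (binary variables, constraints) for OCLB with M registers on a grid x grid ROA."""
    n_rings = grid * grid
    n_binary = m_registers * n_rings                # one x_{i,j} per register-ring pair
    n_constraints = (m_registers                    # sum_j x_{i,j} = 1, one per register
                     + n_rings                      # p_j definition, one per ring
                     + comb(n_rings, 2))            # one balance constraint per ring pair
    return n_binary, n_constraints


for name, (m, g) in BENCHMARKS.items():
    print(name, *formulation_size(m, g))
```

The first column of the output matches Table 4.2 exactly. The balance constraints grow quadratically with the number of rings, which is consistent with the rapid growth in run time from R1 to R5.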



Figure 4.7: Capacitance distribution of R1 on a  $5 \times 5$  grid with proposed OCLB formulation. The total capacitance is 286.49 pF.



Figure 4.8: SPICE simulation results for the OCLB formulation. Frequency variation of 0.30% for the capacitance imbalance of k = 2.21 pF on the R1 circuit.

SPICE simulations are performed for the ROA implementation computed by OCLB to observe the improvement in oscillation characteristics. The circuit is set up in SPICE to display five (5) of the ROA rings, similar to the setup for the unbalanced capacitance distribution in Section 4.2.1. In this setup, the capacitance values obtained from the OCLB analysis (Fig. 4.7) are incorporated in the SPICE netlist, and the maximum capacitive imbalance of k = 2.21 pF is used to investigate the frequency variation across the five rings of the ROA. The resultant waveforms are shown in Fig. 4.8. The frequency is relatively constant across the different rotary rings: a maximum frequency variation of 0.30% is observed across the five (5) rings of the ROA, a substantial improvement over the 30.31% frequency mismatch observed in the unbalanced case. The absolute frequencies are not directly comparable, as the total capacitive loads of the unbalanced case and the OCLB solution differ.

### 4.2.3 Minimizing Wirelength Across Capacitance Balanced ROA

One drawback of OCLB is the potentially excessive wirelength of the optimal capacitance-balanced solution. To address this problem, the SOCLB methodology is developed in Section 4.2.3.1, and the experimental results are presented in Section 4.2.3.2.

#### 4.2.3.1 SOCLB: Sub-optimal Capacitive Load Balancing for Minimum Tapping Wirelength

A more practical approach for a capacitance-balanced ROA is to keep the capacitive imbalance under a predetermined upper bound that still ensures robust oscillation, while minimizing the overall tapping wirelength. In practice, the upper bound depends on the manufacturing technology and the level of frequency mismatch the design can tolerate. In the experiments, an upper bound based on the OCLB solution is selected for simplicity.

$$\begin{aligned} \text{Minimize} \quad & \sum_{i} \sum_{j} c_{i,j} x_{i,j} \\ \text{Subject to} \quad & \sum_{i} c_{i,j} x_{i,j} = p_j \quad \forall\, j \in \{1, \ldots, N\} \\ & \sum_{j} x_{i,j} = 1 \quad \forall\, i \in \{1, \ldots, M\} \\ & |p_{j_1} - p_{j_2}| \leq k_{UB} \quad \forall\, j_1, j_2 \in \{1, \ldots, N\} \\ & x_{i,j} \in \{0, 1\} \quad \forall\, i, j \end{aligned}$$

Figure 4.9: ILP formulation for SOCLB.

This optimization problem, labeled SOCLB, is modeled as an integer linear programming (ILP) problem, as shown in Fig. 4.9. The objective is to minimize the total capacitive cost of all tappings, summed over the rings of the ROA. Because a register's input capacitance is the same whichever ring it taps, minimizing this sum amounts to minimizing the tapping wire capacitance, and hence the total tapping wirelength. The first and second sets of constraints are defined as in OCLB. In the third set of constraints, the capacitance mismatch between each pair of rings is bounded by the upper bound $k_{UB}$. Like OCLB, SOCLB has MN binary variables and $M + N + \binom{N}{2}$ constraints for a circuit with M registers and N rings.
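The SOCLB trade-off can be seen on a toy instance. The Python sketch below is an illustrative exhaustive search, not the CPLEX-based ILP used in the experiments. It discards every assignment whose ring-load spread exceeds $k_{UB}$ and keeps the one with the smallest total tapping cost. The cost matrix `COST` is hypothetical. For this instance the tightest achievable imbalance is 1.5 pF, so, mirroring the experimental choice of twice the optimal balance, the example uses $k_{UB} = 3.0$ pF.

```python
from itertools import product


def soclb_bruteforce(cost, k_ub):
    """Exhaustively solve SOCLB on a tiny instance.

    cost[i][j] is c_{i,j} (pF); k_ub is the allowed imbalance between any two rings.
    Returns (total cost, assignment) minimizing sum_i sum_j c_{i,j} x_{i,j},
    or None if no assignment meets the balance bound.
    """
    m, n = len(cost), len(cost[0])
    best = None
    for assign in product(range(n), repeat=m):
        loads = [0.0] * n  # p_j for this assignment
        for i, j in enumerate(assign):
            loads[j] += cost[i][j]
        if max(loads) - min(loads) > k_ub:
            continue  # violates |p_j1 - p_j2| <= k_UB for some ring pair
        total = sum(loads)  # objective: total tapping cost over all rings
        if best is None or total < best[0]:
            best = (total, assign)
    return best


# Hypothetical instance: 4 registers (rows) and 2 rings (columns), costs in pF.
COST = [[1.0, 4.0], [1.0, 4.0], [3.0, 1.0], [2.0, 2.5]]
print(soclb_bruteforce(COST, 1.5))  # (5.5, (0, 0, 1, 1))
print(soclb_bruteforce(COST, 3.0))  # (5.0, (0, 0, 1, 0))
```

Relaxing the bound from 1.5 pF to 3.0 pF lowers the total tapping cost from 5.5 pF to 5.0 pF. This is the same kind of trade-off between balance and wirelength that Table 4.3 quantifies on the benchmarks.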

#### 4.2.3.2 Experimental Results for SOCLB

For SOCLB, experiments are performed to demonstrate the proposed practical implementation, which minimizes the total wirelength for a capacitance-balanced (with an upper bound) solution. The bound on capacitive imbalance is set to twice the optimal value reported in Table 4.2 for each circuit. The results are listed in Table 4.3, which reports the capacitive difference k for both OCLB and SOCLB and the improvement in total tapping wirelength. The average improvement in tapping wirelength for the circuits with sub-optimal capacitive balance is 69.24%. Thus, by sacrificing some capacitive balance (while retaining enough for robust operation), excessive wirelengths are avoided. Finally, the capacitive balance is demonstrated in Fig. 4.10 for the 25 (5 $\times$ 5) rings of the ROA on the benchmark circuit R1.

| Benchmark | Capacitance variation k for OCLB $(pF)$ | Capacitance variation k for SOCLB $(pF)$ | Wirelength improvement for SOCLB over OCLB |
|-----------|-----------------------------------------|------------------------------------------|--------------------------------------------|
| R1        | 2.21                                    | 4.42                                     | 62.55%                                     |
| R2        | 2.29                                    | 4.58                                     | 65.24%                                     |
| R3        | 3.46                                    | 6.92                                     | 66.05%                                     |
| R4        | 3.96                                    | 7.92                                     | 74.77%                                     |
| R5        | 5.76                                    | 11.52                                    | 77.59%                                     |
| Average   |                                         |                                          | 69.24%                                     |

Table 4.3: Wirelength improvement for SOCLB methodology.

Figure 4.10: Capacitance distribution of R1 on a  $5 \times 5$  grid with the proposed SOCLB formulation. The total capacitance is 116.27 *pF*.

Next, SPICE simulations are performed for SOCLB. The circuit is set up in SPICE to display five (5) of the rotary rings of the R1 benchmark circuit, similar to the setup for OCLB. In this setup, the capacitance values obtained from the SOCLB analysis (Fig. 4.10) are incorporated in the SPICE netlist. The maximum capacitive imbalance for the R1 benchmark circuit, k = 4.42 pF, is used to investigate the frequency variation across the five rings of the sample ROA. The resultant waveforms are shown in Fig. 4.11. A maximum frequency variation of 2.40% is observed across the five (5) rings of the ROA. This is higher than the 0.30% obtained with OCLB, but substantially lower than the 30.31% variation of the unbalanced case.

### 4.2.4 Summary

In this section, SPICE simulations show that unbalanced capacitive loading has a detrimental effect on the oscillation frequency. To prevent capacitive load imbalance, two novel capacitive balancing methodologies (OCLB and SOCLB) are devised, which enable robust operation of the rotary clock on the conventional ROA topology. SPICE simulations verify the robust oscillation of the ROA rings, with the frequency variation limited to 0.30% (OCLB) and 2.40% (SOCLB), compared with 30.31% in the unbalanced case. SOCLB is proposed as a practical implementation of the capacitive balancing scheme, yielding an average wirelength improvement of 69.24% over the optimal OCLB formulation for a frequency variation of only 2.40% (measured on R1).



Figure 4.11: SPICE simulation results for SOCLB. Frequency variation of 2.40% for the capacitance imbalance of k = 4.42 pF on the R1 circuit.

## 4.3 Skew-Aware Capacitive Load Balancing for Low-Power Zero Clock Skew Rotary Oscillatory Array

One of the differentiating properties of rotary clocking is non-zero clock skew operation, owing to the "traveling" nature of the clock signal on the Rotary Oscillatory Array (ROA) distribution network. Section 3.2 shows that the traveling nature of the clock signal does not necessitate a non-zero clock skew implementation, and that zero clock skew circuits can be efficiently implemented with rotary clocking. However, Section 3.2 ignores the requirement of a balanced capacitive load, which has been identified as integral to the operation of rotary clocking because of its implications for clock resonance.

The International Technology Roadmap for Semiconductors (ITRS) predicts that clock skew in modern circuits can consume up to 10% of the clock cycle [90]. At higher clock frequencies, the fraction of the clock cycle consumed by clock skew increases further. Towards this end, a bounded skew implementation is desired for the high-frequency zero clock skew rotary oscillatory arrays presented in Section 3.2.

In Section 4.1, a bounded skew methodology is proposed for non-zero clock skew circuits; however, capacitive load balancing is ignored. In Section 4.2, capacitive balancing is addressed for non-zero clock skew circuits; however, the skew mismatch is neglected. Furthermore, no methodology has been proposed in the literature for simultaneous skew control and capacitance balancing with rotary clocking. Towards this end, two techniques are proposed to mitigate the unbalanced capacitive load problem while simultaneously controlling the skew for zero-skew synchronization with rotary clocking. The first method achieves an optimal capacitive load balance between the rings with a bounded skew, at the expense of tapping wirelength. The second method trades off the optimality of the capacitive load balance for a practical limit on the tapping wirelength. Both methods limit the clock skew while balancing the capacitive load, enabling high-frequency, low-jitter and low-skew operation.

In Section 4.3.1, the motivation for skew control and capacitive balancing is presented. In Section 4.3.2, the proposed methodologies are presented. In Section 4.3.3, experimental results on IBM R1-R5 circuits are shown. In Section 4.3.4, the work is summarized.

### 4.3.1 Motivation

A methodology for zero clock skew synchronization with rotary clocking technology (ZCS) is presented in Section 3.2. ZCS builds a design automation framework that connects zero-skew registers to the rotary rings of an ROA such that the wirelength is minimized. Note that in ZCS, a skew mismatch arises from the minimal-wirelength criterion employed when connecting registers to the rings of the ROA. The skew mismatch resulting from the ZCS methodology is analyzed here, and the need for a controllable skew mechanism is identified.

To analyze the skew mismatch in ZCS, the ROA topology is implemented to draw rotary rings for a given frequency $f_{osc}$ on the IBM R1 to R5 benchmark circuits (with the methodology adopted in [37]). Based on Eq. (2.3), the ring perimeter is fixed to $R_{peri}$, corresponding to the frequency $f_{osc}$. The perimeter $R_{peri}$ is kept the same for every rotary ring so as to maintain a constant frequency. The grid size of the ROA topology is determined from the perimeter $R_{peri}$ and the placement information (x, y) of each register. For example, the IBM benchmark circuit R1 is partitioned into a $5 \times 5$ grid based on the size of the circuit and the perimeter computed for a rotary clock frequency of $f_{osc} = 1.8 \ GHz$. The skew mismatch in ZCS is presented in Table 4.4. An average skew mismatch of 6.55% of the clock period is observed for the R1-R5 benchmark circuits, and these skew values increase further with frequency. Hence, a skew-control mechanism is necessary for rotary clocking.

| Benchmark | Grid           | # of registers | Skew (as a % of clock) |
|-----------|----------------|----------------|------------------------|
| R1        | $5 \times 5$   | 267            | 6.11%                  |
| R2        | $6 \times 6$   | 598            | 6.67%                  |
| R3        | $7 \times 7$   | 862            | 6.38%                  |
| R4        | $9 \times 9$   | 1903           | 6.67%                  |
| R5        | $10 \times 10$ | 3101           | 6.94%                  |
| Average   | -              | -              | 6.55%                  |

Table 4.4: Skew mismatch results with ZCS.

Figure 4.12: Capacitance distribution of R1 on a  $5 \times 5$  ROA grid with zero clock skew synchronization (ZCS). The total capacitance is 238.480 *pF*.

Note that, in addition to the skew mismatch, the ZCS methodology might lead to an uneven distribution of registers, and thus of capacitive load, across the ROA rings. Such an uneven capacitance distribution affects the frequency and the stability of the rotary signals, as follows.

In Fig. 4.12, the capacitive load imbalance is demonstrated for the 25  $(5 \times 5)$  rings of the ROA on the benchmark circuit R1, when R1 is synchronized by the ZCS methodology. The capacitive imbalance for the R1 circuit (of the IBM R1-R5 benchmark circuits) is  $k = 15.490 \ pF$ , which represents the difference between the heaviest and the lightest loaded ring. Similar analysis is performed for all the IBM R1-R5 benchmark circuits and capacitance variations of  $k = 15.490 \ pF$ ,  $13.514 \ pF$ ,  $9.772 \ pF$ ,  $15.177 \ pF$ ,  $16.671 \ pF$  are observed, respectively. For reference, note that the input capacitance of the smallest register is  $0.033 \ pF$ . The total capacitive load of the circuit after employing ZCS is  $238.480 \ pF$ .

SPICE simulations are performed on the ZCS solution to illustrate the instability in the clock waveforms. Similar to [18, 19], a SPICE model is created to accurately simulate the rotary clocking technology: the HSPICE U-element models the lossy transmission line [56], and the capacitive load variation across the rotary rings computed by ZCS is incorporated into the netlist. The capacitive loads are shown in Fig. 4.12, with a maximum imbalance of $k = 15.490 \ pF$. The SPICE waveforms of five (5) of the 25 rings, which are representative of the capacitive load distribution of the ROA, are shown in Fig. 4.13. Across the different rings of the ROA, a maximum frequency variation of 10.14% is observed, from 1.4712 GHz to 1.6373 GHz. In addition to the mismatched frequencies, the oscillations are unstable: the high capacitance imbalance causes poor oscillation characteristics, which renders ZCS impractical for high-frequency implementations.

Figure 4.13: SPICE simulation results for the ZCS methodology on the five (5) rings of the ROA, resulting in a frequency variation of 10.14% for the capacitance imbalance of  $k = 15.490 \ pF$ .

To confirm the advantages of capacitance balancing, dummy capacitors are used to achieve balanced loading across the ROA. In this methodology, a dummy capacitive load is added to each rotary ring so that the total capacitive load on every ring equals the highest capacitive load among the rings obtained with ZCS. The waveforms of five (5) of the 25 rings, which are representative of the equalized capacitive load distribution due to the added dummy capacitances, are shown in Fig. 4.14. Across the different rings of the ROA, a maximum frequency variation of 0.59% is observed, from 1.4227 GHz to 1.4311 GHz. Although the capacitive load across the ROA is well balanced and the frequencies across the rings are nearly constant, the overall load is very high (approximately 425 pF for R1, a 1.78X increase over the 238.480 pF of ZCS). The increased capacitance causes higher power dissipation as well as a reduced operating frequency (1.4227 GHz) compared with ZCS (1.5239 GHz).
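The dummy-capacitance scheme above reduces to padding every ring up to the load of the heaviest ring. The minimal Python sketch below uses hypothetical per-ring loads, not the R1 data. It shows why the imbalance vanishes and why the total load grows: after padding, the total is N times the maximum ring load, regardless of how light the other rings were.

```python
def imbalance(ring_loads_pf):
    """k: difference between the heaviest and the lightest loaded ring (pF)."""
    return max(ring_loads_pf) - min(ring_loads_pf)


def pad_with_dummy_caps(ring_loads_pf):
    """Dummy capacitance (pF) to add on each ring so all rings match the heaviest one."""
    target = max(ring_loads_pf)
    return [target - p for p in ring_loads_pf]


# Hypothetical per-ring loads in pF (illustration only).
loads = [9.0, 12.5, 4.0, 17.0]
dummies = pad_with_dummy_caps(loads)
padded = [p + d for p, d in zip(loads, dummies)]
print("dummy caps:", dummies)
print("k before/after:", imbalance(loads), imbalance(padded))
print("total before/after:", sum(loads), sum(padded))
```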

### 4.3.2 Proposed Methodology

The oscillation on the rotary rings of the ROA structure depends on the network parasitics. The inductance of each ring in the conventional ROA topology is identical. However, the capacitance not only depends on the ring perimeter but also on the register loads on each ring. In [71], the number of registers per ring is limited with an upper bound in an effort to have uniform register distribution on the ROA ring. Capacitive balancing requires a more comprehensive analysis, however, as the tapping wires and the inverters contribute to the capacitive load [ $C_{wire}$  and  $C_{inv}$  respectively in (2.5)]. In addition to the balanced load, it is necessary to maintain the clock skew within a reasonable bound as well.

Two problem formulations are devised to establish the proposed design methodology. In Section 4.3.2.1, the optimal capacitive balancing problem with controllable skew is solved for the common ROA implementation. In Section 4.3.2.2, the optimality requirement on the capacitive balance is replaced by a user-specified upper bound, in an effort to limit the register tapping wirelength.



Figure 4.14: SPICE simulation results for the ZCS methodology on the five (5) rings of the ROA, with dummy capacitances to balance the loads.

$$\begin{array}{ll} \text{Minimize} & k\\ \text{Subject to} & \sum_{i} c_{i,j} x_{i,j} = p_j \quad \forall\, j \in \{1, \ldots, N\}\\ & \sum_{j} x_{i,j} = 1 \quad \forall\, i \in \{1, \ldots, M\}\\ & |p_{j_1} - p_{j_2}| \leq k \quad \forall\, j_1, j_2 \in \{1, \ldots, N\}\\ & |s_{i,j}|\, x_{i,j} \leq s_{UB} \quad \forall\, i, j\\ & x_{i,j} \in \{0, 1\} \quad \forall\, i, j \end{array}$$

Figure 4.15: MIP formulation for SkCLB.

#### 4.3.2.1 SkCLB: Skew Aware Capacitive Load Balancing on ROA

The skew-aware capacitive load balancing problem is formulated as a mixed integer programming (MIP) problem, as shown in Fig. 4.15. The objective is to minimize k, the difference in capacitive loading across the rotary rings on the ROA. Consider register i synchronized with ring j on the ROA. For a capacitance-balanced implementation, each register must be assigned to a rotary ring in the ROA according to its *capacitive cost*. The capacitive cost of each tapping wire reflects the capacitive load of that tapping, computed from the register input capacitance $C_{reg}$ and the tapping wire capacitance $C_{wire}$. For each register i, the capacitive cost of connecting to a ring j is computed by identifying the tapping location on ring j that satisfies the skew mismatch minimization objective. The cost $c_{i,j}$ is the tapping cost of connecting register i to ring j. As each register is connected to exactly one ring, one of its N possible capacitive costs must be selected so as to maintain the total capacitive balance among the rings of the ROA. The binary variable $x_{i,j}$ equals 1 if register i is connected to ring j. The first set of constraints is defined for each ring j, where $p_j$ is the total capacitive cost on ring j. The second set of constraints is defined for each register; the summation guarantees that each register is connected to exactly one ring. The third set of constraints is defined for each pair of rings $j_1$ and $j_2$, where k bounds the difference in their capacitive costs and is minimized by the objective function. The fourth set of constraints keeps the clock skew of each register within a bound $s_{UB}$, where $s_{i,j}$ is the skew resulting from connecting register i to ring j; in effect, any register-ring pair whose skew exceeds $s_{UB}$ is excluded from the assignment. In a circuit with M registers and N rings, there are MN binary variables and, apart from the skew bounds (which only restrict the admissible register-ring pairs), $M + N + \binom{N}{2}$ constraints.
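Because each skew constraint involves a single register-ring pair, it acts as a filter on the admissible rings of each register. The Python sketch below illustrates SkCLB on a toy instance by exhaustive search, not by the MIP solved with CPLEX. It first removes the pairs with $|s_{i,j}| > s_{UB}$ and then minimizes k over the remaining assignments. The cost matrix `COST` and skew matrix `SKEW` are hypothetical. Skews are expressed as fractions of the clock period, and $s_{UB} = 0.035$ mirrors the 3.5% bound used in the experiments.

```python
from itertools import product


def skclb_bruteforce(cost, skew, s_ub):
    """Exhaustively solve SkCLB on a tiny instance.

    cost[i][j]: capacitive cost c_{i,j} (pF) of tapping register i to ring j.
    skew[i][j]: skew s_{i,j} (fraction of the clock period) of that tapping.
    Pairs with |s_{i,j}| > s_ub are excluded (x_{i,j} forced to 0).
    Returns (k, assignment), or None if some register has no admissible ring.
    """
    m, n = len(cost), len(cost[0])
    allowed = [[j for j in range(n) if abs(skew[i][j]) <= s_ub] for i in range(m)]
    best = None
    for assign in product(*allowed):  # one admissible ring per register
        loads = [0.0] * n
        for i, j in enumerate(assign):
            loads[j] += cost[i][j]
        k = max(loads) - min(loads)
        if best is None or k < best[0]:
            best = (k, assign)
    return best


# Hypothetical instance: 4 registers, 2 rings.
COST = [[1.0, 4.0], [1.0, 4.0], [3.0, 1.0], [2.0, 2.5]]
SKEW = [[0.01, 0.02], [0.01, 0.02], [0.03, 0.01], [0.02, 0.05]]
print(skclb_bruteforce(COST, SKEW, 0.035))  # (2.0, (0, 1, 0, 0))
```

Here the skew bound rules out tapping the last register to ring 1. The best achievable imbalance therefore rises to 2.0 pF, compared with 1.5 pF when the bound is loose enough (for example 0.06) to admit every pair. This illustrates the cost of enforcing the skew constraint.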

#### 4.3.2.2 ZCSCLB: Zero Clock Skew Synchronization with Capacitance Balanced ROA

A drawback of the SkCLB methodology is the potentially excessive wirelength of the optimal capacitance-balanced solution. The overall tapping wirelength is critical for low-power operation and reduced routing congestion. A more practical approach to zero clock skew synchronization with a capacitance-balanced rotary oscillatory array is to keep the capacitive load imbalance under a predetermined upper bound, in addition to bounding the skew, for robust oscillation while minimizing the overall tapping wirelength. In practice, the upper bound depends on the manufacturing technology and the level of frequency variation the design can tolerate.

This optimization problem, labeled ZCSCLB, is modeled as a mixed integer programming (MIP) problem, as shown in Fig. 4.16. The objective is to minimize the total capacitive cost of all tappings, summed over the rings of the ROA. The first and second sets of constraints are defined as in SkCLB. In the third set of constraints, the capacitive load imbalance between each pair of rings is bounded by the upper bound $k_{UB}$. The fourth set of constraints bounds the skew, as in SkCLB.

$$\begin{array}{ll} \text{Minimize} & \sum_{i} \sum_{j} c_{i,j} x_{i,j} \\ \text{Subject to} & \sum_{i} c_{i,j} x_{i,j} = p_{j} \quad \forall\, j \in \{1, \ldots, N\} \\ & \sum_{j} x_{i,j} = 1 \quad \forall\, i \in \{1, \ldots, M\} \\ & |p_{j_{1}} - p_{j_{2}}| \leq k_{UB} \quad \forall\, j_{1}, j_{2} \in \{1, \ldots, N\} \\ & |s_{i,j}|\, x_{i,j} \leq s_{UB} \quad \forall\, i, j \\ & x_{i,j} \in \{0, 1\} \quad \forall\, i, j \end{array}$$

Figure 4.16: MIP formulation for ZCSCLB.

Like SkCLB, ZCSCLB has MN binary variables and, apart from the skew bounds, $M + N + \binom{N}{2}$ constraints for a circuit with M registers and N rings.
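The pairwise balance constraints $|p_{j_1} - p_{j_2}| \leq k$ (or $\leq k_{UB}$) in these formulations are not linear as written. A standard way to express them for a MIP solver is shown below. The exact encoding passed to CPLEX is not stated in the text, so this is an assumption. Each constraint can be split into two linear inequalities, or, equivalently, every ring load can be bounded between two auxiliary continuous variables $p_{\min}$ and $p_{\max}$:

$$\begin{aligned} &p_{j_1} - p_{j_2} \leq k, \qquad p_{j_2} - p_{j_1} \leq k \qquad \forall\, j_1 < j_2, \\ &\text{or equivalently} \qquad p_{\min} \leq p_j \leq p_{\max} \quad \forall\, j, \qquad p_{\max} - p_{\min} \leq k. \end{aligned}$$

The two forms are equivalent because every pairwise difference is at most $\max_j p_j - \min_j p_j$. The second form needs $2N + 1$ constraints instead of $2\binom{N}{2}$, that is, 201 instead of 9900 for the $10 \times 10$ ROA of R5.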

#### 4.3.2.3 Power Analysis

One of the main characteristics of rotary oscillators is the charge recovery property. Rotary oscillators store energy in the inductors during the discharging stage so that the stored energy can be recirculated during the charging stage, thus minimizing the dynamic power consumption. Hence, the power dissipation in rotary oscillators is mainly the static power due to the resistance of the transmission line interconnects. The overall power dissipation of the rotary oscillators can be estimated as

$$P_{total} = P_{ring} + P_{wire}, \tag{4.4}$$

where $P_{ring}$ is the power dissipated on the rotary ring and $P_{wire}$ is the power dissipated due to the capacitive loads exhibited by the tapping wires. $P_{ring}$ is in turn estimated as

$$P_{ring} = P_{tra} + P_{inv}, \tag{4.5}$$

where  $P_{tra}$  and  $P_{inv}$  are the power dissipated due to the transmission line parasitics and the inverter pairs, respectively. The static power  $(P_{tra})$  dissipated due to the transmission line interconnects is further expressed as:

$$P_{tra} = \frac{V_{DD}^2}{Z_0^2} R_l, \tag{4.6}$$

where $V_{DD}$ is the power supply voltage, $R_l$ is the total resistance of the rotary ring interconnects, and $Z_0$ is the characteristic impedance of the transmission line, approximated as

$$Z_0 = \sqrt{\frac{L_l}{C_l}}, \tag{4.7}$$

where $L_l$ and $C_l$ are the inductance and capacitance of the rotary ring transmission line.
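Substituting (4.7) into (4.6) gives $P_{tra} = V_{DD}^2 R_l C_l / L_l$, so for a fixed inductance and resistance the transmission line loss grows linearly with $C_l$. The short Python sketch below evaluates (4.6) and (4.7) directly. All numerical values in it are hypothetical and chosen only to show the calculation; they are not taken from the experiments.

```python
import math


def characteristic_impedance(l_l, c_l):
    """Z0 = sqrt(L_l / C_l), Eq. (4.7); L_l in henries, C_l in farads."""
    return math.sqrt(l_l / c_l)


def transmission_line_power(v_dd, r_l, l_l, c_l):
    """P_tra = V_DD^2 / Z0^2 * R_l, Eq. (4.6); result in watts."""
    z0 = characteristic_impedance(l_l, c_l)
    return v_dd ** 2 / z0 ** 2 * r_l


# Hypothetical values, for illustration only.
V_DD, R_L, L_L, C_L = 1.0, 10.0, 1e-9, 1e-12
print(f"Z0    = {characteristic_impedance(L_L, C_L):.2f} ohm")
print(f"P_tra = {transmission_line_power(V_DD, R_L, L_L, C_L) * 1e3:.2f} mW")
```

Doubling `C_L` in this sketch doubles `P_tra`, which is the linear dependence derived above.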

### 4.3.3 Experimental Results

The test data are the IBM R1-R5 benchmark circuits [91], which contain between 267 and 3101 clock sinks. The benchmark circuits R1, R2, R3, R4 and R5 are partitioned into ROA grids of size $(5 \times 5)$, $(6 \times 6)$, $(7 \times 7)$, $(9 \times 9)$ and $(10 \times 10)$, respectively, based on the selected frequency $f_r$ and the floorplan. In the experiments, to illustrate the zero-skew implementation, every register in the R1-R5 benchmark files is assigned the identical clock phase of 0°. The mixed integer programming problems formulated for SkCLB and ZCSCLB are solved using the commercial solver *CPLEX* [51]. The results for the SkCLB and ZCSCLB methodologies are presented in Section 4.3.3.1 and Section 4.3.3.2, respectively, and the results of the power analysis are presented in Section 4.3.3.3.

| Benchmark | Grid           | # of integer variables | Run time (s) | k, ZCS (pF) | k, SkCLB (pF) | Improvement in k | Skew, ZCS | Skew, SkCLB |
|-----------|----------------|------------------------|--------------|-------------|---------------|------------------|-----------|-------------|
| R1        | $5 \times 5$   | 6675                   | 1            | 15.490      | 1.296         | 11.95X           | 6.11%     | 2.77%       |
| R2        | $6 \times 6$   | 21528                  | 4            | 13.514      | 3.027         | 4.46X            | 6.67%     | 2.77%       |
| R3        | $7 \times 7$   | 42238                  | 8            | 9.772       | 2.827         | 3.46X            | 6.38%     | 3.05%       |
| R4        | $9 \times 9$   | 154143                 | 83           | 15.177      | 5.120         | 2.96X            | 6.67%     | 3.05%       |
| R5        | $10 \times 10$ | 310100                 | 178          | 16.671      | 3.173         | 5.25X            | 6.94%     | 2.77%       |
| Average   | -              | -                      | -            | -           | -             | 5.62X            | 6.55%     | 2.88%       |

Table 4.5: Skew aware capacitive balancing results (SkCLB).

#### 4.3.3.1 SkCLB Results

For SkCLB, experiments are performed to obtain the implementation with maximum capacitive load balance. The skew upper bound $s_{UB}$ is set to 3.5% of the total clock period. The results are tabulated in Table 4.5, which lists the ROA grid size, the number of integer variables in the formulation, the run time, the capacitance variation of the unbalanced zero clock skew (ZCS) implementation, the optimal capacitance balance $k$, the resulting improvement in capacitance balance, and the skew variations for ZCS and SkCLB. The capacitance variation across the ROA for the R1-R5 benchmark circuits ranges from 9.772 pF to 16.671 pF, while the optimal capacitance balance $k$ (between any two ROA rings) is between 1.296 pF and 5.120 pF. Thus, SkCLB improves capacitance balance by 5.62X on average. The average skew variations for ZCS and SkCLB are 6.55% and 2.88% of the total clock period, respectively, so SkCLB reduces skew by 3.67% of the clock period on average. The capacitive balance is demonstrated in Fig. 4.17 for the 25 $(5 \times 5)$ rings of the ROA on the benchmark circuit R1, where the maximum difference in capacitive load between any two rings is $k = 1.296 \ pF$. The drawback of SkCLB is the excessive wirelength required to achieve the optimal capacitive load balance, since some registers are tapped to distant tapping points. Due to this additional wirelength, the total capacitance load of the circuit increases to 260.060 pF (from 238.480 pF for the ZCS formulation in Section 3.2).

Figure 4.17: Capacitance distribution of R1 on a $5 \times 5$ ROA grid with SkCLB. The total capacitance is 260.060 pF.
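The improvement column of Table 4.5 is the ratio of the unbalanced ZCS capacitance variation to the optimal SkCLB bound $k$, and the skew improvement is the difference of the two average skew percentages. A minimal Python check using the R3-R5 rows and the averages of Table 4.5 (the helper name `balance_improvement` is illustrative only):

```python
# (unbalanced ZCS capacitance variation, optimal SkCLB k), both in pF (Table 4.5)
rows = {
    "R3": (9.772, 2.827),
    "R4": (15.177, 5.120),
    "R5": (16.671, 3.173),
}

def balance_improvement(unbalanced_pf: float, k_pf: float) -> float:
    """Ratio of the ZCS capacitance spread to the SkCLB balance bound k."""
    return unbalanced_pf / k_pf

for name, (unbalanced, k) in rows.items():
    print(f"{name}: {balance_improvement(unbalanced, k):.2f}X")

# Average skew variation in percent of the clock period (Table 4.5)
zcs_skew, skclb_skew = 6.55, 2.88
print(f"skew reduction: {zcs_skew - skclb_skew:.2f}% of the clock period")
```

The printed ratios (3.46X, 2.96X, 5.25X) and the 3.67% skew reduction match the values reported for SkCLB.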

A perfectly balanced capacitive load across the ROA can also be obtained by adding dummy capacitive loads, as explained in Section 4.3.1. The HSPICE simulations (Fig. 4.14) show that the balanced load results in a negligible (0.59%) frequency variation across the ROA. However, the effectiveness of this technique is limited by the overall increase in capacitance (from 238.480 pF to 425.000 pF for R1), which reduces the operating frequency and increases power dissipation.

SPICE simulations are performed for SkCLB to observe the improvement in oscillation characteristics. The circuit is set up in SPICE to simulate five (5) of the ROA rings, similar to the setup explained in Section 4.3.1. In this setup, the capacitance values obtained from the SkCLB analysis (Fig. 4.17) are incorporated in the SPICE netlist with a maximum capacitive imbalance of $k = 1.296 \ pF$. The resultant waveforms are shown in Fig. 4.18. The frequency is observed to be nearly constant across the rotary rings of the ROA: the maximum frequency variation across the rings is 2.12%, compared to 10.14% for ZCS. The absolute frequencies are not directly comparable, as the total capacitive loads of the ZCS and SkCLB solutions differ.

Figure 4.18: SPICE simulation results for the SkCLB formulation. Frequency variation is 2.12% for the capacitance imbalance of $k = 1.296 \ pF$ on the R1 circuit.

| Benchmark | Wirelength (ZCS) | Wirelength (SkCLB) | Wirelength (ZCSCLB) | Skew (SkCLB) | Skew (ZCSCLB) |
|-----------|------------------|--------------------|---------------------|--------------|---------------|
| R1        | 1X               | 4.35X              | 2.46X               | 2.77%        | 2.77%         |
| R2        | 1X               | 5.33X              | 2.27X               | 2.77%        | 2.22%         |
| R3        | 1X               | 6.05X              | 2.47X               | 2.77%        | 3.05%         |
| R4        | 1X               | 7.17X              | 2.62X               | 3.05%        | 2.50%         |
| R5        | 1X               | 8.23X              | 2.31X               | 3.05%        | 2.50%         |
| Average   | 1X               | 6.23X              | 2.43X               | 2.88%        | 2.61%         |

Table 4.6: Normalized tapping wirelength and skew comparison using SkCLB and ZCSCLB.

#### 4.3.3.2 ZCSCLB Results

For the ZCSCLB formulation, experiments are performed to demonstrate the proposed practical implementation, which minimizes the total tapping wirelength subject to an upper bound on the capacitive imbalance. For simplicity, this bound is set to twice the optimal value presented in Table 4.5; in practice, it can be selected based on the desired operating frequency. The skew upper bound $s_{UB}$ is set to 3.5% of the total clock period. The results are tabulated in Table 4.6, which lists the tapping wirelengths (normalized to the ZCS wirelength) and the skew variations for both SkCLB and ZCSCLB. On average, the total tapping wirelength with ZCSCLB is 2.43X that of ZCS, compared to 6.23X with SkCLB.
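Under the rule above (bound equal to twice the optimal $k$ of Table 4.5), the bound for R1 is $2 \times 1.296 = 2.592$ pF, which is the imbalance used in the ZCSCLB SPICE setup, and the averages in Table 4.6 are plain arithmetic means. A minimal Python sketch (the constant and function names are illustrative only):

```python
# Optimal SkCLB capacitance balance for R1, in pF (Table 4.5, Fig. 4.17)
K_OPT_R1 = 1.296
BOUND_FACTOR = 2.0  # ZCSCLB relaxes the balance bound to twice the optimum

def zcsclb_bound(k_opt: float, factor: float = BOUND_FACTOR) -> float:
    """Upper bound on the ring-to-ring capacitance difference for ZCSCLB."""
    return factor * k_opt

# Tapping wirelength normalized to ZCS (Table 4.6), R1..R5
skclb_wl = [4.35, 5.33, 6.05, 7.17, 8.23]
zcsclb_wl = [2.46, 2.27, 2.47, 2.62, 2.31]

def mean(values):
    return sum(values) / len(values)

print(f"R1 bound: {zcsclb_bound(K_OPT_R1):.3f} pF")                   # 2.592 pF
print(f"SkCLB: {mean(skclb_wl):.2f}X, ZCSCLB: {mean(zcsclb_wl):.2f}X")  # 6.23X, 2.43X
```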



Figure 4.19: Capacitance distribution of R1 on a 5  $\times$  5 ROA grid with ZCSCLB. The total capacitance is 145.490 *pF*.

The average clock skew variation using ZCSCLB is 2.61% of the total clock cycle. Thus, by sacrificing some capacitive balance (while maintaining enough for robust operation), excessive wirelengths are prevented. Finally, the capacitive load balance is demonstrated in Fig. 4.19 for the 25 (5  $\times$  5) rings on the benchmark circuit R1.

Next, SPICE simulations are performed for the ZCSCLB formulation. The circuit is set up in SPICE to simulate five (5) of the rotary rings, similar to the setup for SkCLB. In this setup, the capacitance values obtained from the ZCSCLB analysis (Fig. 4.19) are incorporated in the SPICE netlist with a maximum capacitive imbalance of $k = 2.592 \ pF$. The resultant waveforms are shown in Fig. 4.20. The maximum frequency variation across the rings of the ROA is 3.62%, which is small compared to the 10.14% observed for ZCS, owing to the relatively balanced capacitance load on the rings of the ROA in ZCSCLB.



Figure 4.20: SPICE simulation results for the ZCSCLB formulation. Frequency variation is 3.62% for the capacitance imbalance of  $k = 2.592 \ pF$  on R1 circuit.

In ZCS, the capacitance distribution across the rings of the ROA is the most imbalanced, and the frequency variation is therefore the largest (10.14%). The SkCLB methodology gives the smallest frequency variation (2.12%) across the rings due to a very well balanced capacitance distribution. However, balancing the capacitive loads across the rings increases the total tapping wirelength to 6.23X that of ZCS on average. ZCSCLB uses a more practical approach to capacitive balancing: its wirelength is 2.43X that of ZCS, compared to 6.23X for SkCLB, while the frequency variation remains small (3.62%) due to the relatively balanced capacitive load across the rings of the ROA. The ZCSCLB technique is therefore proposed as a blueprint for the practical application of rotary clock synchronization.

#### 4.3.3.3 Power Analysis Results

The power dissipation is measured using SPICE simulations. Each rotary ring is simulated in SPICE using U-models that incorporate the transmission line parasitics, and the inverters are modeled in a 180 nm technology. The IBM R1-R5 benchmark circuits are used to model the register load and the wire capacitance. Note that $P_{ring}$ and $P_{reg}$ are identical for the ZCS, SkCLB, and ZCSCLB rotary implementation methodologies across R1-R5; however, the overall power dissipation differs among the methodologies due to the variation in capacitive loading across the rotary rings. The power dissipation is tabulated in Table 4.7. For R2, R4, and R5, the power dissipated with the SkCLB and ZCSCLB methodologies is within 1.5% of the power dissipated with the ZCS methodology; for R1 and R3, it is up to 5.2% higher.
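The relative power overhead of each balancing methodology with respect to ZCS follows from Table 4.7 with a percentage-change computation. A minimal Python sketch (the names `power_w` and `pct_change` are illustrative only):

```python
# Power dissipation in watts (Table 4.7): (ZCS, SkCLB, ZCSCLB)
power_w = {
    "R1": (0.2168, 0.2276, 0.2279),
    "R2": (0.2180, 0.2195, 0.2208),
    "R3": (0.2181, 0.2294, 0.2290),
    "R4": (0.2179, 0.2206, 0.2199),
    "R5": (0.2261, 0.2269, 0.2265),
}

def pct_change(baseline: float, value: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline

for name, (zcs, skclb, zcsclb) in power_w.items():
    print(f"{name}: SkCLB {pct_change(zcs, skclb):+.2f}%, "
          f"ZCSCLB {pct_change(zcs, zcsclb):+.2f}%")
```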

### 4.3.4 Summary

In this section, the need for simultaneous skew control and capacitive load balancing on the rings of the ROA is identified. Two (2) methodologies for skew control and capacitive balancing are devised for the timing closure and robust operation of the rotary oscillatory array (ROA) topology. Experiments performed on the IBM R1-R5 benchmark circuits show a 5.62X improvement in capacitive load balance and a clock skew reduction of 3.67% of the clock period. SPICE simulations verify frequency variations of 2.12% and 3.62% across the rings of the ROA for the proposed skew-aware optimal (SkCLB) and practical (ZCSCLB) capacitive load balancing methodologies, respectively. Further, the power dissipated with the proposed methodologies is analyzed using SPICE simulations; it is within 5.2% of the power dissipated with the conventional design automation techniques for rotary synchronization.

| Benchmark | ZCS (W) | SkCLB (W) | ZCSCLB (W) |
|-----------|---------|-----------|------------|
| R1        | 0.2168  | 0.2276    | 0.2279     |
| R2        | 0.2180  | 0.2195    | 0.2208     |
| R3        | 0.2181  | 0.2294    | 0.2290     |
| R4        | 0.2179  | 0.2206    | 0.2199     |
| R5        | 0.2261  | 0.2269    | 0.2265     |

Table 4.7: Power dissipation results.

# 5. Timing Analysis and Optimization for Mobius Implementation of Resonant Standing Wave Oscillator

A mobius standing wave oscillator (SWO) was first implemented in [19]. Similar to the rotary wave, the mobius standing wave implementation aids the charge recovery process and hence results in low power dissipation. Further, the clock signals generated by the mobius standing wave have uniform phase throughout the ring. However, design automation and timing analysis methodologies for the mobius standing wave implementation are lacking: other than [19], there is no published work on the mobius implementation of the resonant standing wave oscillator. To this end, the timing analysis and optimization methodologies presented for rotary clocking in Chapter 4 are extended to the mobius standing wave technology. In Section 5.1, a design automation scheme to connect the registers to the SWO is presented. In Section 5.2, a capacitive load balancing methodology for SWO is presented. In Section 5.3, the skew properties of the SWO are analyzed.

## 5.1 Design Automation Scheme for SWO

The design automation scheme to synchronize the registers with the SWO technology is presented with an example in Section 5.1.1. The results of the proposed design automation scheme are presented and compared with the rotary clocking results in Section 5.1.2. The summary of the section is presented in Section 5.1.3.

### 5.1.1 Proposed Methodology

The mobius standing wave topology is implemented to draw rings for a given frequency  $f_r$ . The ring perimeter is fixed to  $P_r$  corresponding to the frequency  $f_r$ .



Figure 5.1: Registers connecting to rotary ring and standing wave ring implemented on zero skew circuits.

The grid size for the resonant topologies is determined based on the perimeter  $P_r$  and placement information (x, y) for each register.

Consider a sample circuit of 20 registers preplaced over a square grid, as depicted in Fig. 5.1. A sample ring with the rotary wave topology and one with the standing wave topology are shown in Fig. 5.1(a) and Fig. 5.1(b), respectively. For analysis purposes, consider the registers marked **A** and **B**. All registers have the same placement and the same (zero) skew requirements in Fig. 5.1(a) and Fig. 5.1(b).

Consider the standing wave implementation shown in Fig. 5.1(b). Here, the clock signal is recovered using clock recovery circuits at connection points S1-S8, which are analogous to the tapping points TP1-TP8 of the rotary clocking technology in Fig. 5.1(a). In standing wave technology, the registers are connected to the closest possible connection points to minimize the skew mismatch. In Fig. 5.1(b), registers A and B are connected to connection points S2 and S6, respectively, with tapping wirelengths of 2 units and 4 units. In general, let the connection points $S_k$ be distributed uniformly along the standing wave ring, and let the minimum wirelength required to connect register j to its closest connection point $S_k$ be $WL_{k,j}$. The total register tapping wirelength in the standing wave implementation is then $\sum_{j} WL_{k,j}$.
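The closest-connection-point rule can be sketched as follows. This is a minimal Python sketch, not the original C++ implementation; it assumes (hypothetically) a square ring, uniformly spaced connection points, and Manhattan (rectilinear) tapping wires, and the coordinates in the usage line are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

def square_ring_points(x0: float, y0: float, side: float, n: int):
    """n connection points spaced uniformly along a square ring's perimeter,
    starting at the lower-left corner (x0, y0) and walking counter-clockwise."""
    step = 4.0 * side / n
    pts = []
    for m in range(n):
        d = m * step  # distance travelled along the perimeter
        if d < side:
            pts.append(Point(x0 + d, y0))
        elif d < 2 * side:
            pts.append(Point(x0 + side, y0 + d - side))
        elif d < 3 * side:
            pts.append(Point(x0 + 3 * side - d, y0 + side))
        else:
            pts.append(Point(x0, y0 + 4 * side - d))
    return pts

def manhattan(a: Point, b: Point) -> float:
    return abs(a.x - b.x) + abs(a.y - b.y)

def swo_tapping(registers, points):
    """Connect each register j to its closest connection point S_k.
    Returns the chosen indices and the total wirelength sum_j WL_{k,j}."""
    choice, total = [], 0.0
    for r in registers:
        k = min(range(len(points)), key=lambda idx: manhattan(r, points[idx]))
        choice.append(k)
        total += manhattan(r, points[k])
    return choice, total

ring = square_ring_points(0, 0, 8, 8)
print(swo_tapping([Point(5, 1), Point(2, 6)], ring))  # ([1, 5], 6.0)
```

Because every SWO point carries the same phase, the choice reduces to a pure nearest-neighbor search; ties are resolved here by taking the lowest index.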

For comparison purposes, the rotary clocking technology is implemented as well. In Fig. 5.1(a), the tapping points TP1-TP8 are identified at uniform distances on the rotary ring. Due to the "traveling" nature of the generated wave, the phases of tapping points TP1-TP8 are distributed uniformly between 0° and 360°. This property of rotary clocking technology necessitates a non-zero skew implementation; however, the phase contributed by the tapping wire can be used to synchronize zero skew circuits with rotary clocking. Depending on the phase contributed by the tapping wire and the phase available at the tapping point, registers A and B are connected to the tapping points that result in minimum skew mismatch. In Fig. 5.1(a), registers A and B are connected to tapping points TP3 and TP7, respectively, with tapping wirelengths of 5 units and 7 units. In general, let the tapping points $TP_i$ be distributed uniformly along the rotary ring, and let the wirelength required to connect register j to a favorable tapping point $TP_i$ be $WL_{i,j}$. The total register tapping wirelength in the rotary clocking implementation is then $\sum_{j} WL_{i,j}$.

### 5.1.2 Tapping Wirelength Comparison

The clock network design methodologies for the rotary clocking and the mobius standing wave technology are implemented in C++ and tested on a 2 GHz x86 processor with 1 GB of RAM. The test data are the IBM R1-R5 benchmark circuits. Depending on the placement dimensions and the frequency of the ring, benchmark circuits R1, R2, R3, R4 and R5 are partitioned into grid sizes $(5 \times 5)$, $(6 \times 6)$, $(7 \times 7)$, $(9 \times 9)$ and $(10 \times 10)$, respectively. A mobius strip of a rotary ring or a standing wave ring is implemented in each grid with similar dimensions to satisfy the desired resonant clock frequency $f_r$.

Experiments are carried out to compare the tapping wirelengths of the rotary clocking and mobius standing wave implementations. The experiments are performed for a varying number of tapping (connection) points on the resonant networks to reflect practical design limitations (e.g., the limited area available for clock recovery circuits in the standing wave). Table 5.1 shows the tapping wirelength for the rotary wave $(\sum_{j} WL_{i,j})$ and for the mobius standing wave $(\sum_{j} WL_{k,j})$ on the R1-R5 circuits with a varying number of tapping points. For the R1, R2, R3, R4 and R5 circuits, the tapping wirelengths of the standing wave implementations are smaller than those of the rotary clocking implementations by 3.80X, 4.28X, 3.85X, 3.94X and 4.06X, respectively. On average, the tapping wirelength of the standing wave implementation is 3.99X smaller than that of the rotary clocking implementation. The results demonstrate that the standing wave technology has an advantage over the rotary clocking technology in terms of total tapping wirelength, which accounts for a portion (but not all) of the overall power consumption. A finer granularity of connection points reduces the wirelength on average; however, the achievable granularity depends on the resources (area, power, etc.) available to accommodate the clock recovery circuit needed at each connection point in the standing wave technology.

### 5.1.3 Summary

In this section, a design automation scheme for the mobius implementation of the standing wave oscillator is presented, and the rotary clocking technology and the mobius standing wave technology are compared from a design automation perspective. It is demonstrated that, with a zero skew implementation, the standing wave technology requires 3.99X less register tapping wirelength on average than the rotary clocking technology. With a finer granularity of connection points, the tapping wirelength savings of the standing wave implementation increase at the expense of the system resources used for the clock recovery circuits at each connection point.

| Benchmark | Tapping wirelength | 4 $TP_i$ | 8 $TP_i$ | 12 $TP_i$ | 16 $TP_i$ | 20 $TP_i$ | 24 $TP_i$ | Average |
|-----------|--------------------|----------|----------|-----------|-----------|-----------|-----------|---------|
| R1 | Rotary $\sum_{j} WL_{i,j}$ | $2.63 \times 10^{6}$ | $2.78 \times 10^{6}$ | $2.95 \times 10^{6}$ | $3.04 \times 10^{6}$ | $3.10 \times 10^{6}$ | $3.15 \times 10^{6}$ | $2.94 \times 10^{6}$ |
| R1 | SWO $\sum_{j} WL_{k,j}$ | $1.09 \times 10^{6}$ | $0.92 \times 10^{6}$ | $0.78 \times 10^{6}$ | $0.71 \times 10^{6}$ | $0.66 \times 10^{6}$ | $0.67 \times 10^{6}$ | $0.81 \times 10^{6}$ |
| R2 | Rotary $\sum_{j} WL_{i,j}$ | $6.11 \times 10^{6}$ | $6.51 \times 10^{6}$ | $6.81 \times 10^{6}$ | $6.99 \times 10^{6}$ | $7.11 \times 10^{6}$ | $7.20 \times 10^{6}$ | $6.79 \times 10^{6}$ |
| R2 | SWO $\sum_{j} WL_{k,j}$ | $2.49 \times 10^{6}$ | $1.88 \times 10^{6}$ | $1.56 \times 10^{6}$ | $1.45 \times 10^{6}$ | $1.33 \times 10^{6}$ | $1.37 \times 10^{6}$ | $1.68 \times 10^{6}$ |
| R3 | Rotary $\sum_{j} WL_{i,j}$ | $8.37 \times 10^{6}$ | $8.97 \times 10^{6}$ | $9.46 \times 10^{6}$ | $9.75 \times 10^{6}$ | $9.95 \times 10^{6}$ | $1.01 \times 10^{7}$ | $9.43 \times 10^{6}$ |
| R3 | SWO $\sum_{j} WL_{k,j}$ | $3.54 \times 10^{6}$ | $2.91 \times 10^{6}$ | $2.42 \times 10^{6}$ | $2.24 \times 10^{6}$ | $2.10 \times 10^{6}$ | $0.22 \times 10^{7}$ | $2.56 \times 10^{6}$ |
| R4 | Rotary $\sum_{j} WL_{i,j}$ | $1.87 \times 10^{7}$ | $2.00 \times 10^{7}$ | $2.10 \times 10^{7}$ | $2.16 \times 10^{7}$ | $2.20 \times 10^{7}$ | $2.23 \times 10^{7}$ | $2.10 \times 10^{7}$ |
| R4 | SWO $\sum_{j} WL_{k,j}$ | $0.79 \times 10^{7}$ | $0.63 \times 10^{7}$ | $0.53 \times 10^{7}$ | $0.48 \times 10^{7}$ | $0.45 \times 10^{7}$ | $0.47 \times 10^{7}$ | $0.59 \times 10^{7}$ |
| R5 | Rotary $\sum_{j} WL_{i,j}$ | $3.08 \times 10^{7}$ | $3.32 \times 10^{7}$ | $3.49 \times 10^{7}$ | $3.60 \times 10^{7}$ | $3.66 \times 10^{7}$ | $3.71 \times 10^{7}$ | $3.48 \times 10^{7}$ |
| R5 | SWO $\sum_{j} WL_{k,j}$ | $1.27 \times 10^{7}$ | $1.02 \times 10^{7}$ | $0.86 \times 10^{7}$ | $0.77 \times 10^{7}$ | $0.73 \times 10^{7}$ | $0.75 \times 10^{7}$ | $0.90 \times 10^{7}$ |
| Change (rotary/SWO) | | 2.41X | 3.20X | 4.02X | 4.51X | 4.93X | 4.86X | 3.99X |

Table 5.1: Tapping wirelength comparison for R1-R5 circuits with varying # of tapping points.

## 5.2 Capacitive Load Balancing for SWO

The stability of the operating frequency is partially characterized by the capacitance load distribution on each mobius ring of the standing wave technology implemented on a grid structure (Fig. 2.11), as given in (2.6). The capacitance balancing requirements are outlined in Section 5.2.1. The capacitance of a mobius ring is composed of four (4) components, the register input capacitance $C_{reg}$, the inverter capacitance $C_{inv}$, the ring transmission line capacitance $C_{ring}$, and the tapping wire capacitance $C_{wire}$:

$$C_T \approx \sum C_{reg} + \sum C_{inv} + \sum C_{ring} + \sum C_{wire}.$$
(5.1)

As in the rotary oscillator, the inductance, $C_{ring}$, and $C_{inv}$ are identical for the different rings across the mobius standing wave grid. However, the register capacitance $C_{reg}$ depends on the type and size of the register, and the tapping wire capacitance $C_{wire}$ depends on the distance between the ring and the registers (*e.g.*, clock sinks). A capacitive load balancing problem formulation for the mobius SWO technology is presented in Section 5.2.1, the experimental results are presented in Section 5.2.2, and the section is summarized in Section 5.2.3.

### 5.2.1 Problem Formulation

The standing wave technology provides a constant phase clock signal if the total capacitance load is uniformly distributed across the rings in the grid structure. A change in capacitance changes the phase and frequency of the clock signals generated by the mobius ring [given by (2.6)]. Hence, a balanced capacitance distribution is needed across the rings of a mobius standing wave implementation on a grid structure. Consider M registers on the chip area synchronized by N mobius rings of the standing wave implementation. For a capacitively balanced implementation, each register needs to be assigned to a ring in the grid structure depending on the capacitive cost. For each register i, the capacitive cost of connecting to a ring j can be computed by identifying the connection points on ring j. As each register is connected to exactly one ring, one of its N possible capacitive costs is selected so as to maintain the total capacitive load balance between the rings of the grid.

The capacitive load balancing problem is formulated as an integer linear programming (ILP) problem, as shown in Fig. 5.2. The objective is to minimize k, the difference in capacitive loading across the mobius rings of the grid structure. The cost $c_{i,j}$ is the tapping cost of connecting register i to ring j, which reflects the capacitive load of such a tapping, computed by considering the register input capacitance and the tapping wire capacitance. The binary variable $x_{i,j}$ denotes that register i is connected to ring j. The first set of constraints is defined for each ring j, where $p_j$ is the total capacitive cost on ring j. The second set of constraints is defined for each register, where the summation guarantees that each register is connected to exactly one ring. The third set of constraints is defined for each pair of rings $j_1$ and $j_2$ and bounds the difference in their capacitive costs by k, which is minimized by the objective function. In a circuit with M registers and N rings, there are MN binary variables and $M + N + {N \choose 2}$ constraints.
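To make the objective concrete, the formulation can be solved exactly on a toy instance by enumerating every assignment. This is a swapped-in illustration, not the method used here (the benchmarks, with up to 310,100 binary variables, are solved with CPLEX); it enumerates $N^M$ assignments and is only feasible for a handful of registers. The function name and the cost values are hypothetical.

```python
from itertools import product

def balance_by_enumeration(cost):
    """Exactly solve the min-k ring balancing problem on a tiny instance.

    cost[i][j] plays the role of c_{i,j}: the capacitive cost (pF) of tapping
    register i to ring j. Each assignment maps every register to exactly one
    ring (second constraint set); p[j] is the resulting load on ring j (first
    constraint set), and the smallest k with |p_j1 - p_j2| <= k for all ring
    pairs is max(p) - min(p).
    """
    n_regs, n_rings = len(cost), len(cost[0])
    best_k, best_assign = float("inf"), None
    # n_rings ** n_regs candidates: only feasible for toy inputs.
    for assign in product(range(n_rings), repeat=n_regs):
        p = [0.0] * n_rings
        for i, j in enumerate(assign):
            p[j] += cost[i][j]
        k = max(p) - min(p)
        if k < best_k:
            best_k, best_assign = k, assign
    return best_k, best_assign

# Two registers, two rings; made-up costs in pF.
k, assign = balance_by_enumeration([[1.0, 1.5], [2.0, 2.5]])
print(k, assign)  # 0.5 (1, 0)
```

The best assignment puts register 0 on ring 1 and register 1 on ring 0, giving ring loads of 2.0 pF and 1.5 pF, so $k = 0.5$ pF.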

### 5.2.2 Capacitive Load Balancing Results for SWO

The clock network design methodology with capacitive load balancing for the mobius standing wave technology is implemented in C++ and tested on the R1-R5 benchmark circuits. Depending on the placement dimensions and the frequency of the ring, benchmark circuits R1, R2, R3, R4 and R5 are partitioned into grid sizes $(5 \times 5)$, $(6 \times 6)$, $(7 \times 7)$, $(9 \times 9)$, and $(10 \times 10)$, respectively. A mobius strip standing wave ring is implemented in each grid with similar dimensions to satisfy the desired resonant clock frequency $f_r$. The input capacitance of the synchronous components and the per unit resistance and per unit capacitance of the tapping wires are obtained from the R1-R5 benchmark circuits. The formulated integer linear programming problem is solved using the commercial mixed integer programming solver *CPLEX* [51].

$$\begin{array}{lll} \text{Minimize} & k & \\ \text{Subject to} & \sum_{i} c_{i,j} x_{i,j} = p_j & \forall j \in \{1, \ldots, N\}\\ & \sum_{j} x_{i,j} = 1 & \forall i \in \{1, \ldots, M\}\\ & |p_{j_1} - p_{j_2}| \leq k & \forall j_1, j_2 \in \{1, \ldots, N\}\\ & x_{i,j} \in \{0, 1\} & \forall i, j \end{array}$$

Figure 5.2: ILP formulation for capacitive load balancing on SWO.

For the proposed methodology, experiments are performed to obtain the implementation with maximum capacitive load balance. The results are tabulated in Table 5.2, which lists the grid size, the number of integer variables in the formulation, the optimal capacitance balance, and the corresponding run times. The optimal capacitance balance k between the rings ranges from 0.60 pF to 2.44 pF.
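The integer-variable counts in Table 5.2 follow directly from the formulation: with M registers and N rings there are MN binary variables $x_{i,j}$, and N is the number of cells in the grid. A short Python check using the register counts of Table 5.3 (the function name is illustrative only):

```python
# Register counts (Table 5.3) and grid side length for each benchmark
benchmarks = {
    "R1": (267, 5),
    "R2": (598, 6),
    "R3": (862, 7),
    "R4": (1903, 9),
    "R5": (3101, 10),
}

def num_binary_variables(registers: int, grid_side: int) -> int:
    """Number of x_{i,j} variables: M registers times N = side^2 rings."""
    return registers * grid_side ** 2

for name, (m, side) in benchmarks.items():
    print(name, num_binary_variables(m, side))
```

The printed values (6675, 21528, 42238, 154143, 310100) match the "# of integer variables" column, which explains the rapid growth in run time from R1 to R5.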

In Fig. 5.3, the capacitive load balancing is demonstrated for the 25 $(5 \times 5)$ rings of the standing wave technology implementation on the benchmark circuit R1. The maximum difference in capacitive load between any two rings is $k = 0.83 \ pF$. Although capacitive load balancing is a multi-objective problem, capacitance across the rings is the only metric considered in the current problem formulation. The wirelengths are small, and hence wire losses are neglected. Note that the capacitive load balancing scheme is employed for all the benchmark circuits R1-R5; however, due to space constraints, only the results for the R1 benchmark circuit are shown. The capacitance balance results for R2, R3, R4 and R5 are similar, with maximum capacitance differences of $k = 0.60 \ pF$, $k = 0.82 \ pF$, $k = 1.60 \ pF$ and $k = 2.44 \ pF$, respectively. With capacitive load balancing, the variation in the capacitive load distributed across the rings is very small (within 5% of the overall capacitive load).

| Benchmark | Grid size      | # of integer variables | Optimal $k$ (pF) | Run time (sec) |
|-----------|----------------|------------------------|------------------|----------------|
| R1        | $5 \times 5$   | 6675                   | 0.83             | 1              |
| R2        | $6 \times 6$   | 21528                  | 0.60             | 2              |
| R3        | $7 \times 7$   | 42238                  | 0.82             | 8              |
| R4        | $9 \times 9$   | 154143                 | 1.60             | 63             |
| R5        | $10 \times 10$ | 310100                 | 2.44             | 104            |

Table 5.2: Capacitive load balancing results.

| Benchmark | Grid size      | # of registers | Capacitance variation without balancing $(pF)$ | Capacitance variation with balancing $(pF)$ |
|-----------|----------------|----------------|-------------------------------------------------|---------------------------------------------|
| R1        | $5 \times 5$   | 267            | 2.45                                            | 0.83                                        |
| R2        | $6 \times 6$   | 598            | 5.11                                            | 0.60                                        |
| R3        | $7 \times 7$   | 862            | 4.40                                            | 0.82                                        |
| R4        | $9 \times 9$   | 1903           | 6.18                                            | 1.60                                        |
| R5        | $10 \times 10$ | 3101           | 6.39                                            | 2.44                                        |

Table 5.3: Capacitance variation for R1-R5 circuits.



Figure 5.3: Capacitance distribution of R1 on a  $5 \times 5$  grid with proposed ILP formulation.

For comparison, the standing wave technology is also implemented without capacitive load balancing. In this implementation, the registers are connected to the closest connection points without considering their input capacitance or the wire capacitance. In Fig. 5.4, the capacitive load variation is demonstrated for the 25 $(5 \times 5)$ rings of the standing wave technology implementation on the benchmark circuit R1. The standing wave scheme without capacitive load balancing is implemented for all the benchmark circuits R1-R5 as well, but due to space constraints only the results for R1 are shown. The range of capacitance variation for R1-R5 is shown in Table 5.3: for the R1, R2, R3, R4 and R5 circuits, the capacitance range is 2.45 pF, 5.11 pF, 4.40 pF, 6.18 pF and 6.39 pF, respectively. Thus, by employing the capacitive load balancing scheme, improvements of 2.95X, 8.51X, 5.36X, 3.86X and 2.61X are achieved for the benchmark circuits R1, R2, R3, R4 and R5, respectively, and the experiments performed on the IBM R1-R5 benchmark circuits demonstrate an average improvement of 4.66X in capacitive load balancing.

Figure 5.4: Capacitance distribution of R1 on a $5 \times 5$ grid without capacitive load balancing consideration.
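Each improvement factor is the ratio of the capacitance variation without balancing to the variation with balancing in Table 5.3, and the 4.66X figure is their arithmetic mean. A minimal Python check (variable names are illustrative only):

```python
# Capacitance variation in pF (Table 5.3): (without balancing, with balancing)
variation = {
    "R1": (2.45, 0.83),
    "R2": (5.11, 0.60),
    "R3": (4.40, 0.82),
    "R4": (6.18, 1.60),
    "R5": (6.39, 2.44),
}

ratios = {name: before / after for name, (before, after) in variation.items()}
average = sum(ratios.values()) / len(ratios)

for name, r in ratios.items():
    print(f"{name}: {r:.2f}X")
print(f"average: {average:.2f}X")
```

Rounded to two decimals, the R2, R3 and R5 ratios print as 8.52X, 5.37X and 2.62X; the values quoted in the text are truncated rather than rounded, and the mean is 4.66X either way.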

### 5.2.3 Summary

The frequency and, to a certain extent, the clock phase of the mobius standing wave implementation are characterized by the capacitance balance across the mobius rings. In this section, a methodology to achieve capacitive load balancing is proposed using an ILP formulation. Experiments performed on the IBM R1-R5 benchmark circuits demonstrate an average improvement of 4.66X in capacitive load balancing.

## 5.3 Skew Analysis for SWO

A design automation scheme is presented in Section 5.1 for the synchronization of registers with the SWO topology. Next, in Section 5.2, a capacitive balancing methodology is presented for mobius SWO technology. In order to achieve timing closure, a detailed timing analysis for mobius SWO implementation is necessary in addition to the proposed methodologies in Sections 5.1 and 5.2. Towards this end, the skew analysis with the proposed design automation scheme (Section 5.1) is presented in Section 5.3.1 and is compared with the skew analysis on the rotary clocking technology. Further, the effects of capacitive load balancing on skew are investigated and the results are presented in Section 5.3.2. Finally, the section is summarized in Section 5.3.3.

### 5.3.1 Proposed Methodology

The rotary traveling wave topology and the mobius standing wave topology are implemented to draw rings for a given frequency $f_r$. The ring perimeter is fixed to $P_r$ corresponding to the frequency $f_r$. The grid size for the resonant topologies is determined based on the perimeter $P_r$ and the placement information. For example, the IBM benchmark circuit **R1** is partitioned into a $5 \times 5$ grid depending on the size of the circuit and the perimeter computed for a frequency of $f_r = 3.4 \ GHz$ (the simulated frequency in [17]).

Consider first the mobius standing wave. In the standing wave, register $R_j(x, y)$ is connected to the ring such that the total tapping wirelength is minimum. This is possible because the clock signal generated by the standing wave implementation has a constant phase throughout the ring ($\Theta_{TP_i} = 0$). As all points on the standing wave oscillator (SWO) have the same phase of 0°, the skew in connecting register $R_j$ to connection point $CP_i$ is $S_{j,i}^{swo}$, computed as:

$$S_{j,i}^{swo} = (\Theta_{l_i}) \bmod 360^\circ.$$
(5.2)

For comparison purposes, the skew values are analyzed for the resonant rotary clocking technology as well. Consider a rotary ring, and let a register $R_j$ located at (x, y) tap onto the rotary ring at a tapping point $TP_i$ that satisfies the phase of the register. The selection of the tapping point for $R_j(x, y)$ depends on:

1. $\Theta_{TP_i}$, the phase available at the tapping point $TP_i$, and
2. $\Theta_{l_i}$, the phase attributed to the tapping wire $l_i(x, y)$.

The skew resulting from connecting register $R_j$ to the tapping point $TP_i$ is $S_{j,i}^{rwo}$, computed as:

$$S_{j,i}^{rwo} = (\Theta_{TP_i} + \Theta_{l_i}) \bmod 360^\circ.$$
(5.3)

The tapping point phases  $\Theta_{TP}$  at various tapping points  $TP_i$  are distributed between 0° and 360° due to the traveling nature of the rotary clock as explained in Section 2.1.2. The phase contributed by the tapping wire  $\Theta_{l_i}$  depends on the tapping wire  $l_i$ . The tapping point for the register  $R_j$  is chosen such that the clock skew  $S_{j,i}^{rwo}$  is minimum.
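Equations (5.2) and (5.3) and the minimum-skew tapping point selection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the target is zero skew, the "skew mismatch" to be minimized is taken (hypothetically) as the circular distance of $S_{j,i}$ from 0°, and the wire phases $\Theta_{l_i}$ are supplied as made-up inputs rather than derived from a wire model. All function names are illustrative.

```python
def wrap360(deg: float) -> float:
    """Reduce a phase to the interval [0, 360)."""
    return deg % 360.0

def mismatch(skew_deg: float) -> float:
    """Assumed metric: circular distance of a skew from 0 degrees."""
    s = wrap360(skew_deg)
    return min(s, 360.0 - s)

def swo_skew(wire_phase: float) -> float:
    """Eq. (5.2): every SWO point has phase 0, so S = Theta_l mod 360."""
    return wrap360(wire_phase)

def rwo_skew(tp_phase: float, wire_phase: float) -> float:
    """Eq. (5.3): S = (Theta_TP + Theta_l) mod 360."""
    return wrap360(tp_phase + wire_phase)

def best_rwo_tap(tp_phases, wire_phases):
    """Index i of the tapping point giving the smallest skew mismatch.

    tp_phases[i] is Theta_TP_i; wire_phases[i] is the (hypothetical) phase
    Theta_l_i of the tapping wire from the register to TP_i.
    """
    return min(range(len(tp_phases)),
               key=lambda i: mismatch(rwo_skew(tp_phases[i], wire_phases[i])))

def skew_percent(skew_deg: float) -> float:
    """Skew mismatch as a percentage of one clock period (360 degrees)."""
    return 100.0 * mismatch(skew_deg) / 360.0

# Eight uniformly spaced tapping points and made-up wire phases for one register
tp = [45.0 * n for n in range(8)]
wire = [30.0, 20.0, 40.0, 10.0, 50.0, 60.0, 25.0, 35.0]
i = best_rwo_tap(tp, wire)
print(i, rwo_skew(tp[i], wire[i]))   # 7 350.0 (mismatch of 10 degrees)
print(round(skew_percent(24.0), 2))  # 6.67, i.e. a 24-degree worst case
```

The contrast between the two systems is visible in the signatures: `swo_skew` has no tapping-point term, so once a register's tap is fixed by other criteria (such as capacitive balancing), nothing can compensate the wire phase, whereas `rwo_skew` can still pick a tapping point whose phase offsets it.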

Considering (5.3) and (5.2), the skew of the RWO system depends on both the tapping point phase and the tapping wire phase, whereas the skew of the SWO depends only on the latter. Note that the tapping locations and the corresponding tapping wirelengths depend on the capacitive balancing of the resonant systems. In other words, to achieve a capacitively balanced system, registers may be connected to tapping points that do not give the optimal skew for each register. This phenomenon is analyzed in the next section.

### 5.3.2 Skew Analysis Results

The clock network design methodologies for the rotary clocking and the mobius standing wave technology are implemented in C++. The test data are the IBM R1-R5 benchmark circuits, which have between 267 and 3101 clock sinks. Depending on the placement dimensions and the frequency of the ring, benchmark circuits R1, R2, R3, R4 and R5 are partitioned into grid sizes $(5\times5)$, $(6\times6)$, $(7\times7)$, $(9\times9)$ and $(10\times10)$, respectively. A mobius strip of a rotary ring and of a standing wave ring is implemented in each grid with similar dimensions to satisfy the desired resonant clock frequency $f_r$.

Two sets of experiments are performed to analyze skew with and without capacitive load balancing. The experiments without capacitive load balancing demonstrate the skew mismatch with the methodologies described in the previous work [19, 41]. The results with the capacitive load balancing demonstrate the skews expected with a more realistic, stable oscillation of the resonant clocking systems [39, 42] which are analyzed in this section.

For the first set of experiments, tapping points are selected for the registers synchronized by the RWO and the SWO. Each tapping point is selected based on the phase available at the tapping point and the phase generated by the tapping wire, such that the skews  $S_{j,i}^{rwo}$  and  $S_{j,i}^{swo}$, computed by (5.3) and (5.2), respectively, are minimal. Fig. 5.5 shows the skew distributions of the rotary wave (RWO) and the standing wave (SWO) for the R1-R5 benchmark circuits. Note that criticality in static timing analysis is defined by the worst-case skew mismatch, i.e., by the tail ends of the distributions in Fig. 5.5. For the rotary wave [shown in dotted lines in Fig. 5.5(a) through Fig. 5.5(e)], an average worst-case skew mismatch of 6.94% of the clock period is observed. This average is computed by dividing the worst-case skew of each circuit (e.g., 24° for R1) by 360° (one clock cycle) and averaging over the benchmark circuits; the absolute values of the skews are used in the computation. For the standing wave [shown in solid lines in Fig. 5.5(a) through Fig. 5.5(e)], an average worst-case skew mismatch of 0.83% of the clock period is observed. This analysis shows that, without capacitive load balancing, the standing wave technology (SWO) provides better skew properties than the rotary clocking technology (RWO). However, this type of design, proposed in [19, 72], lacks the capacitive load balancing required for a stable resonant oscillation.

In the second set of experiments, the capacitive load balancing operation is considered, and the skews are analyzed for the rotary clocking and standing wave technologies. Note that, with the capacitive load balancing operation explained in Section 5.2, the tapping point for each register, and hence the phase  $\Theta_{l_i}$  in (5.3) and (5.2), might change. This changes the skews  $S_{j,i}^{rwo}$  and  $S_{j,i}^{swo}$. Fig. 5.6 shows the skew distributions of the rotary wave and standing wave technologies after capacitive load balancing. For the rotary wave [shown in dotted lines in Fig. 5.6(a) through Fig. 5.6(e)], a worst-case skew mismatch of 3.05% of the clock period is observed. For the standing wave [shown in solid lines in Fig. 5.6(a) through Fig. 5.6(e)], a worst-case skew mismatch of 20.56% of the clock period is observed, which is significantly degraded. This degradation is mainly due to the identical phase throughout the standing wave technology (SWO), which does not provide the flexibility to deliver the required non-zero skews to the registers. The registers with a very high skew mismatch for the SWO are shown in the circled regions in Fig. 5.6. Hence, once capacitive load balancing is applied to obtain a stable resonant oscillation, the rotary clocking technology (RWO) provides better skew properties than the standing wave technology (SWO).
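The worst-case percentage described above can be reproduced with a few lines of Python. This sketch assumes that the reported average is the arithmetic mean of the per-circuit worst-case values; apart from the 24° worst case quoted for R1, the skew values in the example are hypothetical.

```python
def worst_case_percent(skews_deg):
    """Largest absolute skew mismatch as a percentage of one clock period (360 degrees)."""
    return max(abs(s) for s in skews_deg) / 360.0 * 100.0


def average_worst_case_percent(skews_by_circuit):
    """Arithmetic mean of the per-circuit worst-case percentages (assumed averaging rule)."""
    values = [worst_case_percent(s) for s in skews_by_circuit.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Only the 24-degree worst case for R1 comes from the text; other entries are made up.
    r1 = [-24.0, 3.5, 12.0]
    print(f"R1 worst case: {worst_case_percent(r1):.2f}% of the period")
```

For R1 alone, a worst-case skew of 24° corresponds to about 6.67% of the clock period.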



Figure 5.5: Distribution of skew mismatch for R1-R5 circuits without capacitive load balancing.



Figure 5.6: Distribution of skew mismatch for R1-R5 circuits after capacitive load balancing. Circled regions include the registers with non-zero skews for *SWO*.

#### 5.3.3 Summary

In this section, the skew mismatch of the standing wave technology (SWO) and the rotary wave technology (RWO) is analyzed and plotted. Capacitive load balancing is necessary for the stable operation of resonant clocking, as shown in [39, 42]. With capacitive load balancing, the worst-case skew mismatches of the standing wave and rotary clocking technologies are 20.56% and 3.05% of the total clock period, respectively, showing that rotary clocking technology is superior to standing wave technology in this respect.

# 6. Interconnect Modeling and Parasitic Analysis for Rotary Clocking

Rotary clocking is a GHz-range clocking technology. The frequency of the rotary ring is estimated by (2.3), which depends on the circuit parasitics in (2.4) and (2.5). However, high-frequency effects and the impact of the interconnect geometries cannot be captured using (2.3). In the pioneering work on rotary clocking [17], a lumped RLC model is proposed for SPICE simulation. This model does not consider high-frequency effects and hence cannot be used for high-frequency analysis. Towards this end, interconnect modeling and parasitic extraction techniques using the *partial element equivalent circuit* (PEEC) method and 3-D finite element method (FEM) based electromagnetic analysis are proposed in Sections 6.1 and 6.2, respectively.

## 6.1 PEEC Based Interconnect Modeling and Parasitic Analysis

A PEEC based method is used to capture the parasitic effects at high frequencies [60, 61]. As PEEC analysis is based on Maxwell's equations, it models all the electromagnetic effects, leading to a more accurate analysis. SPICE simulations including PEEC models [60, 61] have been proposed for rotary clocking in [18]. In [18], the mutual inductance effects between the two transmission line elements are considered; however, the mutual inductance effects due to topological properties (such as corners and gaps) are neglected. At high frequencies, the parasitic effects due to the corners are significant and cannot be neglected, especially in the CROA topology, where the custom rings have a varying number of corners. Hence, the parasitic effects of the additional corners must be modeled in order to accurately analyze the oscillation frequency of the rotary clocking technology. Towards this end, a more accurate PEEC based analysis is proposed for integration with SPICE.



(a) Segmenting the transmission line for inductance computation.

(b) Different cases in mutual inductance calculation.

Figure 6.1: Mutual inductance computation.

In Section 6.1.1, the PEEC based parasitic analysis is presented. In Section 6.1.2, experimental results for the PEEC based parasitic analysis and the SPICE simulations are presented. In Section 6.1.3, the impact of the PEEC based parasitic analysis on the oscillation frequency is discussed. In Section 6.1.4, a power analysis based on the SPICE simulations is presented. A summary is presented in Section 6.1.5.

#### 6.1.1 PEEC Based Parasitic Analysis

To perform the improved PEEC based analysis, the transmission line interconnects forming the rotary rings are partitioned into uniform segments, similar to the procedure adopted in [18]. Within each segment, the current density and the charge density are assumed to be constant. Each segment is further divided into uniform filaments of length l, as shown in Fig. 6.1(a). Using the constant current and charge densities, the mutual inductance between segments can be computed using the center filaments, as explained in [62, 92]. The inductance contributed by the different segments of lengths l and a in the custom ring can be classified into eight (8) cases for PEEC analysis, as shown in Fig. 6.1(b). The PEEC analysis is performed analytically on all the corners using the formulas in [62, 92]. For any given custom ring topology, the analytically computed models are then synthesized to perform SPICE simulations of improved accuracy.

The PEEC analysis results are presented in three stages in order to first establish the importance of parasitic analysis on various geometries and then to demonstrate its overall impact. First, the PEEC analysis is performed on a geometry constituting a "corner". Second, the PEEC analysis is performed to compute the mutual inductance between the opposite edges of the custom rings called a "gap". The varying parasitics of the corner geometry component and the gap geometry component are compared to a "regular" segment of two equal length parallel transmission lines. Third, the components are merged to analyze the parasitics for the entire custom ring with a varying number of corners.

##### 6.1.1.1 Corners

Consider the custom ring shown in Fig. 6.2, in which an enlarged view of a segment with two corners is also shown. Each corner segment in a custom ring is similar to one of the corner segments marked P or Q. Due to the cross-connected arrangement of rotary rings, each corner on the rotary ring has an outer transmission line and an inner transmission line, as shown in the enlarged part of Fig. 6.2. The separation between the two transmission lines is s, and the width of each transmission line is w. The mutual inductance of each corner segment consists of two parts, called the horizontal part (x part) and the vertical part (y part). For corner segments P and Q, the horizontal parts are marked  $P_x$  and  $Q_x$, and the vertical parts are marked  $P_y$  and  $Q_y$, respectively. A corner segment of type P has length 2l, whereas a corner segment of type Q has length 2a on the outer transmission line.

Figure 6.2: Corners and gap in a custom ring.

Consider the case of the horizontal part  $P_x$  of the corner segment P. The outer transmission line is marked as l and the inner transmission line is marked as a. The mutual inductance between the outer transmission line l and the inner transmission line a [by identifying the cases shown in Fig. 6.1(b)] is computed using [62]:

$$M = \left[\frac{1}{2}(L_{s+t} + L_{s-t}) - L_s\right] \cdot \left(\frac{s}{t}\right)^2 + \left(L_{s+t} - L_{s-t}\right) \cdot \left(\frac{s}{t}\right) + \frac{1}{2}\left(L_{s+t} + L_{s-t}\right), \quad (6.1)$$

where the subscript of each inductance L denotes the thickness of the segment for which it is evaluated ( $s+t$,  $s-t$  or  $s$ ); the segment has width w and length l. The self inductances L on the right-hand side of (6.1) are calculated using:

$$\frac{L}{l} = \frac{\mu}{2\pi} \Big[ \ln \frac{2l}{0.2235 \cdot (w+t)} - 1 \Big], \tag{6.2}$$

where s, t, l and w represent the separation, thickness, length and width of the transmission line segments, respectively, and  $\mu = 4\pi$  nH/cm is the permeability of free space. The vertical part  $P_y$  and the horizontal part  $P_x$  have identical mutual inductances, since the dimensions of the transmission lines, the separation and the width remain unchanged. Next, consider the horizontal part  $Q_x$  at corner Q. In this case, the lengths of the outer and inner transmission lines are a and l, as opposed to l and a in  $P_x$. The mutual inductance between l and a is computed in the same manner, based on the cases shown in Fig. 6.1(b), using (6.1) and (6.2). The additional capacitance due to the corner is estimated using [52]:

$$C_{corner} = 0.5 \times C_l \times w, \tag{6.3}$$

where  $C_l$  is the capacitance per unit length of the transmission line. The impact of corners on capacitance due to the increased wire width is included in the proposed simulation model. These improved simulation models are particularly important for custom topology rings, where the number of corners can be high. However, the models should be used for regular rotary rings as well, since regular rings also have the added capacitance of four (4) corners.
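Equations (6.1) and (6.2) can be evaluated directly. The Python sketch below implements them as printed, working in centimeters and nanohenries so that  $\mu/(2\pi) = 2$  nH/cm. The conductor thickness t is not given in this section, so the 2 µm used in the demo is an assumed value, as is the restriction s > t (needed so that the thickness  $s-t$  in  $L_{s-t}$  stays positive). The function names are hypothetical.

```python
import math

MU0_NH_PER_CM = 4.0 * math.pi  # permeability of free space in nH/cm, as used in (6.2)


def self_inductance_nh(length_cm, width_cm, thickness_cm):
    """Eq. (6.2): self inductance of a straight bar of width w and thickness t."""
    gmd = 0.2235 * (width_cm + thickness_cm)  # geometric mean distance of the cross section
    return (MU0_NH_PER_CM / (2.0 * math.pi)) * length_cm * (
        math.log(2.0 * length_cm / gmd) - 1.0)


def mutual_inductance_nh(length_cm, width_cm, thickness_cm, sep_cm):
    """Eq. (6.1): mutual inductance of two parallel bars from three self inductances.

    Each L_x is Eq. (6.2) evaluated for a bar whose thickness is x = s+t, s-t or s.
    """
    if sep_cm <= thickness_cm:
        raise ValueError("this sketch assumes s > t so that s - t is positive")

    def l_of(x):
        return self_inductance_nh(length_cm, width_cm, x)

    l_plus = l_of(sep_cm + thickness_cm)
    l_minus = l_of(sep_cm - thickness_cm)
    l_s = l_of(sep_cm)
    r = sep_cm / thickness_cm
    return ((0.5 * (l_plus + l_minus) - l_s) * r ** 2
            + (l_plus - l_minus) * r
            + 0.5 * (l_plus + l_minus))


if __name__ == "__main__":
    um = 1e-4  # 1 micrometer expressed in cm
    length, width, thick = 1000 * um, 20 * um, 2 * um  # thickness is an assumed value
    for s in (40 * um, 50 * um, 60 * um):
        m = mutual_inductance_nh(length, width, thick, s)
        print(f"s = {s / um:.0f} um: M = {m:.4f} nH")
```

Algebraically, (6.1) equals  $[(s+t)^2 L_{s+t} + (s-t)^2 L_{s-t} - 2s^2 L_s]/(2t^2)$, which is a convenient consistency check. With the assumed dimensions, M decreases as s grows, in line with the trend reported for Fig. 6.4(a).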

##### 6.1.1.2 Gap

The distance between opposite edges in a custom ring, marked in Fig. 6.2, is termed a "gap". A custom ring can have multiple edges, and every opposite edge pair contributes to the parasitics of the ring geometry. To investigate the level of this contribution, the mutual inductance between opposite edge pairs (i.e., across the gap) is analyzed. The mutual inductance computation for the gap is similar to the case of two parallel segments of equal length [case (6)] shown in Fig. 6.1(b). The major difference is that the separation between the transmission lines is the length of an edge of the custom ring, as opposed to the separation  $s$  in Fig. 6.2. Because of this large distance between the transmission lines, the mutual inductance contributed by the gap is expected to be controllable. Based on this expectation, a minimum gap G is defined for a custom ring: either G is kept long enough to eliminate the gap mutual inductance, or the existing mutual inductance due to G is included in the analysis.

Figure 6.3: Possible custom ring topologies with  $P_r = 12$  grids.

##### 6.1.1.3 Custom Ring Topologies of CROA

From a topology perspective, the parasitics of a custom rotary ring depend on the gap dimensions and on the number and type of corners. For computation, the rotary ring is partitioned into regular (straight) segments R, corner segments (of types P and Q) and gap segments (of type G), as shown in Fig. 6.2. The parasitics of each regular segment R, each corner segment P or Q and each gap segment G are computed as described in Sections 6.1.1.1 and 6.1.1.2, and summed over the entire length of the rotary ring. In Fig. 6.3, an arbitrary upper bound of twelve (12) corners is selected for computation purposes; a higher number of corners is possible.

The total mutual inductance of custom rings with every possible number of corners, from the minimum up to 12, is computed in the experiments. Furthermore, the frequencies of the different rotary topologies are simulated in SPICE using the improved model with PEEC analysis. These results are presented in Section 6.1.2.

#### 6.1.2 Experimental Results

A series of PEEC computations is performed for rotary ring implementations of nominal dimensions (separation and width) but with a varying number of corners. SPICE models are created using U-element models and the PEEC analysis results from Section 6.1.1. A U-element [56] is used to model the lossy transmission line. The U-model in HSPICE effectively captures the resistance, self inductance, self capacitance and mutual capacitance values. However, the mutual inductances at corners and gaps differ significantly from those of straight segments and are therefore incorporated separately: the results of the PEEC analysis are used to model the mutual inductance between the U-elements and the self capacitance and resistance of the corner elements as part of the SPICE model. Such a model captures the expected behavior of the corner segments and gaps. The rotary ring is simulated in HSPICE with a 180 nm device technology. The PEEC computation results and SPICE simulations are presented in Sections 6.1.2.1 and 6.1.2.2, respectively, and the power analysis of the custom rotary rings using the SPICE simulation models is presented in Section 6.1.4.



Figure 6.4: Change in mutual inductance when corner segments P and Q are compared with a regular segment R.

##### 6.1.2.1 PEEC Analysis

The PEEC analysis results are presented in three stages. First, the results of the "corner" analysis in Section 6.1.1.1 are presented. In this case, a regular segment of type R is compared with corner segments of types P and Q to observe the total increase in inductance of transmission lines of equal length due to the mutual inductance of the corner elements. Fig. 6.4(a) shows the variation in mutual inductance for a fixed width and a varying separation: for a fixed width, an increase in separation causes a linear decrease in the mutual inductance. For a nominal implementation with separation  $s = 40\mu m$  and width  $w = 20\mu m$, each corner segment (of type P or Q) leads to a 79.9% increase in mutual inductance compared with a regular segment (of type R).

Second, the results of the "gap" component analysis in Section 6.1.1.2 are presented. In this case, the opposite edges of the custom ring are divided into regular segments of type R. The opposite edges in each segment have an equal length l, and the gap G is varied to analyze its effect on the mutual inductance. Fig. 6.5 shows the decrease in mutual inductance with increasing gap for the CROA methodology tested on the R1 benchmark circuit. With a grid size of  $1000\mu m$  (corresponding to the R1 circuit), the mutual inductance becomes negligible (< 0.000022%) once the gap is approximately 70% of the grid size. For custom ring implementations, the gap must therefore be kept greater than 70% of the grid size to eliminate the "gap" effect. In a regular ring of the conventional ROA, the gap is  $\frac{P_r}{4}$  long, which is long enough that the mutual inductance contributed by the gap can be safely neglected, as has been the norm.

Figure 6.5: Mutual inductance for varying "gap" with a segment length of 1000 units.

Figure 6.6: Overall increase in the mutual inductance of a custom ring with an additional corner pair compared with the overall mutual inductance of a regular ring. Note that the vertical axis is in % (e.g., 0.9% for s = 25 units, w = 5 units).

Third, the results of the overall "custom ring topology" analysis in Section 6.1.1.3 are presented. Fig. 6.6 plots the overall increase in the total mutual inductance of a custom ring with six (6) corners compared with a regular ring, for varying separation and width. For all practical dimensions (separation and width), the PEEC computations suggest that the change in the total mutual inductance is under 1% for the CROA ring topology. At this scale, for a fixed width (separation), an increase in separation (width) decreases the mutual inductance. Note that, although each corner segment increases the mutual inductance substantially (by about 79.9%) relative to a regular segment, each additional corner pair increases the total mutual inductance of a custom ring only slightly (under 1%) relative to that of a regular ring. This trend is reasonable, as a custom ring contains far more regular segments than corner segments.

| Custom topology     | $F_{th_1}$ (GHz) | $F_{th_2}$ (GHz) | Variation |
|---------------------|------------------|------------------|-----------|
| Min corners (4)     | 4.70             | 4.56             | 2.98%     |
| Nominal corners (8) | 4.70             | 4.38             | 6.81%     |
| Max corners (12)    | 4.70             | 4.25             | 9.57%     |

Table 6.1: Comparison of  $F_{th_1}$  (frequency without PEEC parasitics) and  $F_{th_2}$  (frequency with PEEC parasitics), as approximated by (2.3).

A series of PEEC computations is performed for CROA implementations of nominal dimensions (separation and width) but with a varying number of corners. To compute the oscillation frequencies, the length units of the R1-R5 benchmark circuits are scaled to reflect the pioneering implementation in [17]. For instance, a perimeter of 50000 units for R1 corresponds to a perimeter of 3200  $\mu m$  in a 180 nm technology. For the ring with the minimum number of corners [shown in Fig. 6.3(a)], the mutual inductance analysis results in 42 straight segments and 4 corner segments. The total mutual inductance of this structure is computed as 0.1288 nH, and the frequency estimated using (2.3) for this CROA topology is 4.56 GHz. Similarly, for the ring with the maximum number of corners [shown in Fig. 6.3(b)], the analysis results in 34 straight segments and 12 corner segments; the total mutual inductance is computed as  $0.1485 \ nH$  and the computed frequency is  $4.25 \ GHz$. For a custom ring with the nominal number of corners (eight), the analysis results in a total mutual inductance of 0.1399 nH and a computed frequency of  $4.38 \ GHz$. These results are summarized in Table 6.1. The first column lists the number of corners in the CROA topology. The second column shows  $F_{th_1}$, the frequency computed using (2.3) without the mutual inductance due to the corner segments; this frequency is the same for all CROA topologies, since the additional mutual inductance due to the corners is neglected. The third column shows  $F_{th_2}$, the frequency computed with (2.3) using the PEEC-computed mutual inductance for the corresponding number of corner segments. The fourth column gives the variation in the computed frequency due to the added mutual inductance. Compared with  $F_{th_1}$, the change in  $F_{th_2}$  increases from 2.98% for the minimum number of corners (4) to 9.57% for the maximum number of corners (12) of the custom ring topology.
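The variation columns of Tables 6.1 and 6.2 are consistent with the relative change  $(F_{ref} - F_{new})/F_{ref}$, where  $F_{ref}$  is the frequency without corner parasitics. The short Python check below recomputes them from the tabulated frequencies; the helper name is hypothetical.

```python
def relative_variation_pct(f_ref, f_new):
    """Relative frequency change (f_ref - f_new) / f_ref, in percent."""
    return (f_ref - f_new) / f_ref * 100.0


# Tabulated frequencies in GHz: (without corner parasitics, with corner parasitics).
TABLE_6_1 = {"Min corners (4)": (4.70, 4.56),
             "Nominal corners (8)": (4.70, 4.38),
             "Max corners (12)": (4.70, 4.25)}
TABLE_6_2 = {"Min corners (4)": (4.74, 4.62),
             "Nominal corners (8)": (4.74, 4.46),
             "Max corners (12)": (4.74, 4.36)}

if __name__ == "__main__":
    for title, table in (("Table 6.1", TABLE_6_1), ("Table 6.2", TABLE_6_2)):
        for row, (f_ref, f_new) in table.items():
            print(f"{title}, {row}: {relative_variation_pct(f_ref, f_new):.2f}%")
```

The printed values match the tables: 2.98%, 6.81% and 9.57% for Table 6.1, and 2.53%, 5.91% and 8.02% for Table 6.2.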

##### 6.1.2.2 Simulation Results with SPICE

In this section, SPICE models are created using U-element models and the PEEC analysis results for mutual inductance. A U-element [56] is used to model the lossy transmission line. The U-model in HSPICE effectively captures the resistance, self inductance and self capacitance of the transmission lines, as well as the mutual inductance and mutual capacitance between parallel transmission line pairs. However, the mutual inductance of the transmission lines at the corners differs significantly and is therefore incorporated separately: the results of the PEEC analysis are used to model the mutual inductance between the U-elements and the self capacitance and resistance of the corner elements as part of the SPICE model. Such a model captures the expected behavior of the corner segments. Based on the perimeter of the rotary ring, 24 transmission line segments are used, and a total of 24 uniform cross-coupled inverter pairs are placed at equal distances from each other on each rotary ring [17]. A portion of the proposed simulation schematic for rotary clocking with the U-elements and the corner parasitics is shown in Fig. 6.7.

Figure 6.7: A portion of the SPICE simulation schematic.

First, the rotary clocking circuit is set up in SPICE for the ring-based topology. In this setup, the parasitics due to the corner and gap segments are neglected, as in previous research [17–19]. The clock waveform obtained for this setup is shown in Fig. 6.8; the simulated clock frequency is  $4.74 \ GHz$.



Figure 6.8: Clock signal simulated for a rotary ring with no parasitics at the corners.

Next, the SPICE model is modified to incorporate the parasitic components, including the mutual inductance elements computed in the PEEC analysis. The clock waveforms obtained for the CROA topology with the minimum (4), nominal (8) and maximum (12) numbers of corners are presented in Fig. 6.9. The frequencies for the minimum, nominal and maximum numbers of corners are  $4.623 \ GHz$,  $4.465 \ GHz$  and  $4.362 \ GHz$, respectively. The clock frequency thus decreases with an increasing number of corners. This decrease is expected, because the mutual inductance of the CROA topology increases with the number of corners.

In Table 6.2, the SPICE simulation results with and without corner parasitics are compared for different numbers of corners. The first column lists the number of corners in the CROA topology. The second column shows the simulated frequency  $F_{sim_1}$  without the parasitics due to the corner segments;  $F_{sim_1}$  stays the same for all CROA topologies, since the corner parasitics are neglected. The third column shows the simulated frequency  $F_{sim_2}$  using the PEEC-computed mutual inductance for the corresponding number of corner segments. The fourth column gives the variation of  $F_{sim_2}$  relative to  $F_{sim_1}$, which represents the improvement in accuracy of the simulated frequency due to the presented mutual parasitic analysis. Relative to  $F_{sim_1}$, the accuracy improvement of  $F_{sim_2}$  ranges from 2.53% for the minimum number of corners to 8.02% for the maximum number of corners of the CROA topology. The decrease in  $F_{sim_2}$  is attributed to the increased mutual inductance due to the additional corners in the custom ring topology.

Figure 6.9: Clock signals obtained for CROA topologies with varying number of corner segments.

| Custom topology     | $F_{sim_1}$ (GHz) | $F_{sim_2}$ (GHz) | Accuracy improvement |
|---------------------|-------------------|-------------------|----------------------|
| Min corners (4)     | 4.74              | 4.62              | 2.53%                |
| Nominal corners (8) | 4.74              | 4.46              | 5.91%                |
| Max corners (12)    | 4.74              | 4.36              | 8.02%                |

Table 6.2: Comparison of frequency  $F_{sim_1}$  without PEEC parasitics and frequency  $F_{sim_2}$  with PEEC parasitics, as simulated in HSPICE.

To evaluate the accuracy of the approximations in the theoretical computations in (2.3), the simulated frequencies (using SPICE) are compared with the theoretical frequencies computed using (2.3). In Table 6.3, the theoretical rotary clock frequencies  $F_{th_2}$  [computed using (2.3)] and the simulated clock frequencies  $F_{sim_2}$  (using SPICE) are tabulated. Note that the same PEEC-computed mutual inductance values are used both in the theoretical computations and in the simulations. The clock frequencies from the SPICE simulations agree with the theoretical frequencies computed from (2.3) to within a small variation (1.29% for the minimum-corner case to 2.52% for the maximum-corner case). In contrast, the theoretical frequency  $F_{th_1}$, which neglects the corner parasitics, deviates from  $F_{sim_2}$  by 1.70% to 7.79%. Thus, the theoretical estimation in (2.3), with the corner parasitics included, provides a reasonable approximation of the expected clock frequency.

Table 6.3: Comparison of the simulated frequency  $F_{sim_2}$  (corner parasitics, SPICE) with the theoretical frequency  $F_{th_2}$  (corner parasitics, PEEC) and with the theoretical frequency  $F_{th_1}$  (no corner parasitics).

| Custom topology     | $F_{sim_2}$ (GHz) | $F_{th_2}$ (GHz) | Var. vs $F_{th_2}$ | $F_{th_1}$ (GHz) | Var. vs $F_{th_1}$ |
|---------------------|-------------------|------------------|--------------------|------------------|--------------------|
| Min corners (4)     | 4.62              | 4.56             | 1.29%              | 4.70             | 1.70%              |
| Nominal corners (8) | 4.46              | 4.38             | 1.79%              | 4.70             | 5.38%              |
| Max corners (12)    | 4.36              | 4.25             | 2.52%              | 4.70             | 7.79%              |

#### 6.1.3 Impact on the Oscillation Frequency

Recall from Section 2.1.3.3 that the inductance [in (2.4)] and the capacitance [in (2.5)] properties characterize the frequency of oscillation in rotary clocking technology. The oscillation frequency of the rotary oscillator is approximated as  $f_{osc} \approx \frac{1}{2\sqrt{L_T C_T}}$. This relation can be rewritten [using (2.5)] as

$$f_{osc} \approx \frac{1}{2\sqrt{L_T \left(\sum C_{reg} + \sum C_{inv} + \sum C_{ring} + \sum C_{wire}\right)}}, \tag{6.4}$$

where  $L_T$  is the total inductance that does not include the mutual inductance due to the corners of the rotary ring. As analyzed in Section 6.1.1, for a more accurate analysis, the corner parasitics have to be included in the computation. When corner parasitics are included, the frequency of the regular rotary oscillatory array is estimated by:

$$f_{roa} \approx \frac{1}{2\sqrt{(L_{roa}) \times (C_{roa})}},\tag{6.5}$$

where the total inductance on the ROA  $L_{roa}$  is estimated by:

$$L_{roa} \approx L_T + M_{corner} \times N_c. \tag{6.6}$$

The total capacitance  $C_{roa}$  is estimated by:

$$C_{roa} \approx \sum C_{reg} + \sum C_{inv} + \sum C_{ring} + \sum C_{wire} + C_{corner} \times N_c, \qquad (6.7)$$

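where  $M_{corner}$  and  $C_{corner}$  are the mutual inductance and capacitance exhibited by each corner, respectively, and  $N_c$  is the number of corners (4 in ROA). Since  $M_{corner} > 0$  and  $C_{corner} > 0$, (6.6) and (6.7) give  $L_{roa} > L_T$  and a  $C_{roa}$  larger than the total capacitance in (6.4). Although the absolute frequency figures are not directly comparable, it is therefore clear that, due to the additional parasitics at the corners,  $f_{roa} < f_{osc}$.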
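The effect of the corner terms in (6.3) and (6.5)-(6.7) can be illustrated numerically. The Python sketch below uses SI units; all component values in the demo (total inductance, total capacitance, per-corner mutual inductance and capacitance per unit length) are hypothetical placeholders chosen only to show the direction of the change, not values extracted in this work.

```python
import math


def resonant_freq_hz(l_total_h, c_total_f):
    """f ~ 1 / (2 sqrt(L C)), the approximation used in (6.4) and (6.5)."""
    return 1.0 / (2.0 * math.sqrt(l_total_h * c_total_f))


def corner_capacitance_f(c_per_len_f_per_m, width_m):
    """Eq. (6.3): C_corner = 0.5 * C_l * w."""
    return 0.5 * c_per_len_f_per_m * width_m


def roa_totals(l_t_h, c_t_f, m_corner_h, c_corner_f, n_c=4):
    """Eqs. (6.6)-(6.7): add N_c corner contributions to L_T and to the total capacitance."""
    return l_t_h + m_corner_h * n_c, c_t_f + c_corner_f * n_c


if __name__ == "__main__":
    l_t, c_t = 1e-9, 1e-12                         # hypothetical L_T and capacitance sum in (6.4)
    c_corner = corner_capacitance_f(2e-10, 20e-6)  # hypothetical C_l = 0.2 fF/um, w = 20 um
    l_roa, c_roa = roa_totals(l_t, c_t, 1e-11, c_corner)  # hypothetical M_corner = 0.01 nH
    f_osc = resonant_freq_hz(l_t, c_t)
    f_roa = resonant_freq_hz(l_roa, c_roa)
    print(f"f_osc = {f_osc / 1e9:.2f} GHz, f_roa = {f_roa / 1e9:.2f} GHz")
```

Because both corner terms are positive, the computed  $f_{roa}$  falls below  $f_{osc}$  for any choice of positive placeholder values.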

As reported in Chapter 3 (Section 3.1.3), an average of 39.25% of the tapping wirelength can be saved in CROA. As analyzed in Section 6.1.1, however, the design of the CROA topology also causes a change in the inductance due to the varying corners. Overall, the frequency of the CROA is given by:

$$f_{croa} \approx \frac{1}{2\sqrt{(L_{croa}) \times (C_{croa})}},\tag{6.8}$$

where  $L_{croa}$  is estimated by:

$$L_{croa} \approx L_T + M_{corner} \times N_c. \tag{6.9}$$

 $C_{croa}$  is estimated by:

$$C_{croa} \approx \sum C_{reg} + \sum C_{inv} + \sum C_{ring} + (1 - 39.25\%) \sum C_{wire} + C_{corner} \times N_c. \quad (6.10)$$

Note that  $N_c \ge 4$  for CROA, and  $Max(N_c) = 12$  in the current CROA design. Depending on the granularity of the grid and the gap specifications (chosen so that the mutual inductance due to the gap is negligible),  $Max(N_c)$  can be different for other CROA topologies. In CROA, the wirelength savings (resulting in a smaller  $\sum C_{wire}$ ) outweigh the effects of the additional parasitics due to the increased number of corners, so that  $f_{croa} \ge f_{osc}$. In cases where this frequency increase is undesirable, the perimeter of the custom rings in the CROA topology can be increased proportionately to compensate.
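Whether the wirelength savings outweigh the corner parasitics can be checked for a given design by comparing  $L_{croa} C_{croa}$  with  $L_T C_T$, since  $f_{croa} \ge f_{osc}$  holds exactly when  $L_{croa} C_{croa} \le L_T C_T$. The Python sketch below implements (6.8)-(6.10) with the 39.25% average saving. The component values in the demo are hypothetical; in practice the outcome depends on how large  $\sum C_{wire}$  is relative to the corner terms.

```python
import math

WIRE_SAVING = 0.3925  # average tapping-wirelength saving of CROA (Section 3.1.3)


def resonant_freq_hz(l_h, c_f):
    """f ~ 1 / (2 sqrt(L C)), as in (6.4) and (6.8)."""
    return 1.0 / (2.0 * math.sqrt(l_h * c_f))


def croa_totals(l_t, c_reg, c_inv, c_ring, c_wire, m_corner, c_corner, n_c):
    """Eqs. (6.9)-(6.10) for a CROA ring with n_c corners."""
    if n_c < 4:
        raise ValueError("a CROA ring has at least 4 corners")
    l_croa = l_t + m_corner * n_c
    c_croa = (c_reg + c_inv + c_ring + (1.0 - WIRE_SAVING) * c_wire
              + c_corner * n_c)
    return l_croa, c_croa


if __name__ == "__main__":
    # Hypothetical values in H and F; C_wire dominates the load in this example.
    l_t = 1e-9
    c_reg, c_inv, c_ring, c_wire = 0.2e-12, 0.1e-12, 0.2e-12, 0.5e-12
    f_osc = resonant_freq_hz(l_t, c_reg + c_inv + c_ring + c_wire)
    for n_c in (4, 8, 12):
        l_c, c_c = croa_totals(l_t, c_reg, c_inv, c_ring, c_wire, 5e-12, 2e-15, n_c)
        print(n_c, resonant_freq_hz(l_c, c_c) >= f_osc)
```

With this wire-dominated placeholder load, each case prints `True`; a design with little wire capacitance could reverse the comparison.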

#### 6.1.4 Power Analysis

One of the main characteristics of rotary oscillators is the charge recovery property. Rotary oscillators store energy in the inductors during the discharging stage so that the stored energy can be re-circulated during the charging stage, thus minimizing the dynamic power consumption. Hence, the power dissipation in the rotary oscillatory array is mainly the static power due to the resistance of the transmission line interconnects. The overall power dissipation with the custom rings can be estimated as:

$$P_{total} = P_{ring} + P_{load}, \tag{6.11}$$

where  $P_{ring}$  and  $P_{load}$  are the power dissipated on the custom ring and the power dissipation due to the capacitive loads, respectively.  $P_{ring}$  is estimated as:

$$P_{ring} = P_{tra} + P_{inv},\tag{6.12}$$

where  $P_{tra}$  and  $P_{inv}$  are the power dissipated due to the transmission line parasitics and the inverter pairs, respectively.  $P_{load}$  is estimated as:

$$P_{load} = P_{reg} + P_{wire},\tag{6.13}$$

where  $P_{reg}$  and  $P_{wire}$  are the power dissipated due to the register load and the power dissipated due to the capacitive loads exhibited by the wires connecting the registers to the rings, respectively.
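The decomposition in (6.11)-(6.13) can be written as a small data structure. The Python sketch below is illustrative only: the class name and the power values (in mW) are hypothetical, chosen to mirror the qualitative behavior reported later for Fig. 6.11, where a custom ring dissipates more ring power but less wire power than the regular ring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingPower:
    p_tra: float   # transmission line parasitics
    p_inv: float   # cross-coupled inverter pairs
    p_reg: float   # register load
    p_wire: float  # wires connecting the registers to the ring

    @property
    def p_ring(self) -> float:   # Eq. (6.12)
        return self.p_tra + self.p_inv

    @property
    def p_load(self) -> float:   # Eq. (6.13)
        return self.p_reg + self.p_wire

    @property
    def p_total(self) -> float:  # Eq. (6.11)
        return self.p_ring + self.p_load


def percent_change(new: float, ref: float) -> float:
    """Relative change of `new` with respect to `ref`, in percent."""
    return (new - ref) / ref * 100.0


if __name__ == "__main__":
    regular = RingPower(p_tra=10.0, p_inv=5.0, p_reg=20.0, p_wire=15.0)  # hypothetical mW
    custom = RingPower(p_tra=11.0, p_inv=5.0, p_reg=20.0, p_wire=13.2)   # more corners, shorter wires
    print(f"P_total change: {percent_change(custom.p_total, regular.p_total):+.1f}%")
```

In this made-up example the higher transmission line loss of the custom ring is more than offset by its shorter wires, giving a total change of about -1.6%.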

The rotary rings are simulated in SPICE using the improved simulation models that account for the parasitics due to the varying interconnect geometries, as discussed in Section 6.1.2.2. The inverters are modeled in a 180 nm technology. The IBM R1-R5 benchmark circuits are used to model the register load and the wire capacitance. The clock load is evenly distributed across the ring for ease of implementation. The results from Section 3.1 are used to model the wirelengths for the custom rings.

First, the power dissipation on the custom ring alone is considered; the loading on the rings is not included in this case. Fig. 6.10 plots the percentage increase in the power dissipation of the custom ring with a varying number of corners, relative to the ring without corner parasitics. From 4 corners to 8 corners, the percentage increase in power dissipation climbs from 2.5% to 13%.



Figure 6.10: Percentage increase in power with varying number of corners.

However, for a more accurate analysis, the loading on the rings needs to be considered, which is explained next.

Next, the custom rotary ring is loaded evenly with the capacitive load computed for the R1-R5 benchmark circuits. The register input capacitance values from the R1-R5 benchmarks are used to analyze $P_{reg}$. Note that, compared to the regular rotary ring, the custom rotary ring reduces the wirelength by approximately 8%, 12%, 11.75%, 13.5% and 10% for the R1-R5 circuits, respectively [36]. These wirelength results are used in the $P_{wire}$ analysis. The simulation model incorporates the above loading details, and custom rotary rings with a varying number of corners are simulated to measure $P_{total}$. In Fig. 6.11, the total power dissipation on the custom rings with a varying number of corners is compared with that of the regular ring (4 corners). As the number of corners increases, the ring power $P_{ring}$ increases; however, the overall power $P_{total}$ is reduced. Due to the reduced wirelength in the custom rings, $P_{wire}$ in (6.13) is reduced. Further, $P_{ring}$ and $P_{reg}$ are identical for both custom and regular rotary rings across the R1-R5 circuits. Hence, this reduction in $P_{wire}$ results in a reduction in the overall power dissipation $P_{total}$ according to (6.11). The total power dissipated ($P_{total}$) on the custom ring (corners between 4 and 12) is within $\pm 5\%$ of the total power dissipated on the regular ring (4 corners).

Figure 6.11: Total power dissipation on the custom ring with varying number of corners compared with the regular ring.
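The argument above trades a higher $P_{ring}$ against a lower $P_{wire}$. The following is a minimal sketch of that trade-off. It makes three assumptions:

* (6.11) is the plain sum $P_{total} = P_{ring} + P_{wire} + P_{reg}$.
* $P_{wire}$ scales linearly with tapping wirelength.
* The power split and the ring-power increase used in the usage loop are hypothetical placeholders, not thesis data.

Only the R1-R5 wirelength savings come from the text.

```python
def total_power(p_ring, p_wire, p_reg):
    """Total power, assuming (6.11) is the sum P_total = P_ring + P_wire + P_reg."""
    return p_ring + p_wire + p_reg


def custom_vs_regular(p_ring, p_wire, p_reg, ring_increase, wirelength_reduction):
    """Percentage change of P_total for a custom ring relative to the regular ring.

    p_ring, p_wire, p_reg: power components of the regular ring (any consistent unit).
    ring_increase: fractional rise in P_ring from extra corner parasitics (hypothetical input).
    wirelength_reduction: fractional tapping-wirelength saving; P_wire is assumed
        to scale linearly with wirelength (an assumption of this sketch).
    P_reg is kept identical for both rings, as stated in the text.
    """
    regular = total_power(p_ring, p_wire, p_reg)
    custom = total_power(p_ring * (1.0 + ring_increase),
                         p_wire * (1.0 - wirelength_reduction),
                         p_reg)
    return 100.0 * (custom - regular) / regular


if __name__ == "__main__":
    # Wirelength savings for R1-R5 as reported in [36]; the power split
    # (0.4/0.3/0.3) and the 5% ring-power rise are illustrative placeholders.
    savings = {"R1": 0.08, "R2": 0.12, "R3": 0.1175, "R4": 0.135, "R5": 0.10}
    for name, saving in savings.items():
        delta = custom_vs_regular(0.4, 0.3, 0.3, ring_increase=0.05,
                                  wirelength_reduction=saving)
        print(f"{name}: {delta:+.2f}% change in P_total")
```

Setting `ring_increase` to zero models the case in which $P_{ring}$ is identical for both rings. In that case any wirelength saving lowers $P_{total}$.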

## 6.1.5 Summary

In this chapter, a methodology to analyze the parasitics resulting from the geometries in the regular and custom ring topology is presented. Simulation based analysis is performed for the clock waveforms of the rotary ring topologies incorporating the parasitics. When the parasitics contributed by the varying geometries of the rotary rings are considered, the resultant clock frequency is observed to be 8% less than the expected frequency from the formulations in [17–19]. Further, the power dissipation on the rotary ring is analyzed with varying number of corners. When tested with the R1–R5 benchmark circuits, the total power dissipated on a custom ring (corners between 4 and 12) is within  $\pm 5\%$  of the total power on a regular ring (4 corners).

# 6.2 Parasitic Analysis–Revisited: 3-D Parasitic Modeling for Rotary Interconnects

In Section 6.1, a detailed discussion on the partial element equivalent circuit (PEEC) based parasitic analysis for the rotary interconnects is presented. However, these methods are incomplete as they do not accurately capture the parasitics due to the different geometries (corners and crossovers) in the rotary rings. The accurate characterization of on-chip rotary interconnects requires 3D full wave electromagnetic analysis. Towards this end, a 3D finite element based full wave electromagnetic analysis is presented for the characterization of different transmission line segments constituting a rotary ring. The rotary ring is modeled in SPICE incorporating the parasitics extracted from the 3D electromagnetic analysis and is compared with the U-model and the PEEC models. Further, the power dissipated on the rotary ring is analyzed using the SPICE simulations.

The rest of the chapter is organized as follows. In Section 6.2.1, interconnect modeling is presented. In Section 6.2.2, parasitic analysis for the rotary ring structures is presented. In Section 6.2.3, the experimental results are presented, followed by the power analysis in Section 6.2.4 and a discussion on oscillation frequency and phase velocity in Section 6.2.5. In Section 6.2.6, the chapter is summarized.

# 6.2.1 Modeling Interconnect Parasitics for 3-D Based Extraction

A 3D full wave electromagnetic analysis is the most accurate way of modeling transmission line parasitics. However, such analysis is computationally intensive and time consuming, understandably so for the array structure of rotary rings. Hence, to speed up the computation, simple transmission line sub-structures of the rotary oscillatory array are examined.



(a) Segments on a regular ring



Figure 6.12: Segments on the regular and custom rings.

Consider the rotary rings shown in Fig. 6.12. In order to accurately analyze the parasitics, the interconnects forming the rotary rings are partitioned into straight, corner, crossover, and gap segments, which are categorized in Sections 6.2.1.1, 6.2.1.2, 6.2.1.3, and 6.2.1.4, respectively.

#### 6.2.1.1 Straight Segments

Consider the straight segment on a rotary ring topology shown in Fig. 6.12. The magnified view of the straight segment is shown in Fig. 6.13(a). Each straight segment has length $l_{seg}$, width w, and thickness t. The separation between the transmission lines is s. Straight segments are the most abundant geometric shapes in a regular or custom topology rotary clock network.

#### 6.2.1.2 Corner Segments

Consider a corner segment on the rotary ring shown in Fig. 6.13(b). Due to the cross-connected arrangement of the rotary ring (a mobius topology of differential transmission lines), each corner has an outer transmission line and an inner transmission line. The length of the transmission line at the corner segment is composed of $l_{seg}$ and $l_{add}$, where $l_{add}$ is the additional transmission line length contributed by the corner, as shown in Fig. 6.13(b). A ring has at least four corner segments; in a non-regular ring, the number of corners depends on the custom topology.

#### 6.2.1.3 Crossover Segments

Due to the mobius crossing on the rotary ring, the traveling wave in rotary clocking is not terminated. Consider a crossover segment on the rotary ring shown in Fig. 6.13(c). The length of the transmission line at the crossover segment is composed of $l_{seg}$ and $l_{add}$. In an IC implementation, the crossover segment is fabricated on two metal layers to avoid a short circuit. Typically, each rotary ring has a single crossover segment, although a higher number of crossover segments is possible.

#### 6.2.1.4 Gap Segments

The distance between opposite edges of a rotary ring, marked in Fig. 6.13(d), is termed a "gap". In particular, a custom ring can have multiple edges, and every pair of opposite edges contributes additional parasitics. To quantify this contribution, the mutual inductance between opposite edge pairs (i.e., across the gap) is analyzed. The mutual inductance computation for the gap is similar to that for a straight segment. The major difference is that the separation between the transmission lines is the length of an edge of the custom ring, as opposed to the separation $\mathbf{s}$ in Fig. 6.13(a). Because of this large separation, the mutual inductance contributed by the gap is expected to be controllable. Based on this expectation, a minimum gap is devised for a rotary ring. The gap is either kept long enough to eliminate the mutual inductance, or the remaining mutual inductance due to the gap is accounted for in the analysis. Gap segments exist in all rotary topologies. For the regular ring topology, the length of the gap segment is $\frac{2l}{4}$. For the non-regular ring topology, however, the length of the gap segment depends on the custom topology.

Figure 6.13: Different types of segments on a rotary ring.

# 6.2.2 Parasitic Analysis

The equivalent circuit for the straight segment (Section 6.2.1.1) can be modeled by the U-model in SPICE. However, the corner segments (Section 6.2.1.2) and the crossover segments (Section 6.2.1.3) cannot be modeled by the U-model in SPICE. The 90° bend at a corner segment causes reflections of the traveling wave, so the wave velocity $v_p$ is not uniform at the corner segments. Note that the transmission line U-models in SPICE do not take the additional corner parasitics into consideration. As an alternative, the closed form PEEC equations are used to capture the corner parasitic effects at high frequencies, as explained in Section 6.1.1.

The crossover segment involves the interconnects crossing over multiple metal layers. The accurate modeling of the crossover requires the analysis of the electric and magnetic coupling due to the multiple metal layers and the substrate parasitics.

For the gap segment analysis (Section 6.2.1.4), the opposite edges of the rotary ring are divided into regular segments. The mutual inductance between the two straight segments constituting the gap is computed using (6.1). The separation s is the gap in this case.
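The thesis computes the gap mutual inductance with its own (6.1), which is not reproduced here. The sketch below substitutes the classical closed-form result for two equal, parallel, aligned filaments. It illustrates how the gap mutual inductance falls as the separation grows. The filament approximation ignores conductor width and thickness, so the numbers are not expected to match Fig. 6.15. The 10 $\mu m$ reference spacing in the usage loop is a hypothetical choice.

```python
import math

MU0 = 4e-7 * math.pi  # classical vacuum permeability, H/m


def mutual_inductance_parallel(length, spacing):
    """Mutual inductance (H) of two equal, parallel, aligned filaments.

    Classical filament formula:
        M = mu0/(2*pi) * [l*asinh(l/d) - sqrt(l^2 + d^2) + d]
    where asinh(x) = ln(x + sqrt(1 + x^2)). length and spacing in metres.
    Conductor width and thickness are ignored (filament approximation).
    """
    if length <= 0 or spacing <= 0:
        raise ValueError("length and spacing must be positive")
    l, d = length, spacing
    return MU0 / (2.0 * math.pi) * (
        l * math.asinh(l / d) - math.hypot(l, d) + d
    )


if __name__ == "__main__":
    seg = 1000e-6                                   # 1000 um segment, as in the gap study
    m_ref = mutual_inductance_parallel(seg, 10e-6)  # hypothetical 10 um reference spacing
    for frac in (0.1, 0.3, 0.5, 0.7, 1.0):
        m = mutual_inductance_parallel(seg, frac * seg)
        print(f"gap = {frac:.0%} of segment: M = {m * 1e12:.1f} pH "
              f"({m / m_ref:.1%} of the 10 um value)")
```

Because the separation enters only through $d$, the same function can be reused for the straight-segment spacing s and for the gap. This mirrors the observation that the two computations differ only in the separation.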

#### 6.2.3 Experimental Results

The rotary ring is implemented in a 90 nm CMOS IC process with the BSIMv4 transistor model. The perimeter of the rotary ring (3200 $\mu m$) is fixed based on the desired oscillation frequency $f_{osc}$ (4.5 GHz). The different segments of the rotary ring corresponding to Sections 6.2.1.1, 6.2.1.2, and 6.2.1.3 are analyzed and included in the modified circuit models for rotary ring simulation. The simulation results are presented in Section 6.2.3.1. The power analysis on the rotary ring is presented in Section 6.2.4. Further, the effects of parasitics on the rotary oscillation frequency and the phase velocity are discussed in Section 6.2.5.



Figure 6.14: Different types of interconnect modeling topologies.

# 6.2.3.1 Results for Interconnect Segment Modeling

The parasitics of the interconnect segments can be extracted with either of two topologies:

* the 2D analysis topology used in previous works (e.g., the topology used by the U-model in SPICE);
* a multi-layered, process based topology (e.g., a 90 nm process topology).

The process based topology more accurately models a typical IC and the environment in which the rotary ring operates.

Fig. 6.14(a) illustrates the basic structure used to compute the parasitics with the 2D analysis topology of the U-model in SPICE. The parameters SP, WD, and TH correspond to the separation, width, and thickness of the transmission lines, respectively, and HT is the height of the dielectric. This topology lacks the multiple metal layers that are typically present in an IC. Also, the 2D analysis topology does not model the lossy substrate and the high conductivity epitaxial layer present in most semiconductor processes. Modeling the environment of operation for an electromagnetic analysis therefore requires modeling the process topology. The electromagnetic analyses are performed on a 90 nm low power process based topology, using the HFSS solver for the 3-D full wave electromagnetic analysis [58]. Fig. 6.14(b) illustrates the basic structure used to compute the parasitics in this case.

| Topology                  | Segment   | SPICE U-model (2D) | PEEC         | HFSS 3D      |
|---------------------------|-----------|--------------------|--------------|--------------|
| 2D analysis<br>topology   | Straight  | $\checkmark$       | $\checkmark$ | $\checkmark$ |
|                           | Corner    | ×                  | $\checkmark$ | $\checkmark$ |
|                           | Crossover | ×                  | ×            | ×            |
| Process based<br>topology | Straight  | ×                  | ×            | $\checkmark$ |
|                           | Corner    | ×                  | ×            | $\checkmark$ |
|                           | Crossover | ×                  | ×            | $\checkmark$ |

Table 6.4: Different interconnect segment modeling topologies and analysis methods.

Table 6.4 lists the parasitic extraction methods for the straight, corner, and crossover segments. In the 2D analysis topology [based on Fig. 6.14(a)], the straight segments can be extracted with any of three methods:

* a 2D solver (using the U-model in SPICE);
* PEEC;
* HFSS 3D (a 3D FEM based full wave electromagnetic analysis tool).

The corner segments can be extracted only with PEEC or HFSS 3D modeling, since the SPICE U-model does not account for the corner parasitics. The crossover segments cannot be characterized in the 2D analysis topology at all, as the different metal layers and the substrate effects are absent in this topology.

Figure 6.15: Mutual inductance for varying "gap". $1\,\text{unit} = 1\,\mu m$.

A SPICE circuit for the rotary ring is first constructed with the parasitics extracted using the straight segments, without incorporating the corner effects. The clock waveforms obtained are shown in Fig. 6.16(a). The oscillation frequency is 4.35 GHz. Next, the corner parasitics are incorporated with the straight segment parasitics for the rotary ring circuit in SPICE. The clock waveforms obtained are shown in Fig. 6.16(b). The oscillation frequency in this case is 4.33 GHz, a slight decrease due to the additional corner parasitics. For a ring with more corners [36], the oscillation frequency decreases further due to the added parasitics of the corner segments.

Next, the process based topology is used to characterize the parasitics of the different rotary segments. The HFSS 3D full wave electromagnetic analysis is used to model the straight, corner, crossover, and gap segments. Note that, for the gap segment analysis in Section 6.2.1.4, the opposite edges of the rotary ring are divided into regular segments. Fig. 6.15 shows that the mutual inductance decreases as the gap increases. With a segment size of $1000\mu m$, the mutual inductance becomes negligible if the minimum gap is approximately 70% of the segment size. Therefore, for the custom ring implementations in [36], the gap must be greater than 70% of the minimum length of a ring edge to eliminate the gap effect. In a regular ring of the conventional ROA, the gap is $\frac{P_r}{4}$ long. This is long enough that the mutual inductance contributed by the gap can be safely neglected, as has been the norm. The rotary circuit is then rebuilt in SPICE with the parasitics extracted using HFSS. The clock waveforms obtained are shown in Fig. 6.16(c). The observed oscillation frequency is 3.32 GHz.

# 6.2.4 Power Analysis

The power dissipated on the rotary rings is analyzed based on the formulation in Section 6.1.4 and measured using SPICE simulations. The rotary ring is simulated in SPICE using the U-models incorporating the parasitics of the segments characterized in Section 6.2.1. The power dissipation is tabulated in Table 6.5.

First, the SPICE circuit for the rotary ring is constructed with the parasitics extracted using the straight segments, without incorporating the corner effects. This corresponds to the modeling used in the existing work [18, 19]. The power dissipation is 0.248 W.

Next, the corner parasitics are incorporated with the straight segment parasitics for the rotary ring circuit in SPICE. The power dissipation is 0.251 W. The additional 0.003 W of power dissipation in this case is due to the additional parasitics of the corner segments.

Finally, the rotary ring is simulated using the straight (Section 6.2.1.1), corner (Section 6.2.1.2), and crossover segments (Section 6.2.1.3) characterized using the process based topology (90 nm process). This is the most accurate characterization of the rotary ring, because it includes the straight segments, the corner segments (4 in the ring), and the crossover segment. The power dissipation observed is 0.260 W. The additional 0.009 W compared to the case with straight and corner segments is due to the additional parasitics of the crossover segment. The total increase over the straight-only model is 0.012 W. The increased accuracy of the proposed model comes from the proposed scalable application of 3D full wave electromagnetic simulations. With this model, the projected power dissipation of a 3200 $\mu m$ rotary ring operating at 3.32 GHz in a 90 nm technology increases by 4.84%.

(c) Clock signal simulated for a rotary ring with additional corner parasitics and crossover parasitics.

Figure 6.16: Simulated clock waveforms.

| Type of segments in the ring                | Power (W) |
|---------------------------------------------|-----------|
| Only straight                               | 0.248     |
| Straight and Corner $(4)$                   | 0.251     |
| Straight, Corner $(4)$ and Cross-Over $(1)$ | 0.260     |

Table 6.5: Power dissipation on the ring with different segments used for simulation.
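The increments quoted above can be checked directly against Table 6.5. The short sketch below does this. The dictionary keys are illustrative labels, not names used in the thesis.

```python
# Power (W) from Table 6.5 for each level of segment modeling.
TABLE_6_5 = {
    "straight": 0.248,
    "straight+corner": 0.251,
    "straight+corner+crossover": 0.260,
}


def power_increments(table):
    """Return (corner, crossover, total) increments in W and the total % increase."""
    corner = table["straight+corner"] - table["straight"]
    crossover = table["straight+corner+crossover"] - table["straight+corner"]
    total = table["straight+corner+crossover"] - table["straight"]
    percent = 100.0 * total / table["straight"]
    return corner, crossover, total, percent


if __name__ == "__main__":
    corner, crossover, total, percent = power_increments(TABLE_6_5)
    # prints: corner: +0.003 W, crossover: +0.009 W, total: +0.012 W (4.84%)
    print(f"corner: +{corner:.3f} W, crossover: +{crossover:.3f} W, "
          f"total: +{total:.3f} W ({percent:.2f}%)")
```

The 4.84% figure is measured against the straight-only model. The crossover alone adds 0.009 W on top of the straight and corner model.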

## 6.2.5 Discussion on Oscillation Frequency and Phase Velocity

Let the simulated oscillation frequencies using 2D modeling and 3D modeling of the parasitics be $f_{2D}$ and $f_{3D}$, respectively. From the simulation results shown in Fig. 6.16, for a design frequency of $f_{osc} = 4.5\ GHz$, the resulting frequencies $f_{2D}$ and $f_{3D}$ are 4.35 GHz and 3.32 GHz, respectively. With the addition of corner and crossover parasitics, the oscillation frequency is therefore reduced by $(4.35 - 3.32)/4.35 \approx 23.68\%$. This drop in frequency can be attributed to the non-uniform velocity of the traveling wave at the corner and crossover segments. The frequency in (2.3) is rewritten as:

$$f^{new} = \alpha \frac{v_p^{straight}}{2l},\tag{6.14}$$

where  $\alpha$  is the compensation factor due to the corner and crossover parasitics and  $v_p^{straight}$  is the phase velocity of the wave on a straight transmission line. In general, the compensation factor  $\alpha$  is the slowdown of the propagation velocity due to parasitics and can be empirically estimated by:

$$\alpha = \frac{f_{3D}}{f_{2D}}.\tag{6.15}$$

In the simulated rotary ring with 4 corners and a crossover, the compensation factor  $\alpha$  is 0.76.
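(6.14) and (6.15) can be evaluated with the simulated frequencies from Fig. 6.16 as follows. This is a sketch under two stated assumptions:

* $l$ in (6.14) is taken to be the 3200 $\mu m$ ring perimeter.
* The straight-line phase velocity is backed out by assuming $\alpha = 1$ for the 2D model, i.e., $f_{2D} = v_p^{straight}/(2l)$.

Both are illustrative choices, not values reported in the text.

```python
def compensation_factor(f_3d, f_2d):
    """Empirical compensation factor alpha = f_3D / f_2D, per (6.15)."""
    return f_3d / f_2d


def compensated_frequency(alpha, v_p_straight, l):
    """Oscillation frequency per (6.14): f_new = alpha * v_p / (2 l)."""
    return alpha * v_p_straight / (2.0 * l)


F_2D = 4.35e9   # Hz, simulated with 2D (straight-segment) parasitics, Fig. 6.16(a)
F_3D = 3.32e9   # Hz, simulated with HFSS 3D parasitics, Fig. 6.16(c)

alpha = compensation_factor(F_3D, F_2D)
drop_percent = 100.0 * (1.0 - alpha)

# Hypothetical: take l = 3200 um and back out the straight-line phase velocity
# that makes f_2D = v_p / (2 l), i.e. alpha = 1 for the 2D model.
L_RING = 3200e-6
v_p = 2.0 * L_RING * F_2D

if __name__ == "__main__":
    print(f"alpha = {alpha:.2f}, frequency drop = {drop_percent:.2f}%")  # 0.76, 23.68%
    print(f"f_new = {compensated_frequency(alpha, v_p, L_RING) / 1e9:.2f} GHz")  # 3.32 GHz
```

With $v_p$ chosen this way, applying $\alpha$ in (6.14) recovers $f_{3D}$ by construction. This shows how a single empirical factor folds the corner and crossover slowdown into the straight-line formula.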

#### 6.2.6 Summary

In this chapter, the different segments constituting the rotary rings are identified, and the methods used to analyze the parasitics of these interconnect structures are revisited. A 3D finite element based electromagnetic analysis is adopted to characterize the additional parasitics contributed by the corners and crossovers. The simulations show that the 3D full wave based parasitic analysis results in a 23.68% reduction in the observed oscillation frequency compared with the parasitics analyzed using the 2D based methodology. Further, the power dissipated in the rotary ring with the 3D full wave based parasitic modeling is 4.84% higher than with the 2D based methodology. Thus, the proposed 3D based methodology is critical for timing (since the 3D analysis lowers the estimated frequency by 23.68%), but less critical for power dissipation (within 5% of the power estimated by the 2D methodology).

## 7. Conclusion and Future Directions

Rotary clocking technology and the mobius standing wave technology provide attractive high-frequency, low-power alternatives to conventional clocking schemes. However, these technologies lack design automation methodologies for integration with the mainstream IC design flow. To this end, various methodologies are proposed in this dissertation, which are summarized in Section 7.1. Future directions are discussed in Section 7.2.

# 7.1 Conclusion

In this dissertation, the topology, timing analysis, optimization, parasitic modeling, and power analysis aspects of rotary clocking technology are evaluated, and design automation algorithms are proposed for easy integration of rotary clocking with the mainstream IC design flow. The work related to topology, to timing analysis and optimization, and to parasitic modeling and power analysis is summarized in Sections 7.1.1, 7.1.2, and 7.1.3, respectively.

#### 7.1.1 Topology Related Work

From the topology perspective, two novel methodologies are presented for rotary clocking in Chapter 3. First, a novel methodology called custom rotary oscillatory array (CROA) is proposed for the design and distribution of rotary clocking. In CROA, a physical design flow is described for connecting non-zero skew registers onto the custom rings so as to satisfy the register skew requirements. With the CROA topology, a 39% savings in tapping wirelength is demonstrated, leading to reduced wire congestion and reduced power dissipation. Second, an algorithm called zero clock skew synchronization (ZCS) is developed for the synchronization of "zero clock skew" components with rotary clocking technology. The myth that rotary clocking is only a non-zero skew clocking technology is disproved by the proposed ZCS methodology, which demonstrates minimal degradation in tapping wirelength and oscillation characteristics compared to the non-zero skew implementation. These results are encouraging in proving the feasibility of using industrial tool flows (placement and routing) targeting zero clock skew implementations in rotary-clock-synchronized circuits.

# 7.1.2 Timing Analysis and Optimization Related Work

From the timing and optimization perspective, the challenges are to address the requirements of non-zero skew synchronization due to the traveling wave operation, and to maintain a balanced capacitive load on the rings in order to achieve stable resonant operation. To this end, Chapter 4 first presents a bounded skew constraint methodology for rotary clocking that limits the skew mismatch. Next, two novel capacitive balancing methodologies (OCLB and SOCLB) are proposed for stable frequency and operation of rotary clock signals. SPICE simulations verify that the capacitance balanced rings of the ROA oscillate robustly, limiting the frequency variation to 0.30% (compared to 30.31% in the unbalanced case). Finally, the bounded skew constraint and capacitive load balancing techniques are integrated for the robust operation of a low-power zero skew rotary oscillatory array. Two techniques for simultaneous skew control and capacitive balancing (SkCLB and ZCSCLB) are proposed, demonstrating a 5.62X improvement in capacitive load balance.

Next, the design automation, skew analysis and capacitive balancing methodologies proposed for rotary clocking are extended to the mobius standing wave oscillators in Chapter 5. The capacitive load balancing techniques demonstrate an average improvement of 4.66X compared to the unbalanced capacitance loading. The skew analysis results demonstrate that without capacitance load balancing, the standing wave provides superior skew properties, however, with capacitance balancing, the rotary clocking provides superior skew properties.

### 7.1.3 Parasitic Analysis Related Work

From the parasitic analysis perspective, the challenge is to accurately model the interconnects for efficient rotary prototyping. Verifying rotary operation with SPICE based simulations requires accurate simulation models that incorporate the transmission line characteristics and the crosstalk characteristics. To this end, PEEC based and 3-D based modeling techniques are proposed for the rotary interconnects in Chapter 6. Different segments constituting rotary rings are identified for parasitic analysis. First, the effects of the parasitics on the oscillation frequency of the CROA generated square waves are analyzed using PEEC models. The additional parasitics due to the corner segments of the custom rotary rings are incorporated in the modified SPICE simulation models for frequency and power analysis. Next, a 3-D finite element based electromagnetic analysis is adopted for characterizing the additional parasitics contributed by the regular and custom rotary ring segments. The simulations show that the 3D full wave based parasitic analysis results in a 23.68% reduction in the observed oscillation frequency when compared with the parasitics analyzed using the 2-D (PEEC) based methodology. Further, a detailed power analysis for the regular and custom rotary rings is presented using the modified SPICE simulation models incorporating the additional parasitics extracted using the 3-D FEM based electromagnetic analysis.

# 7.2 Future Directions

As an extension to the completed work on resonant clocking technologies, the following items are identified.

#### 7.2.1 Synchronization Between the Custom Rings in CROA

The rings in the regular ROA topology are locked in phase by sharing the four-port junctions on the corners of the grid topology shown in Fig. 2.7, which are magnified in Fig. 7.1. Fig. 7.1 shows a four-port junction between two standard rings of the ROA topology [17]. In Fig. 7.1(a), the on-time arriving pulse and the delayed arriving pulse are shown. The on-time arriving pulse drives the three transmission lines. When the delayed pulse arrives, the pulses combine and branch into the output ports, as shown in Fig. 7.1(b). The oscillation signals are locked in phase, and jitter effects are implicitly minimized.

The CROA topology (explained in Chapter 3) also has a grid structure; however, the corners are not required to be interconnected. Thus, synchronization between the custom rings of the CROA topology requires an additional design element. A preliminary study on this synchronization is presented here.

Consider the CROA topology shown in Fig. 3.1. For synchronization purposes, the rings are connected using short transmission lines as the additional design element, forming dual three-port network junctions. Consider two custom rings connected together using three-port networks, as shown in Fig. 7.2. Fig. 7.2(a) shows the velocity mismatched pulses: the on-time arriving pulse and the delayed pulse at the first three-port junction. When the delayed pulse arrives, it combines with the earlier pulse. This new pulse is locked in phase and travels through the other ports, synchronizing the phase of the two custom rings as shown in Fig. 7.2(b). Through this mechanism, the oscillating signals of both rings are locked in phase, regardless of the direction of the delayed pulse.

Figure 7.1: Synchronization of regular rings in ROA using 4 port network.

Figure 7.2: Synchronization of custom rings in CROA using 3 port network.

Note that the phase delay $\theta_{3-port}$ contributed by the transmission line segment $l_{3-port}$ (shown in Fig. 7.2) connecting the two custom rings with perimeter $P_r$ is estimated as:

$$\theta_{3-port} \approx \frac{l_{3-port}}{P_r} \times 360^\circ,\tag{7.1}$$

using the uniform phase velocity $(v_p)$ assumption discussed in Section 2.1.3.3. For instance, a $25\,\mu m$ long transmission line connecting two custom rotary rings of perimeter $3200\,\mu m$ delays the clock signal by $\frac{25\,\mu m}{3200\,\mu m} \times 360^{\circ} \approx 3^{\circ}$. The tapping point phase values $\theta_i$ are redefined to account for the phase delay $\theta_{3-port}$ due to the additional transmission line segment $l_{3-port}$. For instance, in Fig. 7.2, $\theta_y = \theta_x + 3^\circ$ for the presented propagation order and directions. In Fig. 3.1, the three-port networks are illustrated for a 5-ring CROA topology.
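Equation (7.1) and the redefinition of the tapping phases can be sketched as follows. The wrap-around to $[0^\circ, 360^\circ)$ in `shifted_tap_phase` is an assumption of this sketch, since tapping phases are periodic. The function names are hypothetical.

```python
def three_port_phase_delay(l_3port, ring_perimeter):
    """Phase delay (degrees) of the 3-port link per (7.1); uniform v_p assumed."""
    if ring_perimeter <= 0:
        raise ValueError("ring perimeter must be positive")
    return l_3port / ring_perimeter * 360.0


def shifted_tap_phase(theta_x, l_3port, ring_perimeter):
    """Redefined tapping phase theta_y = theta_x + theta_3port, wrapped to [0, 360)."""
    return (theta_x + three_port_phase_delay(l_3port, ring_perimeter)) % 360.0


if __name__ == "__main__":
    delay = three_port_phase_delay(25.0, 3200.0)   # lengths in um; only the ratio matters
    print(f"theta_3port = {delay:.4f} deg")        # 2.8125 deg, i.e. about 3 deg
    print(f"theta_y = {shifted_tap_phase(90.0, 25.0, 3200.0):.4f} deg")  # 92.8125 deg
```

The exact value for the 25 $\mu m$ example is 2.8125°, which the text rounds to 3°.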

However, this methodology for custom ring synchronization needs to be verified with the SPICE based simulations and finally with the fabricated test chip.

#### 7.2.2 Optimal Placement and Sizing of the Inverter Pairs

The inverter pairs in the rotary rings are instrumental in achieving adiabaticity and amplification of the clock signals. The large size of the inverters used to replenish the clock signals in the rotary ring poses layout issues in terms of placement and sizing. Further, the size of the inverters, in addition to the ring perimeter, can be tuned to achieve high operating frequency and low power. Hence, this multi-objective high-frequency, low-power problem, with optimal inverter sizing and optimal inverter spacing on the rotary oscillatory rings, needs to be investigated.

## 7.2.3 Fabrication of Rotary Rings

Verification of the frequency and the skew characteristics with the fabricated rotary rings is the most important work that needs to be completed. The impact of on-chip variations in the context of interconnect and inverter modeling, timing and power analysis needs to be studied and addressed. The effects of technology scaling on the operational characteristics of the rotary based designs need to be investigated.

#### Bibliography

- [1] P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, "High performance microprocessor design," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, pp. 676–686, May 1998.
- [2] E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," *Proceedings of the IEEE*, vol. 89, no. 5, pp. 665–692, May 2001.
- [3] H. Sutter, "The free lunch is over: A fundamental turn toward concurrency in software," Mar. 2005, http://www.gotw.ca/publications/concurrency-ddj.htm.
- [4] C. Dike, N. Kurd, P. Patra, and J. Barkatullah, "A design for digital, dynamic clock deskew," in *Proceedings of the International Symposium on VLSI Circuits (ISVLSI)*, Jun. 2003, pp. 21–24.
- [5] V. Gutnik and A. Chandrakasan, "Active GHz clock network using distributed PLLs," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 11, pp. 1553–1560, Nov. 2000.
- [6] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, "Clock generation and distribution for the first IA-64 microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 11, pp. 1545–1552, Nov. 2000.
- [7] M. Saint-Laurent, M. Swaminathan, and J. Meindl, "On the micro-architectural impact of clock distribution using multiple PLLs," in *Proceedings of the IEEE International Conference on Computer Design (ICCD)*, Sep. 2001, pp. 214–220.
- [8] H. G. Chyun and J. Hung, "Phase-locked loop techniques: A survey," *IEEE Transactions on Industrial Electronics*, vol. 43, no. 6, pp. 609–615, Dec. 1996.
- [9] B. Floyd, C. Hung, and K. O. Kenneth, "Intra-chip wireless interconnect for clock distribution implemented with integrated antennas, receivers, and transmitters," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 5, pp. 522–543, May 2002.
- [10] B. Floyd, X. Guo, J. Caserta, T. Dickson, C.-M. Hung, K. Kim, and K. O. Kenneth, "Wireless interconnects for clock distribution," in *Proceedings of the ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU)*, Dec. 2002.
- [11] M. Haurylau, G. Chen, H. Chen, J. Zhang, N. A. Nelson, D. H. Albonesi, E. G. Friedman, and P. M. Fauchet, "On-chip optical interconnect roadmap: Challenges and critical directions," *IEEE Journal of Selected Topics in Quantum Electronics*, vol. 12, no. 6, pp. 1131–1134, Nov.–Dec. 2006.
- [12] R. Chen, "Optical interconnects: A viable solution for interconnection beyond 10 Gbit/sec," in *Proceedings of the International Symposium on Physical Design (ISPD)*, Mar. 2007, pp. 85–86.
- [13] J. Roychowdhury, "Micro-photonic interconnects: Characteristics, possibilities and limitation," in *Proceedings of the ACM/IEEE Design Automation Conference (DAC)*, Jun. 2007, pp. 574–575.
- [14] M. Saint-Laurent, M. Swaminathan, and J. Meindl, "On the micro-architectural impact of clock distribution using multiple plls," in *Proceedings of the IEEE International Conference on Computer Design (ICCD)*, Sep. 2001, pp. 214–220.
- [15] K.-N. Chen, M. J. Kobrinsky, B. C. Barnett, and R. Reif, "Comparisons of conventional, 3-D, optical, and RF interconnects for on-chip clock distribution," *IEEE Transactions on Electron Devices*, vol. 51, no. 2, pp. 233–239, Feb. 2004.
- [16] V. L. Chi, "Salphasic distribution of clock signals for synchronous systems," *IEEE Transactions on Computers*, vol. 43, no. 5, pp. 597–602, May 1994.
- [17] J. Wood, T. Edwards, and S. Lipa, "Rotary traveling-wave oscillator arrays: a new clock technology," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 11, pp. 1654–1665, Nov. 2001.
- [18] Z. Yu and X. Liu, "Power analysis of rotary clock," in *Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, May 2005, pp. 150–155.
- [19] V. H. Cordero and S. P. Khatri, "Clock distribution scheme using coplanar transmission lines," in *Proceedings of the Design, Automation and Test in Europe (DATE)*, Mar. 2008, pp. 985–990.
- [20] S. C. Chan, P. J. Restle, N. K. James, and R. L. Franch, "A 4.6 GHz resonant global clock distribution network," in *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, Feb. 2004, pp. 341–343.
- [21] S. C. Chan, K. L. Shepard, and P. J. Restle, "Design of resonant global clock distributions," in *Proceedings of the IEEE International Conference on Computer Design (ICCD)*, 2003, pp. 238–243.
- [22] P. J. Restle, T. G. McNamara, P. J. Camporese, K. F. Eng, K. A. Jenkins, D. H. Allen, M. J. Rohn, M. P. Quaranta, D. W. Boerstler, C. J. Alpert, C. A. Carter, R. N. Bailey, J. G. Petrovik, B. L. Krauter, and B. D. McCredie, "A clock distribution network for microprocessors," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 792–799, May 2001.
- [23] S. C. Chan, K. L. Shepard, and P. J. Restle, "Uniform-phase uniform-amplitude resonant-load global clock distributions," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 1, pp. 102–109, Jan. 2005.
- [24] F. O'Mahony, C. Yue, M. Horowitz, and S. Wong, "Design of a 10-GHz clock distribution network using coupled standing-wave oscillators," in *Proceedings of the IEEE/ACM Design Automation Conference (DAC)*, Jul. 2003, pp. 682–687.
- [25] —, "A 10-GHz global clock distribution using coupled standing-wave oscillators," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 11, pp. 1813–1820, Nov. 2003.
- [26] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1520–1528, Sep. 2004.
- [27] J. Wood, S. Lipa, P. Franzon, and M. Steer, "Multi-gigahertz low-power low-skew rotary clock scheme," in *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, Feb. 2001, pp. 400–401.
- [28] J. Wood, T. Edwards, and C. Ziesler, "A 3.5-GHz rotary-traveling-wave-oscillator clocked dynamic logic family in 0.25 µm CMOS," in *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, Feb. 2006, pp. 1550–1557.
- [29] J.-Y. Chueh, M. C. Papaefthymiou, and C. H. Ziesler, "Two-phase resonant clock distribution," in *Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, May 2005, pp. 65–70.
- [30] B. Taskin, J. DeMaio, O. Farell, M. Hazeltine, and R. Ketner, "Custom topology rotary clock router with tree subnetworks," *ACM Transactions on Design Automation of Electronic Systems (TODAES)*, vol. 14, no. 3, pp. 44:1–44:14, May 2009.
- [31] J. P. Fishburn, "Clock skew optimization," *IEEE Transactions on Computers*, vol. C-39, no. 7, pp. 945–951, Jul. 1990.
- [32] I. S. Kourtev and E. G. Friedman, *Timing Optimization Through Clock Skew Scheduling*, 1<sup>st</sup> ed. Springer, Feb. 2000.
- [33] S. Held, B. Korte, J. Massberg, M. Ringe, and J. Vygen, "Clock scheduling and clock tree construction for high performance ASICs," in *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, Nov. 2003, pp. 232–239.

- [34] B. Taskin and I. S. Kourtev, "Linearization of the timing analysis and optimization of level-sensitive digital synchronous circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 1, pp. 12–27, Jan. 2004.
- [35] V. Honkote and B. Taskin, "Maze router based scheme for rotary clock router," in *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2008, pp. 442–445.
- [36] —, "Custom rotary clock router," in *Proceedings of the IEEE International Conference on Computer Design (ICCD)*, Oct. 2008, pp. 114–119.
- [37] —, "Zero clock skew synchronization with rotary clocking technology," in *Proceedings of the IEEE International Symposium on Quality Electronic Design (ISQED)*, Mar. 2009, pp. 588–593.
- [38] —, "Skew analysis and bounded skew constraint methodology for rotary clocking technology," in *Proceedings of the IEEE International Symposium on Quality Electronic Design (ISQED)*, Mar. 2010, pp. 413–417.
- [39] —, "Analysis, design and simulation of capacitive load balanced rotary oscillatory array," in *Proceedings of the IEEE International Conference on VLSI Design (VLSID)*, Jan. 2010, pp. 218–223.
- [40] —, "Skew-aware capacitive load balancing for low-power zero clock skew rotary oscillatory array," (in review).
- [41] —, "Design automation scheme for wirelength analysis of resonant clocking technologies," in *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2009, pp. 1147–1150.
- [42] —, "Capacitive load balancing for Möbius implementation of standing wave oscillator," in *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2009, pp. 232–235.
- [43] —, "Skew analysis and design methodologies for improved performance of resonant clocking," in *Proceedings of IEEE International SoC Design Conference (ISOCC)*, Nov. 2009, pp. 165–168.
- [44] —, "PEEC-based parasitic modeling for power analysis on custom rotary rings," in *Proceedings of the IEEE International Symposium on Low Power Electronics and Design (ISLPED)*, Aug. 2010 (to appear).
- [45] V. Honkote, A. More, and B. Taskin, "3-D interconnect parasitic modeling and power analysis for rotary clocking," (in review).
- [46] —, "3-D based parasitic modeling for rotary interconnects," (in review).

- [47] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, *Introduction to Algorithms*, 3<sup>rd</sup> ed. The MIT Press, Sep. 2009.
- [48] C. Y. Lee, "An algorithm for path connections and its applications," *IRE Transactions on Electronic Computers*, Sep. 1961, pp. 346–365.
- [49] S.-C. Fang and S. Puthenpura, *Linear Optimization and Extensions: Theory and Algorithms*. Prentice Hall, Feb. 1993.
- [50] W. L. Winston, *Operations Research: Applications and Algorithms*, 3<sup>rd</sup> ed. Wadsworth Publishing Company, Jan. 1997.
- [51] R. E. Bixby, C. M. McZeal, and M. W. P. Savelsbergh, "CPLEX Optimization, Inc."
- [52] E. Bogatin, *Signal Integrity Simplified*. Prentice Hall, 2004.
- [53] T. C. Edwards and M. B. Steer, *Foundations of Interconnect and Microstrip Design*. Wiley, 2004.
- [54] J. P. Uyemura, *Introduction to VLSI Circuits and Systems*. John Wiley & Sons, Inc., 2002.
- [55] D. A. Pucknell and K. Eshraghian, *Basic VLSI Design*. Prentice Hall, 1994.
- [56] *HSPICE Signal Integrity User Guide*, Synopsys, 2009.
- [57] *Synopsys Online Documentation*, Synopsys, 2002.
- [58] *High Frequency Structure Simulator: User's Guide*, 10<sup>th</sup> ed., Ansoft Corporation, Jun. 2005.
- [59] B. Stroustrup, *The C++ Programming Language*, 3<sup>rd</sup> ed. Addison-Wesley, Jun. 1997.
- [60] A. E. Ruehli, "Equivalent circuit models for three-dimensional multiconductor systems," *IEEE Transactions on Microwave Theory and Techniques*, vol. 22, no. 3, pp. 216–221, Mar. 1974.
- [61] —, "Inductance calculations in a complex integrated circuit environment," *IBM Journal of Research and Development*, pp. 470–481, Sep. 1972.
- [62] A. E. Ruehli and H. Heeb, "Circuit models for three-dimensional geometries including dielectrics," *IEEE Transactions on Microwave Theory and Techniques*, vol. 40, no. 7, pp. 1507–1516, Jul. 1992.
- [63] S. C. Chan, K. L. Shepard, and P. J. Restle, "1.1 to 1.6 GHz distributed differential oscillator global clock network," in *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, Feb. 2005, pp. 518–519.

- [64] —, "Distributed differential oscillators for global clock networks," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 9, pp. 2083–2094, Sep. 2006.
- [65] L. Divina and Z. Skvor, "The distributed oscillator at 4 GHz," *IEEE Transactions on Microwave Theory and Techniques*, vol. 46, no. 12, pp. 2240–2243, Dec. 1998.
- [66] W. F. Andress and D. Ham, "Standing wave oscillators utilizing wave-adaptive tapered transmission lines," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 638–651, Mar. 2005.
- [67] J. Wood, "Electronic circuitry," United States Patent Application Number 20030128075, Jul. 2003.
- [68] J. S. Denker, "A review of adiabatic computing," in *IEEE Symposium on Low Power Electronics (ISLPED)*, Oct. 1994, pp. 94–97.
- [69] S. Kim and M. C. Papaefthymiou, "Single-phase source-coupled adiabatic logic," in *International Symposium on Low Power Electronics and Design (ISLPED)*, 1997, pp. 97–99.
- [70] B. Taskin, J. Wood, and I. S. Kourtev, "Timing-driven physical design for VLSI circuits using resonant rotary clocking," in *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2006, pp. 261–265.
- [71] G. Venkataraman, J. Hu, F. Liu, and C. N. Sze, "Integrated placement and skew optimization for rotary clocking," in *Proceedings of the Design, Automation and Test in Europe (DATE)*, Mar. 2006, pp. 756–761.
- [72] Z. Yu and X. Liu, "Design of rotary clock based circuits," in *Proceedings of the IEEE/ACM Design Automation Conference (DAC)*, Jun. 2007, pp. 43–48.
- [73] B. Taskin and I. Kourtev, "A timing optimization method based on clock skew scheduling and partitioning in a parallel computing environment," in *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2006, pp. 486–490.
- [74] C. Zhuo, H. Zhang, R. Samanta, J. Hu, and K. Chen, "Modeling, optimization and control of rotary traveling-wave oscillator," in *Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD)*, Nov. 2007, pp. 476–480.
- [75] I. S. Kourtev, B. Taskin, and E. G. Friedman, *Timing Optimization Through Clock Skew Scheduling*, 1<sup>st</sup> ed. Springer, Nov. 2009.
- [76] G. D. Mercey, "An 18-GHz rotary traveling wave VCO in CMOS with I/Q outputs," in *Proceedings of the European Solid-State Circuits Conference (ESSCIRC)*, Sep. 2003, pp. 489–492.
- [77] Z. Yu and X. Liu, "Low-power rotary clock array design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 15, no. 1, pp. 5–12, Jan. 2007.
- [78] —, "Implementing multiphase resonant clocking on a finite-impulse response filter," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 11, pp. 1593–1601, Nov. 2009.
- [79] W. C. Elmore, "The transient response of damped linear networks with particular regard to wideband amplifiers," *Journal of Applied Physics*, vol. 19, pp. 55–63, Jan. 1948.
- [80] J. P. Fishburn, "Clock skew optimization," *IEEE Transactions on Computers*, vol. 39, no. 7, pp. 945–951, Jul. 1990.
- [81] J. Lu, V. Honkote, X. Chen, and B. Taskin, "Steiner tree based rotary clock routing," (in review).
- [82] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, "Multilevel hypergraph partitioning: Applications in VLSI domain," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 7, no. 1, pp. 69–79, Mar. 1999.
- [83] J. Cong and C.-K. Koh, "Minimum-cost bounded-skew clock routing," in *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)*, vol. 1, Apr.-May 1995, pp. 215–218.
- [84] R. Chaturvedi and J. Hu, "An efficient merging scheme for prescribed skew clock routing," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 6, pp. 750–754, Jun. 2005.
- [85] M. Edahiro, "A clustering-based optimization algorithm in zero-skew routings," in *Proceedings of the ACM/IEEE Design Automation Conference (DAC)*, Jun. 1993, pp. 612–616.
- [86] D. W. Pentico, "Assignment problems: A golden anniversary survey," *European Journal of Operational Research*, vol. 176, pp. 774–794, Jan. 2007.
- [87] P. E. Black, "Munkres' assignment algorithm," *Dictionary of Algorithms and Data Structures*, May 2006.
- [88] R. E. Burkard, M. Dell'Amico, and S. Martello, *Assignment Problems*, 1<sup>st</sup> ed. SIAM, 2009.
- [89] R. E. Burkard and E. Cela, "Linear assignment problems and extensions," 1998.
- [90] "Semiconductor Industry Association, International Technology Roadmap for Semiconductors (ITRS)," 2007, http://public.itrs.net/.

- [91] R. S. Tsay, "An exact zero-skew clock routing algorithm," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 12, no. 2, pp. 242–249, Feb. 1993.
- [92] F. W. Grover, *Inductance Calculations: Working Formulas and Tables*. Instrument Society of America, 1973.

## Vita

Vinayak Honkote was born in Kumta, North Kanara, Karnataka, India. He received a Bachelor's degree in Electronics and Communication Engineering from Bangalore Institute of Technology, India, in 2003. He joined the Department of Electrical and Computer Engineering at Drexel University in 2004 and received a Master's degree in Electrical Engineering in 2006. His Ph.D. work at Drexel University focuses on the design automation and analysis of resonant clocking technologies. His research interests include clock network design, VLSI physical design, EDA for VLSI, post-CMOS interconnects, and emerging technologies such as QCA and nano-CMOS.

Vinayak is a recipient of the "Richard A. Newton Graduate Scholarship," awarded by the Design Automation Conference (DAC) committee in 2007. He is one of the first recipients of the "Nihat Bilgutay Fellowship," awarded by the Department of ECE at Drexel University in 2010. He has served as an adjunct primary instructor for the courses "Computer Structures" and "Design with Micro-controllers" at the Goodwin College of Professional Studies, Drexel University. He is a Freshman Design Fellow at the College of Engineering, Drexel University, for the 2009–2010 academic year. He has represented Drexel in various programming and poster contests at the DAC and ICCAD conferences. He is a member of IEEE and ACM.