Decreasing the Power-Clock Resonant Signal Central Voltage as a Mean for Power Reduction in Integrated Power and Clock Distribution Networks

Background/Objectives: The density, performance, and design complexity of integrated circuits are increasing rapidly, particularly in 3-D integration, where multi-plane synchronization is required. The power and clock distribution networks consume a large portion of the limited on-chip metal resources. To reduce the metal overhead associated with the power, global clock, and local clock distribution networks, the concept of an integrated power and clock distribution network (IPCDN) was investigated and the correct functionality of combinational and sequential elements verified. This study discusses the potential power savings in IPCDNs achieved by reducing the central voltage at which the signal oscillates. Methods/Statistical analysis: An IPCDN with differential power-clock signals centered at half the supply voltage is proposed to further reduce the power consumption. The elements of the proposed scheme, including the LC differential power-clock driver, clamping circuit, clock buffer, and voltage doubler, were simulated using Tanner 0.25 µm CMOS technology at a frequency of 50 MHz and a supply voltage of 2.5 V. Findings: Simulation results indicate that the proposed scheme achieves 75.32% and 76.47% power reduction in the LC differential power-clock driver and clock buffer, respectively. The effects of process, supply voltage, and temperature (PVT) variations on the proposed scheme were also investigated. Discussion: The IPCDN has a large capacitance and is heavily loaded; thus, reducing the central voltage of the resonant sinusoidal signal flowing in this network enables a significant reduction in power consumption. Novelty/Applications: The proposed scheme enables power reductions in the LC differential power-clock driver and clock buffer. The effects of PVT variations on all circuit elements of the proposed scheme were investigated.


Introduction
The clock distribution network (CDN) distributes the clock signal, which acts as a timing reference that controls and synchronizes the flow of data. The CDN operates at high frequencies, is heavily loaded, and has a high capacitance. The CDN in synchronous systems-on-chip (SOCs) and application-specific integrated circuits (ASICs) consumes a large share of the total system power (1,2). In high-performance processors, more than 30% of total power is consumed in the CDN (3-7). In the IBM POWER4 1.3 GHz microprocessor, around 70% of total power is dissipated in the CDN and latches (8). Recent developments in 3-D integrated circuits, where multi-plane synchronization is required, indicate that the power consumption and metal overhead associated with the CDN will remain at these high levels (9,10).
Energy recovery resonant clocking is an appealing scheme to reduce the power consumption of the CDN. The resonant clocking scheme, unlike square-wave clocking, uses the capacitance of the clock network, an on-chip inductor, and a decoupling capacitance to generate a resonant sinusoidal clock. Resonant CDNs reduce power by recycling energy, transforming it between electrical and magnetic form in the capacitor and inductor, respectively. The LC resonant clock driver implemented on a 22 nm process node achieves a 50% power reduction in the driver as compared to the non-resonant driver (11). The modified Cell Broadband Engine Processor incorporating a large resonant global clock network demonstrates a 6-8 W power saving (12). Measurement results using intermittent resonant clocking (IRC) show that resonant clocking reduces the clock power by 36% at 980 kHz compared to conventional non-resonant clocking (13). Simulation results of 1024 flip-flops clocked through an H-tree clock network driven by a resonant clock generator indicate total power savings of up to 83% and a power reduction of 90% in the clock tree as compared to the same implementation using conventional square-wave clocking (14). The distributed model of a two-level resonant H-tree structure presented in (15) exhibits an 84% decrease in power dissipation as compared to a standard H-tree clock distribution network. AMD's x86-64 resonant clock design for a power-efficient high-volume microprocessor achieved a peak 25% power reduction in the global clock (16). Varying the operating frequency, LC tank placement, and sizes in a 3-level resonant clock H-tree demonstrates that around 23% power reduction is achieved when the LC tank is placed on the second level of the clock network (17).
Implementing compensation capacitors to reduce the overhead of the on-chip inductor and capacitor resources resulted in a 12% reduction in passive device area while still achieving 49.9% power savings as compared to a traditional square-wave clock (18). In (19), a modeling and optimization method for resonant clocking using the mesh clock structure was proposed, because the mesh clock structure consumes more power than the H-tree. The experiment demonstrated that the resonant clock mesh structure can save more than 80% of the power consumption compared to the same mesh structure without the resonant LC tanks. An experiment on a test chip in 130-nm CMOS technology, with a 1.56 GHz LC resonant clock directly driving 2×896 flip-flops, shows that resonant clocking results in 57% lower clock power and 15-30% lower total chip power as compared to conventional clocking (20).
Various schemes have been proposed in the literature to enable further power reduction in the resonant clock distribution network. Dual-edge triggered flip-flops operating with a resonant sinusoidal clock signal can achieve up to 58% reduction in power consumption (21). Clock gating has been used to reduce the resonant clock power consumption (22,23). However, the global clock distribution network driving the clock gates still consumes a large share of total power (24,25). A resonant converter with a voltage-doubler rectifier, offering a wide output voltage range and soft-switching characteristics, is presented in (26). A full-bridge rectifier and a voltage doubler are employed for low-voltage and high-voltage output applications. Quasi-resonant clocking with a voltage-frequency scalable resonant clocking system was proposed in (27). The proposed scheme uses a timing control module to control the assertion and de-assertion of the generated signals to ensure efficient operation. A two-phase synthesis algorithm for resonant clock networks supporting dynamic voltage/frequency scaling is discussed in (28). The first phase of the algorithm is the allocation and placement of the inductor, and the second phase is the resizing of the driving buffers. Simulation results show that resonant clock networks synthesized using the proposed algorithm achieve a 17% reduction in power. In (29), a quadrature resonant clock generator with tuning capacitors and an amplitude control feedback loop is presented. The proposed clock generator enables a 20 to 25% reduction in power as compared to a conventional CMOS clock driver. A low-power, low-voltage resonant clock driver for IoT devices was simulated in (30). The proposed clock driver operates under a low supply voltage and uses a voltage doubler to reduce the on-resistance of the NMOS transistor.
The density, performance, and design complexity of integrated circuits (ICs) are increasing, particularly in 3-D integration (31-33), which emphasizes the need for new techniques to supply the power, ground, and clock signals. A large portion of the limited on-chip metal resources is consumed by the power and clock distribution networks (34). A novel 3D Through-Silicon-Via (TSV) based capacitor is investigated in (35) for LC resonant clocking in 3D ICs. Replacing the conventional inductor and capacitor with TSV-based structures achieves a 2.2× and 16.3× reduction in area, respectively. A new scheme that uses idle TSVs to form inductors in the LC resonant clock of 3D ICs, in order to reduce the power consumption in the CDN, was proposed in (36). Experimental results on industrial designs show that the power consumption is reduced by up to 47.9% compared to conventional square-wave CDNs.
A globally integrated power and clock (GIPAC) distribution network was proposed in (37) as a means of eliminating the on-chip global clock distribution network. The power and clock signals are integrated in the GIPAC and then separated in the local power and local clock networks using low-pass and high-pass filters, respectively. The proposed approach does not eliminate the need for the local clock distribution network. The authors of this paper previously proposed integrating the power and clock distribution networks into one network called the Integrated Power and Clock Distribution Network (IPCDN) (38). The IPCDN combines the power and clock networks, thereby reducing metal requirements at higher levels and decreasing routing complexity. This approach is most suitable for 3D ICs since it significantly reduces the routing complexity of the clock and power distribution networks and reduces the demand on the limited on-chip metal resources. The IPCDN distributes a differential resonant sinusoidal power-clock signal, Pwr_Clk, centered at a DC offset equal to the supply voltage, VDD. The resonant differential power-clock signals are generated using a differential LC Pwr_Clk driver. These signals are connected to the VDD port of sequential and combinational circuits. The low voltage swing of the differential Pwr_Clk signals leads to a substantial decrease in the required driver strength as compared to full-swing resonant CDNs (39). A clock buffer is needed in the IPCDN in order to extract a full-swing clock signal, which supplies the clock port in sequential elements.
The IPCDN has a large capacitance; thus, reducing the central voltage of the resonant sinusoidal signal flowing in the IPCDN enables a significant reduction in power consumption. In this paper, the authors investigate the power savings in the IPCDN achieved by reducing the central offset voltage at which the Pwr_Clk signal oscillates. In the proposed scheme, the central offset voltage is reduced to VDD/2, as presented in section 2. Simulation results and analysis are discussed in section 3, and the conclusion of the paper is given in section 4.

Proposed IPCDN with Differential Power-Clock Signals Centered at Half the Supply Voltage
The IPCDN offers several advantages as compared to traditional CDNs. Combining the clock and power networks reduces routing complexity and metal overhead. In addition, the reduced swing of the Pwr_Clk signals decreases the required LC driver strength. In this paper, we investigate the feasibility of reducing the power consumption of the IPCDN by decreasing the DC voltage at which the power-clock signals oscillate from VDD to VDD/2. Figure 1 illustrates the IPCDN with power-clock resonant signals centered at a high DC voltage, i.e., VDD. The Pwr_Clk signals generated in this scheme will be referred to as HDC_Pwr_Clk signals. As shown in Figure 1, the HDC_Pwr_Clk signals generated by the differential LC driver are connected directly to the VDD ports of sequential and combinational circuits. The full-swing clock signal, Fswing_Clk, extracted by the clock buffer, feeds the clock port in sequential circuits. The proposed IPCDN with Pwr_Clk signals centered at VDD/2 is presented in Figure 2. The differential LC driver generates Pwr_Clk signals that are centered at a lower DC voltage and are therefore referred to from here on as LDC_Pwr_Clk. These signals are distributed by the IPCDN to reduce the power consumption in this network. The LDC_Pwr_Clk signals are then fed to a clamping circuit in order to restore the DC component back to VDD and generate the HDC_Pwr_Clk signals needed to supply the power port in combinational and sequential circuits. The clock buffer generates a low-swing clock signal, Lswing_Clk, with a peak voltage of VDD/2. A voltage doubler is used to generate a full-swing clock signal, Fswing_Clk, to be connected to the clock port in sequential elements.
The generated HDC_Pwr_Clk signal presented in Figures 1 and 2 is given by Equation 1, and the LDC_Pwr_Clk signal shown in Figure 2 is given by Equation 2, where VOH and VOL are the highest and lowest voltage levels of the generated Pwr_Clk signal and f is the resonant frequency. The power dissipation of the IPCDN with a Pwr_Clk signal centered at VDD is given by Equation 3 (16,40,41), where Q is the quality factor of the system, C is the capacitance of the IPCDN, and f is the operating frequency. Similarly, the power dissipation of the IPCDN with a Pwr_Clk signal centered at VDD/2 is given by Equation 4.
Equations 3 and 4 demonstrate that the power consumption of the IPCDN is proportional to the square of the voltage at which the Pwr_Clk resonant signal is centered. Halving this voltage therefore achieves a 75% reduction in the power consumption of the IPCDN. Given that the capacitance associated with this global network is high, significant savings in power consumption can be achieved by the proposed approach. On the other hand, additional area and power overhead are introduced by the clamping circuit and the voltage doubler circuit required to generate the HDC_Pwr_Clk and Fswing_Clk signals, respectively.
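The square-law argument can be written out in a few steps. The following is a sketch rather than the exact form of Equations 3 and 4: it assumes only that both expressions share a common prefactor k (a function of Q, C, and f) multiplying the square of the central voltage, as stated in the text. The symbols k, P_H, and P_L are introduced here for illustration.

```latex
\begin{align*}
P_{H} &= k \, V_{DD}^{2}, &
P_{L} &= k \left( \frac{V_{DD}}{2} \right)^{2} = \frac{k \, V_{DD}^{2}}{4}, \\
\frac{P_{L}}{P_{H}} &= \frac{1}{4}, &
\frac{P_{H} - P_{L}}{P_{H}} &= 1 - \frac{1}{4} = 0.75 .
\end{align*}
```

Because k cancels in the ratio, the 75% figure does not depend on the particular values of Q, C, or f, provided they are the same in both schemes.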

Differential Pwr_Clk Driver with Reduced Central Voltage
The LC differential Pwr_Clk driver used in the proposed scheme is shown in Figure 3. The VDD/2 node of the inductors determines the central voltage, i.e., the DC component around which the generated Pwr_Clk signals, LDC_Pwr_Clk+ and LDC_Pwr_Clk-, oscillate. Reference pulses VREF1 and VREF2 feed transistors MP1 and MP2 in order to pull up the differential Pwr_Clk signals to VOH. The outputs of the two inverters alternately turn on transistors MN1 and MN2 to pull down the differential Pwr_Clk signals to VOL.

Clamping Circuit
The clamping circuit used in the proposed scheme (Figure 2) is shown in Figure 4. The clamping circuit utilizes a diode-connected transistor, MN1, with a suitable DC voltage, V1, and a capacitor, C1, to pull up the central voltage of the differential Pwr_Clk signals from VDD/2 to VDD.

Clock Buffer
The differential low-swing clock signals, Lswing_Clk and Lswing_ClkB, are extracted from the LDC_Pwr_Clk differential signals using the clock buffer presented in Figure 5. The buffer utilizes two cross-coupled CMOS inverters. The first inverter is implemented by MP2 and MN2, and the second by MP3 and MN3. The gate of each inverter is connected to the LDC_Pwr_Clk signals using pass transistors MN5 and MN6. The voltage difference between nodes X and Y shown in the figure is amplified and fed to the two inverters at the output stage, implemented by transistors MP1/MN1 and MP4/MN4, respectively, to generate the Lswing_Clk and Lswing_ClkB signals with sharp edges.

Voltage Doubler
The voltage doubler proposed in (42,43) and shown in Figure 6 is used to generate the full-swing differential clock signals, Fswing_Clk, from the low-swing clock signals, Lswing_Clk. The full-swing clock signals are connected to the clock port in sequential elements.

Simulation Results and Analysis
All circuits discussed in the previous section were implemented using Tanner 0.25 µm CMOS technology and simulated with a supply voltage, VDD, of 2.5 V. The network capacitance, C, was chosen to be 8 pF, for which a 1.267 µH inductor is needed in order to generate a Pwr_Clk signal with a resonant frequency of 50 MHz.
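The inductor value quoted above can be checked against the standard ideal LC resonance relation, f = 1/(2π√(LC)). The short sketch below assumes a lossless LC tank; the function and constant names are illustrative and do not come from the simulation setup.

```python
import math


def resonant_inductance(capacitance_f: float, frequency_hz: float) -> float:
    """Inductance (H) that resonates with capacitance_f at frequency_hz.

    Uses the ideal LC tank relation f = 1 / (2 * pi * sqrt(L * C)).
    """
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * capacitance_f)


C_NETWORK_F = 8e-12   # network capacitance from the text (8 pF)
F_RES_HZ = 50e6       # target resonant frequency from the text (50 MHz)

L_TANK_H = resonant_inductance(C_NETWORK_F, F_RES_HZ)
print(f"L = {L_TANK_H * 1e6:.3f} uH")  # prints "L = 1.267 uH"
```

The result agrees with the 1.267 µH stated in the text to three decimal places.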

IPCDN with Pwr_Clk Signals Centered at VDD
The LC differential Pwr_Clk driver with VDD as the central voltage in the scheme shown in Figure 1 was simulated. In the simulation, the NMOS devices had a width of 0.3 µm and the PMOS devices a width of 0.975 µm. This sizing was selected to ensure that the inverter in the LC differential Pwr_Clk driver is a matched inverter. For the clock buffer, both the PMOS and NMOS devices had a width of 0.3 µm. The differential HDC_Pwr_Clk signals generated by the LC driver and resonating around a central voltage of 2.5 V are presented in Figure 7. The outputs of the clock buffer, i.e., Fswing_Clk and Fswing_ClkB, are shown in Figure 8.

Proposed IPCDN with Pwr_Clk Signals Centered at VDD/2
The LC differential Pwr_Clk driver with VDD/2 as the central voltage in the scheme shown in Figure 3 was simulated. In the simulation, the NMOS devices had a width of 0.3 µm and the PMOS devices a width of 2.175 µm. This sizing was selected to ensure that the inverter in the LC differential Pwr_Clk driver is a matched inverter. The differential LDC_Pwr_Clk signals generated by the LC driver and resonating around a central voltage of 1.25 V are presented in Figure 9. The clamping circuit was simulated using a 100 kΩ resistance, a 1.2 nF capacitance, and a 3.625 V voltage source. The diode-connected transistor (MN1 in Figure 4) has a width of 0.5 µm. The HDC_Pwr_Clk signals generated by the clamping circuit are shown in Figure 9.
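A clamp generally holds its DC shift only if its RC time constant is much longer than the period of the signal being clamped. That is a general property of clamping circuits rather than a statement from the text, and the sketch below assumes the quoted resistance and capacitance together set that time constant (the exact connection in Figure 4 is not reproduced here). The names are illustrative.

```python
# Component values quoted in the text for the clamping circuit simulation.
R_CLAMP_OHM = 100e3    # 100 kOhm resistance
C_CLAMP_F = 1.2e-9     # 1.2 nF capacitance
F_PWR_CLK_HZ = 50e6    # Pwr_Clk resonant frequency (50 MHz)


def clamp_hold_ratio(r_ohm: float, c_farad: float, f_hz: float) -> float:
    """Return the RC time constant expressed in signal periods."""
    tau_s = r_ohm * c_farad     # RC time constant in seconds
    period_s = 1.0 / f_hz       # one Pwr_Clk period in seconds
    return tau_s / period_s


ratio = clamp_hold_ratio(R_CLAMP_OHM, C_CLAMP_F, F_PWR_CLK_HZ)
tau_us = R_CLAMP_OHM * C_CLAMP_F * 1e6
print(f"tau = {tau_us:.0f} us, about {ratio:.0f} signal periods")
```

With these values the time constant is 120 µs against a 20 ns period, so under the stated assumption the capacitor would discharge very little within one cycle.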
The clock buffer was simulated with NMOS and PMOS devices sized carefully to achieve correct functionality. Figure 11 presents the outputs of the clock buffer, i.e., Lswing_Clk and Lswing_ClkB.
The voltage doubler was simulated using a 1 pF capacitance and a 1.35 V voltage source. The NMOS devices have a width of 0.5 µm and the PMOS devices a width of 0.3 µm. The full-swing clock signals generated by the voltage doubler are shown in Figure 12.

Power Consumption Analysis
The power consumption of the LC differential HDC_Pwr_Clk driver, the LDC_Pwr_Clk driver, and the clock buffer in each scheme is presented in Table 1. Centering the Pwr_Clk signals at VDD/2 in the proposed scheme reduces the power consumption of the LC differential Pwr_Clk driver by 75.32%, which is consistent with the value calculated from a comparison of equations (3) and (4). Furthermore, the proposed scheme enables a 76.47% reduction in the power of the clock buffer. The clamping circuit and the voltage doubler circuit used by the proposed scheme introduce area and power overhead. The topology of the IPCDN and the number of clamping circuits and voltage doubler circuits needed to drive every section of the IPCDN need to be carefully examined in order to optimize power savings. The area and power overhead introduced by the clamping circuit and the voltage doubler circuit can be reduced by sharing these circuits between several combinational and sequential elements.
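As a quick numerical check, the reported reductions can be set against the ideal square-law prediction. The sketch below assumes power scales exactly with the square of the central voltage, which the text states for the IPCDN itself; applying the same yardstick to the clock buffer is an assumption made here only for comparison. The function name is illustrative.

```python
def ideal_reduction_percent(voltage_scale: float) -> float:
    """Percent power reduction if power scales with the square of voltage."""
    return (1.0 - voltage_scale ** 2) * 100.0


# Reported simulation results from the text (percent).
REPORTED = {"LC Pwr_Clk driver": 75.32, "clock buffer": 76.47}

ideal = ideal_reduction_percent(0.5)  # central voltage halved: VDD -> VDD/2
for name, reported in REPORTED.items():
    gap = reported - ideal
    print(f"{name}: reported {reported:.2f}%, ideal {ideal:.2f}%, gap {gap:+.2f} points")
```

The driver result lies within about a third of a percentage point of the ideal 75%, while the buffer saving exceeds it by roughly one and a half points.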

Process, Supply Voltage, and Temperature (PVT) Variation Effects Analysis
This section discusses the effects of process, supply voltage, and temperature (PVT) variations on the proposed scheme.
Simulations were conducted for all of the circuits in the proposed scheme, i.e., the LC differential Pwr_Clk driver, the clamping circuit, the clock buffer, and the voltage doubler, using minimum-sized transistors. The supply voltage was varied by ±10% of VDD/2 (44). Figures 13 and 14 present the outputs of the LC clock driver and the clock buffer, i.e., the LDC_Pwr_Clk and Lswing_Clk signals, respectively, under supply voltage variations. The results illustrate that the clock buffer in the proposed scheme is susceptible to drops in the supply voltage. Based on these results, variations in the supply voltage should be carefully considered and tested to ensure correct functionality of IPCDNs. The proposed IPCDN with low central voltage was simulated at 0 °C, 27 °C, and 100 °C (44). The simulation results are shown in Figures 15 and 16. They demonstrate correct functionality of the proposed scheme at both low and high temperature.
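For reference, the supply sweep described above corresponds to the following voltages. The sketch assumes that "±10% of VDD/2" means the 1.25 V central voltage itself is swept between 90% and 110% of its nominal value; the function name is illustrative.

```python
VDD_V = 2.5                 # nominal supply voltage from the text
CENTRAL_V = VDD_V / 2       # central voltage of the proposed scheme (1.25 V)
SUPPLY_TOLERANCE = 0.10     # +/-10% variation quoted in the text


def supply_corners(nominal_v: float, tolerance: float) -> tuple:
    """Return the (low, nominal, high) voltages for a symmetric tolerance."""
    return (nominal_v * (1.0 - tolerance), nominal_v, nominal_v * (1.0 + tolerance))


low_v, nominal_v, high_v = supply_corners(CENTRAL_V, SUPPLY_TOLERANCE)
print(f"low = {low_v:.3f} V, nominal = {nominal_v:.3f} V, high = {high_v:.3f} V")
```

Under this reading, the low corner is 1.125 V and the high corner is 1.375 V.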

Conclusion
In this paper, an IPCDN with differential power-clock signals centered at half the supply voltage is proposed as a means of reducing power in this network. All of the components associated with the proposed network, including the LC differential power-clock driver, the clock buffer, the clamping circuit, and the voltage doubler circuit, were simulated using Tanner 0.25 µm technology at a frequency of 50 MHz. Simulation results verify correct functionality of all the components under the proposed scheme. Power reductions of 75.32% and 76.47% were achieved in the LC differential power-clock driver and clock buffer, respectively. The effects of process, supply voltage, and temperature (PVT) variations on the proposed scheme were also investigated. The results show that the components of the proposed scheme are immune to process, supply voltage, and temperature variations, except for the clock buffer, which is susceptible to power supply variations. The clamping circuit and the voltage doubler circuit used by the proposed scheme introduce area and power overhead, which can be reduced by sharing these circuits between several combinational and sequential elements. Future work will concentrate on implementing this scheme at advanced technology nodes with higher frequencies and lower supply voltages. Alternative designs will be explored to reduce circuit complexity and the overhead of the components used.