Low power add-one circuit IPGL based high speed square root carry select adder

Background: An adder is a basic building block of arithmetic circuitry. Ripple carry adders occupy less area, but carry rippling increases their delay and limits their performance. Objectives: To design and implement a high-speed adder that overcomes carry rippling, consumes less power, and operates at higher frequencies. Method: A square root CSLA (SCSLA) architecture is designed by replacing the ripple carry adder with an Add-One Circuit (AOC) to minimize area and carry rippling delay. Improved Pass-Gate Adiabatic Logic (IPGL) is incorporated in the proposed SCSLA to reduce power and to increase the frequency of operation. Cadence Virtuoso and Spectre are used to design and simulate the adder circuits in CMOS 180 nm technology. Findings: The proposed SCSLA consumed 89% less power than the reference architecture at an operating frequency of 400 MHz, with a power saving factor of 7.3. Results were verified by simulating up to 1 GHz. Novelty: Incorporating the AOC in a square root CSLA realized in adiabatic logic (IPGL) results in lower power consumption and allows the adder to operate at higher (GHz) frequencies.


Introduction
The trend towards low-power IC design is driven by the increasing demand for long-life portable devices. Carry look-ahead adder (CLA), carry skip/bypass adder, conditional sum adder, carry select adder (CSLA), parallel prefix and other adder architectures have been proposed to mitigate the rippling of carry in a ripple carry adder (RCA). A CSLA needs less time to provide the carry output (1). In a CSLA, the sum is generated by independent RCAs, one for Cin = '0' and one for Cin = '1'. A CSLA can be either Linear or Square Root (SQRT), depending on the block length of the RCAs (2). SQRT CSLA architectures of various bit widths are discussed in (3). A linear CSLA uses two RCAs to compute the output, but it is not power and area efficient. Hence, to decrease the area occupied by the CSLA, a Binary to Excess-1 Converter (BEC) can be used instead of an RCA. An (N+1)-bit BEC replaces the N-bit RCA in the modified architecture to decrease area and power consumption. A static CMOS based CLA with modified circuits for obtaining the propagate and generate terms is discussed in (4).
In adiabatic logic (AL) circuits, charge is recycled during switching by means of a trapezoidal supply voltage (power clock) (5). Both non-adiabatic and adiabatic losses are incurred in adiabatic circuits. The former are independent of the frequency of operation and cannot be minimized. The latter are frequency dependent and can be reduced to an extent. Adiabatic circuits can be Partial/Quasi Adiabatic Circuits (QAC) or Fully Adiabatic Circuits; QAC are generally preferred. Among the QAC families, Improved Pass-Gate adiabatic charge-recovery Logic (IPGL) is better for energy recovery and is also preferred for higher frequencies of operation. IPGL requires its inputs during the charge phase of the clock to obtain differential outputs. The power clock in adiabatic circuits has four phases, namely evaluate, hold, recover, and wait (6). A basic IPGL gate is shown in (7). The gate has two paths: a charging path and a recovery path. The charging path consists of the logic block F and its complementary logic block F', which are in parallel with a pair of cross-coupled PMOS transistors.
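The four-phase trapezoidal power clock described above can be sketched as a piecewise-linear waveform. This is only an idealised illustration: the function name `power_clock`, the equal phase durations, and the 1.8 V default are assumptions made for the sketch and are not taken from the cited works.

```python
def power_clock(t, period, vdd=1.8):
    """Idealised trapezoidal power clock with four equal phases (assumed)."""
    phase_len = period / 4.0
    tau = t % period                      # position inside the current cycle
    if tau < phase_len:                   # evaluate: ramp from 0 up to vdd
        return vdd * tau / phase_len
    if tau < 2 * phase_len:               # hold: stay at vdd
        return vdd
    if tau < 3 * phase_len:               # recover: ramp from vdd down to 0
        return vdd * (3 * phase_len - tau) / phase_len
    return 0.0                            # wait: stay at 0


# Sample one 4 ns cycle at the midpoint of each phase.
samples = [power_clock(t, 4.0) for t in (0.5, 1.5, 2.5, 3.5)]
```

Cascaded adiabatic stages are typically driven by copies of such a clock shifted by one phase each, so that one stage holds its outputs while the next evaluates.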
The losses incurred in adiabatic circuits are discussed in (6). If R represents the on-resistance of the charging path, T the time for charging and discharging, C_L the load capacitance, and V the supply voltage, then the energy dissipation is given by equation (1) (5).
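Equation (1) itself is not reproduced in this text. Assuming it is the standard first-order expression for charging a load capacitance through a resistance with a linear ramp of duration T (valid when T is much larger than R C_L), it would take the following form, shown next to the conventional CMOS switching energy for comparison:

```latex
E_{\text{adiabatic}} \approx \frac{R\,C_L}{T}\,C_L\,V^{2}
\quad (T \gg R\,C_L),
\qquad
E_{\text{CMOS}} = \tfrac{1}{2}\,C_L\,V^{2}
\quad \text{(per transition)}.
```

Under this expression the dissipation falls as T is lengthened, which is why the adiabatic loss is frequency dependent.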
An (N+1)-bit BEC is used instead of the N-bit RCA in the modified architecture to decrease area and power consumption (8). The add-one circuit (AOC) works on the principle of "first" zero detection logic. It generates the sum S_1 (for Cin = '1') from S_0 (for Cin = '0') by complementing every bit of S_0 from the least significant bit up to and including the first zero; the higher-order bits are passed through unchanged. If no zero is encountered, all the bits are complemented and a carry out of '1' is generated. A new add-one scheme is discussed in (9), in which the number of inverters can be reduced without incurring a speed penalty. Power dissipation is reduced because of the shorter chain and the reduced toggling of internal signals. The final sum and carry are obtained by using NAND gates and a multiplexer (MUX). Figure 1 depicts the block-level representation of a 1-bit adder realized using the AOC architecture.
Since switching power is proportional to the square of the supply voltage, minimizing the voltage reduces the energy consumption. The adder is one of the most essential blocks needed to implement any arithmetic circuitry, and power consumption is as critical as speed in any adder circuit. Many adders have been proposed in the literature, but they still consume considerable power. The literature shows that adiabatic logic such as IPGL helps in realizing low-power circuits. Hence, adders can be designed for high-speed operation with low power consumption by incorporating adiabatic logic. Thus, the aim of this study is to propose a modified square root CSLA with AOC and BEC in adiabatic logic (IPGL) that can be used for high-speed and low-power applications.
The rest of this paper is organized as follows: the design methodology of the CMOS and adiabatic adders is discussed in Section 2. Section 3 presents the results obtained in terms of power and delay, and Section 4 gives the conclusions and future scope.

Design and implementation of adders
The circuits are designed and simulated using Cadence Virtuoso in CMOS 180 nm technology at V_DD = 1.8 V. Since PMOS transistors are slower than NMOS transistors, rise and fall times differ. For CMOS circuits to have symmetrical rise and fall times, the sizing of the transistors is very important. DC analysis of a static CMOS inverter is performed and, based on it, a transistor sizing ratio (r) of 2.75 is used for the circuits. The lengths of both MOS transistors were kept the same. A CSLA can be constructed using RCAs and BECs. To decrease the area occupied by the CSLA, a Binary to Excess-1 Converter (BEC) can be used instead of an RCA: an (N+1)-bit BEC replaces the N-bit RCA in the modified architecture to decrease area and power consumption (8). A BEC is realized using AND, XOR and NOT gates. A 4-bit BEC circuit is illustrated in Figure 2. A BEC reduces the delay incurred with an RCA by generating the output carry with less delay.
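As a behavioural illustration of how a BEC is built from AND, XOR and NOT gates, the sketch below uses the commonly quoted Boolean expressions for a 4-bit BEC. These expressions and the function name `bec4` are assumptions; they are expected, but not confirmed, to match the circuit of Figure 2.

```python
def bec4(b0, b1, b2, b3):
    """4-bit Binary to Excess-1 Converter; b0 is the LSB. Returns (x0, x1, x2, x3)."""
    x0 = 1 - b0                  # NOT gate
    x1 = b0 ^ b1                 # XOR gate
    x2 = b2 ^ (b0 & b1)          # AND feeding an XOR
    x3 = b3 ^ (b0 & b1 & b2)     # AND chain feeding an XOR
    return x0, x1, x2, x3


# Example: 0b0111 (7) becomes 0b1000 (8); bits are listed LSB first.
print(bec4(1, 1, 1, 0))
```

The output equals the input plus one, modulo 16, which is the value a second RCA with Cin = '1' would otherwise have to compute.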

Linear CSLA with RCA
A 1-bit CMOS full adder is realized and cascaded to obtain a 16-bit adder. The Cadence Virtuoso implementation of the static CMOS based 16-bit linear CSLA is depicted in Figure 3. The 16-bit linear CSLA is realized with RCAs and multiplexers that select the sum and carry of the 4-bit adders.
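The selection principle of the linear CSLA can be sketched behaviourally as follows. This is a functional model only, not the transistor-level design of Figure 3; the function names and the bit-level full-adder equations are illustrative assumptions.

```python
def ripple_add(a, b, cin, width):
    """Ripple carry addition built from 1-bit full adders; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i               # full-adder sum bit
        c = (ai & bi) | (c & (ai ^ bi))       # full-adder carry out
    return s, c


def linear_csla(a, b, cin=0, width=16, block=4):
    """Linear carry select adder: equal-sized blocks, two RCAs per block."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for lo in range(0, width, block):
        ab, bb = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple_add(ab, bb, 0, block)  # RCA assuming Cin = 0
        s1, c1 = ripple_add(ab, bb, 1, block)  # RCA assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)  # MUX driven by incoming carry
        result |= s << lo
    return result, carry
```

Both candidate results of a block are available before the incoming carry arrives, so the carry only has to pass through one multiplexer per block instead of rippling through every full adder.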

Square root CSLA with BEC
The linear CSLA may initially produce wrong outputs because of a mismatch in delays, as the carry output of the first stage (MUX) arrives one cycle later than the inputs of the second stage. To overcome the delay mismatch at each stage, the SCSLA architecture is used (10). Figure 4 illustrates the Cadence implementation of the 16-bit SCSLA with RCA and BEC.
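A behavioural sketch of a square root CSLA with BEC is given below. The group sizes 2-2-3-4-5 are an assumption (a partition commonly used for 16-bit SQRT CSLAs); the text does not state the sizes used in Figure 4. For brevity, the Cin = '0' RCA of each group is modelled with integer addition, and the (N+1)-bit BEC is modelled as an increment of the RCA's sum and carry taken together.

```python
def sqrt_csla_bec(a, b, cin=0, groups=(2, 2, 3, 4, 5)):
    """Square root CSLA model; `groups` lists block widths from the LSB upwards."""
    result, carry, lo = 0, cin, 0
    for n in groups:
        mask = (1 << n) - 1
        # n-bit RCA with Cin = 0: n sum bits plus carry form an (n+1)-bit word.
        rca = ((a >> lo) & mask) + ((b >> lo) & mask)
        # (n+1)-bit BEC: excess-1 of that word gives the Cin = 1 result.
        bec = (rca + 1) & ((1 << (n + 1)) - 1)
        chosen = bec if carry else rca         # MUX driven by incoming carry
        result |= (chosen & mask) << lo
        carry = chosen >> n                    # top bit is the group carry out
        lo += n
    return result, carry


print(sqrt_csla_bec(0x1234, 0x0FED))  # 0x1234 + 0x0FED = 0x2221 with no carry out
```

The model shows why the BEC needs N+1 bits: it converts the N sum bits and the carry out of the RCA in one step, so no second RCA is needed for the Cin = '1' case.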

SCSLA with BEC in IPGL
PFAL does not produce correct outputs at high frequencies. This can be overcome using IPGL (11). Low-power VLSI circuits using two-phase adiabatic dynamic logic are discussed in (12). An IPGL based 16-bit SCSLA with BEC is shown in Figure 5. Extra buffers are required to incorporate the clocking mechanism (13). Various adiabatic logic families, as well as comparisons between ECRL and PFAL, have been presented in (6).

SCSLA with AOC
The modified add-one scheme discussed in (9), which uses buffers with only one inverter, is depicted in Figure 6. The internal nodes of the PMOS-NMOS chain generate the complement of the sum bit. Each pair acts as an inverter before the first zero is detected. When the first zero is detected, it acts as a multiplexer and the sum is selected. A 1-bit full adder based on the modified add-one scheme was constructed in AL using a 1-bit RCA, NAND gates, buffers and multiplexers, as shown in Figure 7. The 1-bit full adder circuit is cascaded to form 16-bit full adders, which were then used to construct the SCSLA shown in Figure 8. The 16-bit IPGL based SCSLA circuit with AOC was implemented in the Cadence Virtuoso IDE, where its functionality was verified (the figure is not included because of the large size of the image).
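The first-zero detection principle behind the add-one circuit can be sketched behaviourally as follows. This is a functional model of the principle only, not the PMOS-NMOS chain of Figure 6; the function name `add_one` and the LSB-first list representation are illustrative assumptions.

```python
def add_one(s0):
    """s0: list of sum bits for Cin = 0, LSB first. Returns (s1, carry) with s1 = s0 + 1."""
    s1 = []
    zero_seen = False
    for bit in s0:
        if zero_seen:
            s1.append(bit)            # after the first zero: pass the bit through
        else:
            s1.append(1 - bit)        # up to and including the first zero: complement
            if bit == 0:
                zero_seen = True
    carry = 0 if zero_seen else 1     # no zero found: all bits flipped, carry out is 1
    return s1, carry


print(add_one([1, 1, 0, 1]))  # 0b1011 (11) + 1 = 0b1100 (12), bits shown LSB first
```

In a carry select stage, the carry returned here would be combined with the carry out of the Cin = '0' adder to obtain the carry for the Cin = '1' case.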

Results and Discussion
The simulation results of the adder circuits are presented in this section. We first compare the performance of different adiabatic families with respect to the number of transistors, power consumed, and operating frequency of an inverter. Delay analysis is then performed for the different adders, followed by a power analysis. Finally, the power saving factor is calculated and analyzed. All simulations were carried out using Cadence Virtuoso and Spectre.
The IPGL based circuits operate at higher frequencies than ECRL and PFAL circuits. To illustrate, an inverter is designed and simulated in static CMOS and in several AL families at 20 MHz, with a load capacitance (C_L) of 10 pF and V_DD of 1.8 V. The power consumption values of the different inverters are shown in Figure 9. As expected, the inverter realized in static CMOS logic consumes the most power. PFAL circuits outperformed the other logic families. However, apart from IPGL, the adiabatic logic families fail to provide proper outputs at GHz frequencies (11). Hence, for high-frequency operation IPGL is preferred over PFAL (14).
Table 1 lists the power consumed by the CMOS based RCA for different sizes (4-, 8- and 16-bit) operating at various frequencies, implemented in Cadence Virtuoso. From Table 1, it can be inferred that the power consumed by the RCA increases with both bit width and frequency. This motivates the use of adiabatic circuits to reduce power consumption at high frequency.
Delay analysis of the RCA, linear CSLA, and SCSLA is performed to choose the fastest adder for further implementation in adiabatic logic. In the SCSLA, the delays of the stages were matched by varying the sizes of the adders in each stage. Figure 10 shows that a 16-bit SCSLA reduces the delay by 80% compared with a 16-bit RCA. The delay of the square root CSLA is also less than that of the linear CSLA, so it is preferred.
16-bit static CMOS based linear and square root CSLAs with RCA and BEC architectures are constructed. The 16-bit SCSLA realized with BECs consumes less power than the linear CSLA implemented with RCAs. A power saving of almost 25% is achieved when the SCSLA is designed with BEC rather than RCA. Hence, for the adiabatic (IPGL) based design, the SCSLA with BEC is preferred. The power consumption of the 16-bit CMOS based CSLAs, for a load capacitance of 10 pF and an operating frequency of 200 kHz, is listed in Table 2 (11).
A 16-bit square root CSLA with BEC is constructed in both static CMOS logic and IPGL, and simulated for multiple operating frequencies (kHz to GHz range).
The power consumed by the 16-bit SCSLA at various frequencies is given in Table 3. The IPGL based 16-bit SCSLA consumed less power than the static CMOS based logic at all the frequencies considered. Hence, for low-power and high-frequency applications, IPGL based circuits are preferred (11). The area analysis of both static CMOS and IPGL based CSLAs is tabulated in Table 4. It can be inferred that the 16-bit SCSLA with AOC requires fewer transistors than the 16-bit SCSLA with RCA or BEC. A 16-bit IPGL based SCSLA realized with the BEC circuit requires more area (transistors). To reduce the area of the IPGL based adder, the BEC circuit is replaced by the AOC in the SCSLA. The IPGL based 16-bit SCSLA realized using AOC requires 21.72% less area than the IPGL based 16-bit SCSLA realized with BEC and 32.36% less area than the IPGL based 16-bit SCSLA realized with RCA. Hence, the AOC based IPGL design is preferred over both RCA and BEC in the design of the SCSLA.
A 16-bit static CMOS based SCSLA realized with RCA and BEC is compared with the design in (1); the power values are listed in Table 5. Power values for a 16-bit SCSLA with AOC are not reported in (1). The power values indicate that the proposed adder consumes less power. A 16-bit static CMOS based SCSLA realized with AOC (9) is compared with the proposed 16-bit IPGL based SCSLA with AOC (operating at a frequency of 400 MHz) to assess the power saving. The power consumed by both adders is listed in Table 6, which shows that the proposed adder also reduces the power at high frequency.
The power analysis of the SCSLA in CMOS logic and in IPGL, with BEC and AOC, for various frequencies is depicted in Figure 11. The proposed SCSLA with AOC consumes considerably less power than the SCSLA with BEC. For instance, at 100 MHz, the IPGL based SCSLA realized with AOC consumes 60.45% less power than the SCSLA with BEC. As the frequency of operation increases, for instance at 1 GHz, a power saving of 68% is obtained when the adder is realized in adiabatic (IPGL) logic. The IPGL based adders consume less power than the CMOS based adders. A power reduction of 86.76% is achieved by realizing the IPGL based SCSLA with the AOC architecture, in comparison with the SCSLA with the BEC architecture in static CMOS logic, at a frequency of 1 GHz. Hence, it is preferred over both the static CMOS and the IPGL based adders with BEC. At 1 GHz, the power consumed by the AOC based circuit is nearly 60% less than that of the BEC circuit in IPGL. Hence, the IPGL based 16-bit SCSLA with AOC is the preferred circuit for low-power applications at high operating frequencies. The improvement in power saving of an IPGL based adder over a static CMOS adder can be analyzed using the power saving factor (PSF). The PSF is defined as the power consumed by a static CMOS circuit divided by the power consumed by the same circuit implemented in an adiabatic logic.
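The PSF definition above, and its relation to a percentage power saving, can be sketched as follows. The function names are illustrative assumptions; the only figure taken from the paper is the PSF of 3.134 reported for the IPGL based SCSLA with BEC at 1 GHz.

```python
def power_saving_factor(p_cmos, p_adiabatic):
    """PSF = power of the static CMOS circuit / power of its adiabatic counterpart."""
    return p_cmos / p_adiabatic


def saving_percent_from_psf(psf):
    """Percentage power saving implied by a given PSF: (1 - 1/PSF) * 100."""
    return (1.0 - 1.0 / psf) * 100.0


# A PSF of 3.134 corresponds to a saving of about 68%.
print(round(saving_percent_from_psf(3.134), 1))
```

Both functions expect the two power values in the same unit, since the PSF is a dimensionless ratio.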
The PSF obtained by realizing a square root CSLA with BEC in both CMOS and IPGL is shown in Figure 12. The PSF indicates that the IPGL based adder consumes less power than its static CMOS counterpart at higher frequencies. Figure 13 illustrates the PSF obtained by adopting the proposed 16-bit SCSLA with AOC in IPGL instead of the 16-bit SCSLA with BEC in static CMOS. Thus, from Figures 12 and 13, it can be inferred that the 16-bit SCSLA implemented with AOC and realized in IPGL has a higher PSF than the 16-bit SCSLA realized with BEC. Hence, IPGL based (adiabatic) designs are preferred in low-power and high-frequency applications. The power consumption increases linearly with the number of bits. The proposed AOC removes the area overhead of the SCSLA by replacing one of the RCAs or the BEC, and it also outperforms the other SCSLAs in both area and power consumption.

Conclusions and Future Scope
In this study, a comparative analysis of linear and square root carry select adders is presented. A high-speed and low-power SCSLA with BEC and AOC is realized in IPGL and in static CMOS logic. The SCSLA reduced the delay by 80% compared with a ripple carry adder. The IPGL based SCSLA realized with BEC consumed almost 60% less power than the CMOS based logic. At an operating frequency of 1 GHz, the results indicated a PSF of 3.134 for the IPGL based SCSLA realized with BEC. At 1 GHz, the proposed IPGL based 16-bit SCSLA with AOC consumed 87% less power than the static CMOS based adder. A PSF of 8.241 was achieved at 1 GHz by implementing the 16-bit SCSLA with AOC. Thus, the AOC based modified SCSLA architecture is preferred for both high-speed and low-power signal processing applications. Hence, by adopting adiabatic logic, the power consumed by the circuits can be minimized with a trade-off in area. The designed adder can be used to construct low-power higher-order adders, multipliers, and multiply-accumulate (MAC) units.