Pipelined and Wave Pipelined Approach Based Comparative Analysis for 16x16 Vedic Multiplier

Objectives: The objective of this work is to construct an FPGA-based 16×16 Vedic multiplier and to assess its performance under three distinct architectures (pipeline, wave pipeline, and modified wave pipeline) in terms of delay and clock skew. Methods: The 16×16 Vedic multiplier was constructed from four 8×8 Vedic multipliers. For the 16×16 Vedic multiplier, the 3-stage pipeline and


Introduction
The Vedic multiplier is widely used in rapidly developing technology, particularly in image processing, digital signal processing (DSP), and real-time applications, because of its efficiency in computation time, area, latency, and speed. Recently, many approaches have been used to improve the Vedic multiplier in terms of computation time, speed, and area. One method builds the Vedic multiplier from a 32-bit full adder and a 42-bit full adder (1). In addition to consuming the maximum area, this technique causes delays as the multiplier and multiplicand bit sizes increase. Another method uses a 4:2 compressor built from multiplexers and exclusive-OR logic. The cascading of two full adders with four EX-OR gates in this approach results in a significant route delay (2). In any multiplier, the adder is the primary building element, and the adder's performance determines the multiplier's overall efficiency. Ripple carry adders have been used to design Vedic multipliers (3). The ripple carry adder does not permit simultaneous use of all full adders: every full adder has to wait for the carry bit from the adjacent full adder to become available. Carry propagation therefore takes longer, which makes the ripple carry adder extremely slow and maximizes the computation time. Another technique implements the Vedic multiplier with a low-power 13T hybrid full adder (4). Although the performance of hybrid logic styles is promising, most of these adders have poor driving capability, and unless appropriately designed buffers are included, their performance deteriorates significantly in cascaded operation. Radix-2 and Radix-4 encoding have been reported as the best method for enhancing the performance of the Vedic multiplier (5). The primary drawback of the Radix method is that it requires complex multiplication and complex addition, which increases the computation time as the number of inputs grows. Choosing an adder is crucial in multiplier design, since the adder consumes more area and accounts for the maximum path delay in the multiplication architecture. Vedic multiplier designs using basic adders, half adders, and full adders exhibited the longest path delay and occupied more area (6). Carry save, ripple carry, and carry skip adders have been used for FPGA-based multiplier implementation (7). These adders slow down the multiplication as the input width increases.
In this work, we present a carry-look-ahead adder for the Vedic multiplication process, evaluate the multiplier's performance with three different architectures (pipeline, wave pipeline, and modified wave pipeline), and compare the resulting Vedic multipliers based on the synthesis reports of the Altera tools, Xilinx 12.1, and Xilinx ISE 14.2. In terms of area and delay, the Vedic multiplier's performance was improved by the modified wave pipeline approach.
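The difference between waiting for a rippled carry and computing carries by look-ahead can be illustrated with a small bit-level model. The sketch below is illustrative only: the function name, the 16-bit default width, and the zero carry-in are assumptions, and it models the logic equations rather than the authors' hardware description.

```python
def carry_lookahead_add(a, b, width=16):
    """Add two unsigned `width`-bit integers; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate: bit i creates a carry
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate: bit i passes a carry on

    def carry_into(i):
        # Expanded look-ahead form (carry-in assumed 0): the carry into bit i
        # is written purely in terms of g and p, so no stage has to wait for
        # the carry output of its neighbour.
        carry = 0
        for k in range(i):
            term = g[k]
            for j in range(k + 1, i):
                term &= p[j]
            carry |= term
        return carry

    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i)) << i
    return total, carry_into(width)


print(carry_lookahead_add(0xFFFF, 0x0001))  # (0, 1): sum wraps, carry out is set
```

In hardware, each `carry_into(i)` expression becomes a two-level AND-OR network evaluated in parallel, which is where the speed advantage over the ripple carry adder comes from.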

Motivation
Most important DSP (digital signal processing) algorithms, such as finite and infinite impulse response filters, DFT, FFT, and convolution, use multiply-accumulate computations (8). For any DSP method, the number of multiplications is the primary factor that determines the overall processing time, because multiplication typically takes considerably longer than addition; hence, DSP work concentrates heavily on multiplier design and implementation (9). Digital filtering applications use a variety of multipliers. Because of its performance in terms of calculation time, area, latency, and speed, the Vedic multiplier is widely used in rapidly developing technology, particularly in image processing, DSP, and real-time applications.
Vedic mathematics is the foundation of the Vedic multiplier (8). Vedic mathematics is built on 16 sutras, and the Urdhva Tiryagbhyam sutra provides the basis for multiplication. Its name means that multiplication is performed "vertically and crosswise," which uses the fewest possible calculations, takes up the least amount of space, and speeds up computing (3). This work focuses on designing a 16×16 Vedic multiplier with a wave-pipelined approach and on improving its performance through a modified wave-pipelined architecture in terms of delay and clock skew minimization.
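The "vertically and crosswise" idea can be sketched in software as a column-by-column sum of cross products. The helper names `to_bits` and `urdhva_multiply` are hypothetical, and the sketch shows only the arithmetic pattern, not the hardware structure used in this work.

```python
def to_bits(value, width):
    """Return the bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def urdhva_multiply(a_bits, b_bits):
    """Multiply two equal-length bit lists (LSB first), column by column."""
    n = len(a_bits)
    result, carry = 0, 0
    for col in range(2 * n - 1):
        # Crosswise step: add every partial product a[i] & b[j] with i + j == col.
        column_sum = carry
        for i in range(n):
            j = col - i
            if 0 <= j < n:
                column_sum += a_bits[i] & b_bits[j]
        result |= (column_sum & 1) << col  # keep one result bit per column
        carry = column_sum >> 1            # pass the rest to the next column
    return result | (carry << (2 * n - 1))


print(urdhva_multiply(to_bits(13, 4), to_bits(11, 4)))  # 143
```

All partial products in one column are independent of each other, which is why the scheme maps well onto parallel hardware.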

Methodology
The main objective of this work is to apply three different approaches to improve the area and delay performance of the Vedic multiplier. This goal is accomplished by the methods shown in Figure 1.

Experimental setup for Vibroarthrography signal acquisition
Vibroarthrography (VAG) means "joint vibrations." It is the process of recording the vibration sounds produced by the knee during flexion (10). The knee is the most often injured joint in the human body. The causes of knee joint injuries vary widely and can be broadly grouped into three categories: autoimmune illnesses, cartilage deterioration over time, and trauma-related injuries. The VAG signal is the vibration signal recorded from a joint during movement. Normal joint surfaces are smooth and produce little or no sound, whereas joints affected by arthritis and other degenerative diseases, which may have suffered cartilage loss, produce a grinding sound. Detecting knee joint problems by analyzing VAG signals could help avoid unnecessary exploratory surgery and aid in better selecting the patients who may benefit from surgery. A normal VAG signal has a frequency range of 10-120 Hz; otherwise, it is called abnormal, and the person has knee joint diseases. The characteristics of vibroarthrography signals are listed below:
• Vibroarthrography signals are non-stationary because they may vary from one angular position of the articulating joint to another.
• The VAG signal generated by leg movement contains background noise and random noise, depending on the sensor placed at the knee.
• Identifying the amplitude and frequency range of the measured VAG signal indicates whether the knee is normal or abnormal.
The experimental setup for vibroarthrography signal acquisition is shown in Figure 2. The signal is acquired by an accelerometer sensor, an Arduino board (which plots it as a waveform), and PLX-DAQ (which records it as a discrete set of values). The accelerometer is one of the most common inertial sensors, a dynamic sensor capable of a vast range of sensing. Accelerometers are available that measure acceleration on one, two, or three orthogonal axes. The output of the Arduino block is shown in Figure 3, where more vibrations occur during extension and flexion; this waveform is converted to discrete values by the PLX-DAQ unit. The output of the PLX-DAQ unit is connected to the input of a 3-tap FIR filter. The performance of the Vedic multiplier is evaluated by integrating it into the FIR filter with three different approaches: the pipeline, the wave pipeline, and the modified wave pipeline.
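To make the role of the multiplier in the filter concrete, a 3-tap FIR filter can be sketched as three multiplications and two additions per input sample. The coefficient values and input samples below are placeholders chosen for illustration; the paper does not state the filter coefficients.

```python
def fir3(samples, coeffs):
    """3-tap FIR: y[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2], zero initial state."""
    h0, h1, h2 = coeffs
    x1 = x2 = 0  # delayed samples x[n-1] and x[n-2]
    output = []
    for x in samples:
        # Each output needs three multiplications; in the hardware design
        # these are the operations a Vedic multiplier would carry out.
        output.append(h0 * x + h1 * x1 + h2 * x2)
        x2, x1 = x1, x
    return output


# Hypothetical coefficients (1, 2, 1) and an impulse input:
print(fir3([1, 0, 0, 0], (1, 2, 1)))  # [1, 2, 1, 0]
```

Feeding an impulse returns the coefficients themselves, which is a quick sanity check for any FIR implementation.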

Wave Pipeline Vedic Multiplier
In VLSI design, pipeline architectures are used to improve throughput. Pipelining is a very popular concept that provides parallelism and thereby increases the computation rate or throughput of the system, but the area of the overall hardware module grows linearly with the number of pipeline stages (11). In addition, setup time, hold time, and clock skew also increase, since all the registers are clocked by a common clock signal. The wave pipelining technique can therefore be an alternative solution for reducing area, delay, and clock skew. To achieve this, the wave pipeline and modified wave pipeline approaches remove the intermediate latches between the input and output registers (12). The 16×16 Vedic multiplier with a wave-pipeline architecture is shown in Figure 4, where the 16×16 Vedic multiplier is built from four 8×8 Vedic multipliers. Table 1 compares the performance of the 16×16 Vedic multiplier under the pipeline and wave pipeline approaches.
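The decomposition of a 16×16 multiplication into four 8×8 multiplications can be sketched as follows. The function names are hypothetical, `mul8` stands in for an 8×8 Vedic block, and the way the partial products are summed here is an assumption rather than the exact adder arrangement of Figure 4.

```python
def mul8(a, b):
    """Stand-in for one 8x8 multiplier block (operands must fit in 8 bits)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16(a, b):
    """16x16 multiplication assembled from four 8x8 partial products."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    low = mul8(a_lo, b_lo)      # weight 2^0
    cross1 = mul8(a_lo, b_hi)   # weight 2^8
    cross2 = mul8(a_hi, b_lo)   # weight 2^8
    high = mul8(a_hi, b_hi)     # weight 2^16
    # The adder stage aligns the partial products by their weights.
    return low + ((cross1 + cross2) << 8) + (high << 16)


print(mul16(1234, 5678) == 1234 * 5678)  # True
```

The four calls to `mul8` are independent, so in hardware they can run in parallel; only the final additions depend on all of them.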

Proposed architecture
The goal of the proposed "modified wave-pipeline" architecture is to reduce the clock skew and delay of the wave-pipeline Vedic multiplier. Figure 5 illustrates the modified wave pipeline architecture for the 16×16 Vedic multiplier. To balance or shorten the delay of the Vedic multiplier architecture, a unique register called the "wave pipeline register" (a known delay) is introduced between the multiplier and adder unit, as shown in Figure 5.

Delay Minimization
Introducing a known discrete delay into the hardware module is one of the best ways to balance the paths through the combinational logic, which reduces the delay of the overall logic module. The overall delay varies depending on where the known delay is inserted between the two logic blocks. The known delay element (wave pipeline register) is placed between the multiplier and adder unit because, according to the synthesis report given by Xilinx 12.1, the worst path delay occurs between the 8×8 Vedic multiplier and the ripple carry adder.

Clock skew Minimization
Clock skew is, in the simplest terms, the difference between the arrival times of the same clock edge at the clock pins of the capture flop and the launch flop. Throughout all phases of chip design, clock signals play a critical role in the design of VLSI-based synchronous logic circuits (13). Verification steps such as static timing analysis and gate-level simulation depend on these clock signals, simply because state transitions happen at clock transitions. The way digital systems function is significantly influenced by their clocking techniques (14). In the fundamental block architecture of a wave-pipelined system, the source and destination registers are synchronized by a common clock signal and the intermediate latches are removed (9). The logic depth-timing diagram in Figure 6 illustrates this fundamental idea of wave pipelining, in which the clock frequency is lowered as much as feasible when region R1 rises (15). The factors currently limiting the clock period are the discrepancy between the T_min and T_max data delays through the combinational block, the propagation times, the register setup and hold times, and unintentional clock skew. The following variables affect the clock period of the wave-pipelined system (13):
1. The longest and shortest path delays (T_max and T_min, respectively).
2. The difference in delay between T_min and T_max.
3. Clock skew (T_skew).
The clock period of a wave-pipelined system depends on the sum of the setup time, hold time, skew time, and the difference in delay between T_min and T_max. The clock period of the wave pipeline technique is defined in Equation (1).
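Equation (1) is not reproduced in this text. A form consistent with the description above, assuming that each overhead term enters with unit weight and writing T_clk, T_setup, and T_hold for the clock period, setup time, and hold time (symbol names assumed here), would be:

```latex
T_{clk} \;\geq\; \left(T_{max} - T_{min}\right) + T_{setup} + T_{hold} + T_{skew}
```

Some formulations in the wave-pipelining literature weight the skew term by a factor of two to cover skew at both the source and destination registers; the unit weight used here simply follows the wording of the preceding paragraph.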
It is clear from Equation (1) that the clock period of a wave-pipelined system depends on the delay variation between the T_max and T_min paths, and that the clock overhead consists of the setup, hold, and skew times. This work focuses on clock skew performance. The clock skew is minimized by adjusting the setup time violation of the clock signal (16).
Clock skew arises from the setup time and hold time violations of the clock signal (17). The wave pipeline eliminates the hold time, which is the more serious issue, so this work concentrates on setup time violations. The setup time is the minimum time for which the input must be stable before the active clock edge so that it can be sampled. The hold time is the minimum time for which the data must remain constant after the active clock edge.
In this work, the clock skew performance of the 16×16 Vedic multiplier using the wave pipelined and modified wave pipelined architectures is analyzed with the Altera Quartus-II TimeQuest Timing Analyzer FPGA tool. The modified wave pipelined 16×16 Vedic multiplier architecture gives better clock skew performance, which is achieved by adjusting the setup time violation of the clock signal.
Table 1 presents the comparison results for the implementation of Vedic multipliers with the pipelined, wave pipelined, and modified wave pipelined architectures. Comparing the three synthesis tools, Altera performed worse than the Xilinx synthesis tools for both the pipelined and wave-pipelined techniques. With all three synthesis tools, the wave pipeline approach showed a greater delay for the Vedic multiplier computation than the pipeline approach but occupied less area. Hence, a novel method called modified wave pipelining is applied to the Vedic multiplier to enhance its performance in terms of area and delay. According to the Xilinx synthesis tool report, the modified wave-pipelined approach outperformed the wave-pipelined approach in both area and delay. Table 1 therefore shows that the modified wave pipeline strategy performs better than the wave pipeline approach in terms of area and delay.

Delay Minimization
Table 2 compares the existing method with the proposed wave pipeline and modified wave pipeline Vedic multipliers in terms of delay and area. It can be concluded that the proposed modified wave pipeline approach gives the Vedic multiplier better delay and area performance than the existing method.

Clock Minimization: Wave Pipeline approach for 16×16 Vedic Multiplier
The detailed timing report of the Vedic Multiplier with a size of 16×16 using the Wave Pipelined technique is shown in Figure 7.
The timing analysis report was generated by the Altera Quartus TimeQuest timing analyzer tool (17). The first two waveforms, launch clock and latch clock, show the clock signal at the input wave pipelined register (source register) and the output wave pipelined register (destination register), respectively. A dark line marks the launch and latch clock edges. The next two waveforms, data arrival and data delay, show the time taken for data to travel from the input wave pipelined register to the output wave pipelined register (17). Note that the data delay is measured from the positive edge of the launch clock at the input wave pipeline register. The second-to-last waveform, "data required," indicates the time by which the data must reach its destination (the wave pipeline output register) in order to be stored correctly, including the setup time (uTsu) displayed in the final line. According to the Altera Quartus II TimeQuest timing analyzer report shown in Figure 7, the 16×16 Vedic multiplier using the wave pipelined technique has a clock skew of 0.048 ns (between the wave pipeline source (input) register and the destination (output) register) and a data path delay from the source wave pipeline register to the output register of around 6.628 ns. The modified architecture improves the clock skew performance of the wave-pipelined Vedic multiplier.

Clock Minimization: Modified Wave Pipeline approach for 16×16 Vedic Multiplier
The detailed timing report for the modified wave pipelined 16×16 Vedic multiplier, obtained with the static timing analyzer tool, is shown in Figure 8. Here the slack value, i.e., the setup time violation, is adjusted to -5.128 ns from -8.832 ns. According to the timing analyzer report in Figure 8, the modified 16×16 Vedic multiplier has a clock skew of 0.035 ns (between the wave pipeline input register and output register), and the data path delay from the source wave pipeline register to the destination register is around 4.506 ns.
From Figures 7 and 8, it is noted that the clock period for both the wave-pipelined and modified wave-pipelined 16×16 Vedic multipliers remains the same, i.e., 1 ns, while the setup time violation is -5.128 ns for the modified wave-pipelined 16×16 Vedic multiplier and -8.382 ns for the wave-pipelined 16×16 Vedic multiplier. Because the setup time violation of the clock signal is reduced in the modified wave pipelined architecture, the clock skew falls to 0.035 ns from the 0.048 ns of the wave pipelined 16×16 Vedic multiplier. The modified wave pipelined architecture also has less delay: the delay compensation reduces the data path delay of the 16×16 Vedic multiplier by about 2.122 ns (6.628 - 4.506) relative to the wave pipelined architecture. Hence, the modified wave-pipelined 16×16 Vedic multiplier provides better clock skew and delay performance than the wave-pipelined approach.
Xilinx and Altera FPGA software are used to implement digital circuits for image signal processing and digital signal processing applications. Xilinx lags in terms of design tools but performs marginally better. Altera offers significantly less performance in exchange for superior design tools, and it also offers specific timing analyzer tools. Hence, we chose the Xilinx tools for delay optimization and the Altera TimeQuest timing analyzer for timing analysis (clock skew) performance.

Delay and clock skew comparison
The delay and clock skew comparison between wave pipeline and modified wave pipeline approach for 16×16 Vedic multiplier implementation through Altera Quartus-II tool is shown in Figure 9.
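The relative improvements implied by the reported figures can be reproduced with a few lines of arithmetic. The function name `percent_reduction` is hypothetical; the input values are the clock skew and data path delay figures quoted from the timing reports in the two preceding sections.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Values reported for the wave pipelined vs. modified wave pipelined designs.
skew_gain = percent_reduction(0.048, 0.035)    # clock skew, ns
delay_gain = percent_reduction(6.628, 4.506)   # data path delay, ns
delay_saved = 6.628 - 4.506                    # absolute saving, ns

print(f"clock skew reduced by about {skew_gain:.1f}%")
print(f"data path delay reduced by about {delay_gain:.1f}% ({delay_saved:.3f} ns)")
```

The results come out at roughly 27% for clock skew and 32% for data path delay.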

Conclusion
In this work, a comparative analysis of a 16×16 Vedic multiplier using pipelined, wave-pipelined, and modified wave-pipelined approaches is discussed, with a vibroarthrography signal as input. The performance is evaluated with three synthesis tools: Xilinx 12.1, Xilinx ISE 14.2, and Altera Quartus. The synthesis report generated by the Xilinx synthesis tool shows that the wave pipeline approach for Vedic multipliers has a greater delay than the three-stage pipeline approach. To resolve this problem, a new approach called the modified wave pipeline was introduced into the Vedic multiplier architecture by adding a known delay between the multiplier and adder unit. According to the synthesis report, the modified wave pipeline approach saves 1.029 ns in delay compared to the wave pipeline approach. The clock skew problem with respect to setup time violation was analyzed with the Altera TimeQuest analyzer tool: the modified wave pipelined architecture has a clock skew of 0.035 ns compared to 0.048 ns for the wave pipelined architecture, i.e., the modified wave pipelined architecture provides a 32.01% improvement in delay and a 27.08% improvement in clock skew for Vedic multiplier computation.
Hence, the modified wave pipelined approach offers better delay and clock skew performance for the Vedic multiplier. Because the modified wave-pipelined approach gave better results for the Vedic multiplier computation, its performance was further assessed in a 3-tap digital FIR filter used to extract the information present in the input vibroarthrography signal. The vibroarthrography signal was acquired from a 31-year-old male candidate and connected to the input of the 3-tap digital FIR filter. Based on the frequency response of the FIR filter, the acquired vibroarthrography signal is normal, and the person has no osteoarthritis.

Fig 6. Temporal and spatial diagram of the wave pipelined system

Fig 8. Altera Quartus II TimeQuest timing analyzer timing report for the 16×16 modified wave pipeline Vedic multiplier

Table 1. Comparison of the Vedic multiplier: pipeline vs. wave pipeline vs. modified wave pipeline approach
Comparison parameters
Target device Altera: Stratix-II
Based on the synthesis report generated from the synthesis tools Xilinx 12.1, Xilinx ISE 14.2, and Altera Quartus-II with target devices Virtex-4 for Xilinx and Stratix-II for Altera, respectively, Table