ISSN (Print): 0974-6846 ISSN (Online): 0974-5645

# An Efficient MAC Design for Image Processing Application

S. Tamilselvan and A. Arun

Department of Electronics and Communication Engineering, M. Kumarasamy College of Engineering, Karur - 639113, Tamil Nadu, India; tamilselvans.ece@mkce.ac.in, aruna.ece@mkce.ac.in

#### **Abstract**

**Objectives**: An efficient MAC design is proposed for 2's complement numbers that modifies only the partial-product generation method and the reduction tree, while the latter stage implements a special sign-extension solution. **Methods/Statistical Analysis**: Multiply-accumulate (MAC) operations are an important part of signal and image processing applications. To improve MAC performance, the critical path delay can be reduced by inserting an additional pipeline register, either within the reduction to two rows or between those two rows and the final adder. **Findings**: The 16-bit MAC architecture is implemented using a Baugh-Wooley (BW) multiplier with the twin-precision concept. The proposed methodology is verified by implementing it in a 16 x 16 MAC unit. The extracted performance parameters demonstrate that the proposed Modified Booth with twin precision based 16-bit MAC achieves an area reduction of 45.3%, a delay reduction of 30.8% and a power saving of 60% when compared to the MAC architecture designed using the Baugh-Wooley multiplier with twin precision. **Application/Improvements**: The proposed design performs better than the conventional MAC in all these aspects.

**Keywords:** Baugh-Wooley Multiplier, MAC Design, Partial-Product, Sign-Extension Solutions, Twin Precision

# 1. Introduction

In the core technologies of multimedia and communication systems, a dedicated architecture is needed to speed up DSP operations without burdening the ALU of the processor. This work therefore concentrates on a high-speed, low-power MAC architecture.
With a special and simple sign-extension technique, the problem of extending the sign bit up to the 2Nth bit in the Modified Booth multiplier is reduced. Twin precision further halves the length of the multiplication and processes the narrower multiplications in parallel. Pipelining also makes a hardware reduction possible for this MAC architecture, which in turn reduces the area of the total design.

For partial-product row generation, a new modified Booth encoding (MBE) scheme<sup>1</sup> was proposed to improve the performance of the conventional MBE scheme, which has two drawbacks: 1) an additional partial-product term at the (n-2)<sup>th</sup> bit position; 2) poor overall performance at the LSB part compared with the non-Booth design using TDM. That design is based on binary trees constructed from 4:2 compressor units. The speed of operation is increased by taking advantage of the free input lines of the compressor units, which result from the natural parallelogram shape of the generated rows, and by using the bits of the accumulated value to fill these gaps. The result is that the accumulation operation is merged into the multiplication process.

To realize an area-efficient, high-speed MAC unit, the critical delays and hardware complexity of conventional MAC designs<sup>2</sup> were examined to arrive at a unit with low critical delay and low hardware complexity. That architecture is derived from binary trees built from modified 4:2 compressor units.

The twin-precision technique can reduce power dissipation by adapting a multiplier to the bit width of the operands being computed. It also increases computational throughput by allowing several narrow-width operations to be computed in parallel. In twin precision using the modified Booth algorithm<sup>3</sup>, two shorter patterns of 1's and 0's are needed for the two 4-bit multiplications. The encoding and decoding circuits are very simple.
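The twin-precision idea can be illustrated with a small software model. The sketch below is a simplification under stated assumptions: it uses a plain unsigned AND-array of partial-product bits rather than the signed modified Booth array used in this work, and the names `twin_precision_multiply` and `unpack` are hypothetical. It shows that forcing the cross partial-product bits to zero lets one 8 x 8 array produce two independent 4 x 4 products.

```python
def twin_precision_multiply(x, y, n=8, split=False):
    """Model an n x n unsigned partial-product array.

    With split=False the array computes the full n-bit product.
    With split=True the partial-product bits that combine a low-half bit
    of one operand with a high-half bit of the other are forced to zero,
    so the array yields two independent (n/2)-bit products.
    """
    half = n // 2
    result = 0
    for i in range(n):            # bit index into x
        for j in range(n):        # bit index into y
            same_half = (i < half) == (j < half)
            if split and not same_half:
                continue          # cross term suppressed in twin mode
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            result += bit << (i + j)
    return result


def unpack(result, n=8):
    """Split a twin-mode result into (low product, high product)."""
    return result & ((1 << n) - 1), result >> n


# Two 4-bit multiplications packed into one 8-bit operand pair.
a_lo, a_hi, b_lo, b_hi = 9, 13, 7, 11
x = (a_hi << 4) | a_lo
y = (b_hi << 4) | b_lo
low, high = unpack(twin_precision_multiply(x, y, split=True))
```

Because each 4 x 4 product fits in 8 bits, the low and high products occupy disjoint halves of the 16-bit result and never interfere.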
The main problem of uneven delay in the adders can be avoided by using a carry-look-ahead adder in place of the last row<sup>4</sup>. Novel dynamic-range detectors that find the effective dynamic range of the inputs have been developed to reduce power<sup>5</sup>. The detection result is used to select the operand with the smaller dynamic range for Booth encoding, which increases the probability that partial products become zero<sup>6-8</sup>.

This study presents the pipelined design of high-speed modified Booth multipliers. The speed of the multipliers is greatly improved by properly choosing the number of pipeline stages and the positions at which the pipeline registers are inserted. The modified Booth algorithm reduces the number of partial products to be generated and is known as the fastest multiplication algorithm. The experimental results show that the proposed multiplier offers a variety of configurable characteristics for DSP and multimedia systems and saves more power.

The rest of the paper is organized as follows: Section 2 describes the general MAC architecture. Section 3 presents the proposed MAC architecture. Section 4 presents the results and discussion, and Section 5 concludes the work.

## 2. The MAC Architecture

In this section, the basic MAC operation is introduced. Three common steps are used in all multipliers. The first is radix-2 Booth encoding, in which the partial-product rows are generated from the operands X and Y. The second is the adder array, or partial-product compression, which adds the partial products and converts them into sum and carry form. The last is the final addition, in which the final multiplication result is produced by adding the sum and the carry.
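The three steps above can be sketched in software. This is a minimal illustration, not the design evaluated in this paper: it assumes unsigned operands and simple shift-and-AND partial-product rows in place of Booth-encoded rows, and the function names are hypothetical. It shows how carry-save (3:2) compression reduces the rows to a sum row and a carry row before a single carry-propagating addition.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative integers.

    Returns (s, carry) such that a + b + c == s + carry, computed without
    propagating carries across bit positions.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def partial_products(x, y, n):
    """Step 1 (simplified): one row per bit of x, unsigned, no Booth encoding."""
    return [(y << i) if (x >> i) & 1 else 0 for i in range(n)]


def reduce_to_two(rows):
    """Step 2: repeatedly compress groups of three rows into two."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])   # rows that did not fit into a group of three
        rows = nxt
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(x, y, n=8):
    """Step 3: one final carry-propagating addition of the sum and carry rows."""
    s, c = reduce_to_two(partial_products(x, y, n))
    return s + c
```

For an 8-bit operand the row count shrinks 8, 6, 4, 3, 2, so only the final `s + c` requires a full carry chain.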
If the step that accumulates the multiplication results is included, a MAC consists of four steps, as shown in Figure 1, which shows the operational steps explicitly. Figure 2 shows the general hardware architecture of a MAC. It performs the multiplication of the multiplicand Y by the multiplier X. The product is then added to the previous multiplication result Z as the accumulation step.

**Figure 1.** Basic arithmetic steps of multiplication and accumulation.

**Figure 2.** Hardware architecture of a general MAC.

An N-bit binary number X in 2's complement form can be expressed as

$$X = -2^{N-1}x_{N-1} + \sum_{i=0}^{N-2} x_i 2^i, \quad x_i \in \{0, 1\} \tag{1}$$

To apply Booth's algorithm having radix-2, Equation (1) is expressed in base-4 redundant signed-digit form, with $x_{-1} = 0$:

$$X = \sum_{i=0}^{\frac{N}{2} - 1} d_i 4^i \tag{2}$$

$$d_i = -2x_{2i+1} + x_{2i} + x_{2i-1} \tag{3}$$

If Equation (2) is used, the multiplication can be expressed as

$$X \times Y = \sum_{i=0}^{\frac{N}{2} - 1} d_i 2^{2i} Y \tag{4}$$

From the above equations, the MAC result P is given by

$$P = X \times Y + Z = \sum_{i=0}^{\frac{N}{2}-1} d_i 2^{2i} Y + \sum_{i=0}^{2N-1} z_i 2^i \tag{5}$$

Equation (5) describes the standard MAC design. When N-bit data are multiplied, the number of generated partial products is proportional to N. If they are added serially, the execution time is also proportional to N. The fastest multiplier architecture uses radix-2 Booth encoding to generate the partial products and a Wallace tree based on carry-save adders (CSA) as the adder array to add them. If radix-2 Booth encoding is used, the number of inputs to the Wallace tree is reduced to N/2, which reduces the number of CSA tree steps. In addition, signed multiplication based on 2's complement numbers becomes possible. For these reasons, most current multipliers adopt Booth encoding.

# 3. Proposed MAC Architecture

### MAC Unit using Modified Booth Multiplier with Twin Precision

Signed multiplication is a delicate process. There is no need for sign extension in unsigned multiplication. However, the same method cannot be applied to signed multiplication, since it would yield a faulty result, because the 2's complement representation is used for signed numbers. To overcome this drawback, Booth's algorithm is useful, and this approach preserves the sign of the product.

#### Modified Booth Multiplier

The multiplication process involves three steps. The first step generates the partial products. The second step reduces them and continues until the number of rows is reduced to two. In the last step, the final two rows are added to compute the final multiplication result.

The scheme for scanning the bits of an even-width multiplier with radix-4 Modified Booth encoding is shown in Table 1. A 0 is padded below the LSB for grouping. An 8-bit signed multiplier can be grouped into 4 different groups, as shown in Figure 3.

**Table 1.** Modified Booth Encoding Technique

| Action | Input bit $Y_{i-1}$ | Input bit $Y_{i}$ | Input bit $Y_{i+1}$ |
|--------|---------------------|-------------------|---------------------|
| X * 0  | 0 | 0 | 0 |
| X * 1  | 1 | 0 | 0 |
| X * 1  | 0 | 1 | 0 |
| X * 2  | 1 | 1 | 0 |
| X * -2 | 0 | 0 | 1 |
| X * -1 | 1 | 0 | 1 |
| X * -1 | 0 | 1 | 1 |
| X * 0  | 1 | 1 | 1 |

**Figure 3.** 8-Bit Modified Booth Encoding.

Using modified Booth encoding in the first step reduces the number of partial-product rows to N/2. This method is the most efficient encoding and decoding scheme. The modified Booth algorithm starts by grouping Y into overlapping groups of three bits and encoding each group into one of {-2, -1, 0, 1, 2} for multiplying X by Y. The encodings are tabulated in Table 1. The logic diagrams of the corresponding encoder and decoder are shown in Figure 4.
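The encoding in Table 1 can be checked with a short software model. The sketch below is illustrative only; the function names `booth_digits` and `booth_multiply` are hypothetical, and operands are assumed to be Python integers within the N-bit 2's complement range. Following Table 1, Y is the encoded operand and X is the operand that is scaled by each digit.

```python
def booth_digits(y, n):
    """Encode an n-bit 2's complement value y (n even) into n/2 Booth digits.

    Each digit is -2*y[i+1] + y[i] + y[i-1] for i = 0, 2, 4, ..., with a 0
    padded below the LSB, so every digit lies in {-2, -1, 0, 1, 2}.
    Digits are returned least significant first.
    """
    if n % 2:
        raise ValueError("n must be even")
    # Python's >> on negative ints is arithmetic, so this reads 2's complement bits.
    bits = [(y >> k) & 1 for k in range(n)]
    digits = []
    prev = 0                      # padded bit below the LSB
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(x, y, n):
    """Sum the n/2 partial-product rows d_i * x * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, n)))


print(booth_digits(7, 4))         # [-1, 2], since -1 + 2*4 == 7
print(booth_multiply(-5, 105, 8)) # -525
```

Only N/2 rows are summed, which is the row reduction that the encoding provides before the carry-save stage.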
Carry-save adders reduce the partial-product (PP) rows down to the final two rows. The product is obtained by adding these two rows with a carry-propagate adder.

**Figure 4.** Modified Booth a) Encoder b) Decoder Circuits.

#### Modified Booth Multiplier with Twin Precision

Figure 5 shows the MBE multiplication of 8-bit numbers. The LSB of each partial-product row is given by

$$P_{i,\mathrm{LSB}} = y_0 (x_{2i-1} \oplus x_{2i}) \tag{6}$$

**Figure 5.** MBE Multiplication of 8 bit numbers.

The sign-bit reduction row contains the bits $a_i$, which are given by

$$a_i = x_{2i+1} \left(\overline{x_{2i-1} + x_{2i}} + \overline{y_0 + x_{2i}} + \overline{y_0 + x_{2i-1}}\right) \tag{7}$$

An N/2-bit multiplication is computed from the PP rows in the gray region. The twin-precision method then allows the gray and black regions of the PP rows to be computed in parallel.

The proposed MAC is shown in Figure 6. The Booth decoder first reduces the number of PP rows, and the pipeline register supplies the rows to be added to the CSA. The accumulation unit is then updated with each received product value.

**Figure 6.** Architecture of MAC using Modified Booth Multiplier.

#### • Sign Extension Reduction Technique

The method in Figure 7 applies when extending the sign bit of each row. Instead of extending the sign bit up to the MSB position, only two simple bit values, a sign bit and a 1, are used. However, this increases the number of partial-product rows. These bits are calculated as follows. First, all the sign terms are assumed to be '1' and are added to obtain the default 1's and 0's for each row. After adding all the 1's using binary addition and applying the special sign-extension reduction technique, the resulting partial products become like the array shown in Figure 7. With this method, only 2 bits are extended in each row, except for the first row.

**Figure 7.** Complete Booth 2 Multiplications with Height Reduction.
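The principle behind avoiding full sign extension can be sketched numerically. The code below is an illustration of the general idea under stated assumptions, not the exact bit layout of Figure 7: each row is described by a hypothetical tuple `(value, shift, width)`, where `value` fits in `width` bits of 2's complement. A sign bit s of weight $-2^{w-1}$ equals $(1-s) \cdot 2^{w-1} - 2^{w-1}$, so each row can carry its inverted sign bit with no extension while the $-2^{w-1}$ terms of all rows are folded into one precomputed pattern of 1's and 0's.

```python
def sign_extended_sum(rows, total_bits):
    """Reference: add the rows as fully sign-extended values, modulo 2**total_bits."""
    mask = (1 << total_bits) - 1
    return sum(v << sh for v, sh, _ in rows) & mask


def correction_constant(rows, total_bits):
    """Constant pattern obtained by summing -2**(w-1) for every row."""
    mask = (1 << total_bits) - 1
    return sum(-(1 << (sh + w - 1)) for _, sh, w in rows) & mask


def reduced_sum(rows, total_bits):
    """Add rows that carry only an inverted sign bit, then add the constant."""
    mask = (1 << total_bits) - 1
    acc = 0
    for v, sh, w in rows:
        low = v & ((1 << (w - 1)) - 1)      # bits below the sign bit
        s = (v >> (w - 1)) & 1              # sign bit of this row
        acc += (low | ((1 - s) << (w - 1))) << sh   # inverted sign, no extension
    return (acc + correction_constant(rows, total_bits)) & mask


# Hypothetical rows: three 6-bit signed values at shifts 0, 2 and 4.
rows = [(-3, 0, 6), (5, 2, 6), (-6, 4, 6)]
assert reduced_sum(rows, 16) == sign_extended_sum(rows, 16)
```

The constant depends only on the row positions and widths, not on the operand values, so in hardware it can be fixed at design time.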
Figure 8 shows that the above method is equivalent to the sign-extended method.

**Figure 8.** Sign Extended Method.

# 4. Results and Discussion

The Altera Quartus II simulator software is used to obtain the simulated output of the proposed MAC architecture. The outputs obtained correspond fully to the respective complementary inputs. To improve the performance of the MAC using the Modified Booth multiplier, the pipelining technique is applied. The MAC is synthesized using Synopsys Design Compiler with a 90 nm library file. Table 2 shows the area, power and delay comparisons of the 32-bit and 64-bit multipliers. Among all the MAC designs, the proposed one shows the best performance. For signed multiplication, the Modified Booth algorithm was implemented in the conventional as well as in the twin-precision multiplier.

**Table 2.** Performance Analysis of the MAC Architectures with Modified Booth and Baugh-Wooley Multipliers

| Parameter | MBE 32 bit (proposed) | MBE 64 bit (proposed) |
|--------------------------|------------|-------------|
| Total cell area          | 8472.81264 | 22446.19841 |
| Total dynamic power (mW) | 18.60053   | 69.38324    |
| Cell leakage power (mW)  | 0.04688    | 0.12676     |
| Delay (ns)               | 30.537     | 36.700      |
| PDP (pJ)                 | 568.008    | 2546.392    |

Table 3 lists the experimental values for the existing and proposed 16-bit MAC architectures. Table 2 lists area, delay and power for the 32-bit and 64-bit MAC using the twin-precision multiplier with the Modified Booth algorithm. Taking the 16-bit MAC using the Baugh-Wooley multiplier as the reference point, the proposed MAC unit reduces the delay by 30.8% and the power dissipation by 60%. Similarly, the proposed MAC architecture improves the area by 45%.
#### • Performance Analysis

The performance parameters are used to compare the MAC unit designed using the Baugh-Wooley multiplier with the proposed MAC using the Modified Booth multiplier architecture. Figure 9 shows that the power-delay product (PDP) of the proposed MAC using the Modified Booth multiplier architecture is reduced by 72% compared with the MAC using the Baugh-Wooley multiplier.

**Table 3.** Performance Analysis of the Bit-Extended MAC Architectures with Modified Booth Multiplier

| Parameter | MAC using CPA 16 bit | MAC using CSA 16 bit | MAC using CSA with twin precision 16 bit | MBE 16 bit (proposed) |
|--------------------------|----------|----------|---------|---------|
| Total no. of nets        | 7701     | 7469     | 2551    | 1576    |
| Combinational area (µm²) | 17649.00 | 17032.00 | 4707.75 | 2120.25 |
| Total cell area (µm²)    | 17649.00 | 17032.00 | 5785.75 | 3198.25 |
| Total dynamic power (mW) | 54.15    | 44.64    | 12.47   | 4.99    |
| Delay (ns)               | 70.23    | 58.57    | 36.70   | 25.41   |

**Figure 9.** PDP Analysis of MAC unit.

**Figure 10.** Cell Leakage Power Analysis of MAC unit.

Figure 10 shows that the cell leakage power (in mW) of the proposed MAC using the Modified Booth multiplier architecture is reduced by 44% compared with the MAC using the Baugh-Wooley multiplier. Hence, the proposed design is power efficient when the MAC architecture is properly pipelined.

## 5. Conclusion

In the proposed MAC architecture, the Modified Booth multiplier is used to generate the partial-product rows. To further enhance the performance, the twin-precision concept is used in the partial-product generation. Secondly, the partial-product reduction is carried out using carry-save adders, which reduces the delay further.
Since this MAC architecture performs signed multiplication, a special sign-extension reduction technique is applied to the partial-product rows, which further reduces unnecessary sign-bit extension. The performance parameters extracted using Synopsys software (Design Vision and IC Compiler with a 90 nm technology file) demonstrate that the proposed Modified Booth with twin precision based 16-bit MAC achieves an area reduction of 45.3%, a delay reduction of 30.8% and a power saving of 60% when compared to the MAC architecture designed using the Baugh-Wooley multiplier with twin precision. It therefore performs better than the conventional MAC in all these aspects.

## 6. References

1. Yeh WC, Jen CW. High-speed Booth encoded parallel multiplier design. IEEE Transactions on Computers. 2000; 49(7):692-701.
2. Abdelgawad A, Bayoumi M. High speed and area-efficient multiply accumulate (MAC) unit for digital signal processing applications. IEEE International Symposium on Circuits and Systems (ISCAS). 2007; p. 3199-202.
3. Sjalander M, Larsson-Edefors P. Multiplication acceleration through twin precision. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2009; 17(9):1233-46.
4. Hatamian M, Cash GL. A 70-MHz 8-bit x 8-bit parallel pipelined multiplier in 2.5-μm CMOS. IEEE Journal of Solid-State Circuits. 1986; 21(4):505-13.
5. Kuang S-R, Wang J-P. Design of power-efficient configurable Booth multiplier. IEEE Transactions on Circuits and Systems I: Regular Papers. 2010; 57(3):568-80.
6. Shiann-Rong K. Modified Booth multipliers with a regular partial product array. IEEE Transactions on Circuits and Systems II: Express Briefs. 2009; 56(5):548-52.
7. Hoang TT, Sjalander M, Larsson-Edefors P. Double throughput multiply-accumulate unit for FlexCore processor enhancements. Proceedings of IEEE International Symposium on Parallel & Distributed Processing. 2009; p. 18-25.
8. Hoang TT, Sjalander M, Larsson-Edefors P. A high-speed, energy-efficient two-cycle multiply-accumulate (MAC) architecture and its application to a double-throughput MAC unit. IEEE Transactions on Circuits and Systems I: Regular Papers (special section on the IEEE System-on-Chip Conference). 2010; 57(12):119-22.