# Hardware Approach to Demodulate Satellite Relayed Video Signals

#### N. P. Dharani\*, V. Meenakshi and N. Harathi

Department of EIE, Sri Vidyanikethan Engineering College (SVEC), Tirupathi – 517102, Andhra Pradesh, India; npdharani16@gmail.com, vakameenakshi@gmail.com, nimmalaharathi@gmail.com

### **Abstract:**

Finite Impulse Response (FIR) filters are used for satellite video signal demodulation because of their unique properties and because finite-precision coefficients can represent signals with high accuracy and SNR. The Parks-McClellan algorithm is used to design an optimal low pass filter with the remez command, using a large number of taps for accurate modeling. To overcome the area overhead of the computationally intensive multipliers used in existing filters, a LUT-based hardware multiplier is used to produce a filter optimized in area and number of computations. The FIR filter was designed using MATLAB and Xilinx ISE 10.1i tools, and the extracted coefficients were compared. The FPGA chosen was the Spartan-3E, a logic-optimized XC3S500E-5FG320 device fabricated in 90 nm CMOS technology. The LUT-based FIR filter improves area by approximately 20% and delay by approximately 4%.

Keywords: FIR, FPGA, IIR, LUT, Optimal Filter

## 1. Introduction

A filter isolates the band of frequencies to be passed or rejected when evaluating acquired data. Filters can be classified into four types: Low Pass Filter (LPF), High Pass Filter, Band Pass Filter and Band Reject Filter. Since the received signal carries similar data at low and high frequencies, an LPF is easy to design and sufficient to process the signals. Filters can be designed with a finite or an infinite impulse response. Linear phase, higher stability and ease of implementation are the principal advantages of FIR filters. In practice, therefore, FIR filters are preferred.

More flexible adaptive processing algorithms that offer quality of service and reliability are required to improve performance in demanding sectors such as satellite communication<sup>1</sup>. An application-dependent system may need to satisfy constraints such as complexity, cost, power consumption, speed and design reusability<sup>2</sup>.

On-board processing capability is required in satellites so that they can condition, amplify or reformat the received uplink data and route it to the necessary locations. This is known as Base-Band (BB) processing, which down-converts, demodulates and detects the signal and reconstructs the binary data stream. It also corrects any errors that may have been introduced during the uplink<sup>3</sup>. This process requires narrow transition band FIR filters, whose implementation is costly because of the large number of arithmetic operations and hardware components, such as multipliers, adders and delay elements. It also demands more execution time and memory<sup>4</sup>.

\*Author for correspondence

To overcome the limitations of FIR filter implementation, several fast multiplier schemes are used, such as the memory-based or Look-Up Table (LUT) based approach. FPGAs<sup>5</sup> prove to be an efficient, reconfigurable solution and support real-time on-board changes to the filter design and filter coefficients. The precomputed filter coefficients of the FIR filter are stored in the built-in Look-Up Tables.

## 2. Distributed Arithmetic Concept

Distributed Arithmetic (DA)<sup>6</sup> is a different approach to implementing digital filters. It replaces all multiplications and additions with a table and a shifter-accumulator. DA relies on the fact that the filter coefficients are known, so that each product c[n]x[n] becomes a multiplication by a constant, an important prerequisite for a DA design<sup>7</sup>.

Distributed Arithmetic (DA) is used to compute sums of products. Many DSP algorithms, such as convolution and correlation, are formulated as a Sum of Products (SOP)<sup>8</sup>. Consider the following sum of products:

$$y = \langle c, x \rangle = \sum_{n=0}^{N-1} c[n]x[n] = c[0]x[0] + c[1]x[1] + \dots + c[N-1]x[N-1] \quad (1)$$

Further assume that the coefficients c[n] are known and that the variable x[n] is represented by

$$x[n] = \sum_{b=0}^{B-1} x_b[n] \times 2^b \quad \text{with } x_b[n] \in \{0, 1\} \quad (2)$$

where $x_b[n]$ denotes the b<sup>th</sup> bit of the binary representation of x[n]. The SOP can then be written as:

$$y = \langle c, x \rangle = \sum_{n=0}^{N-1} c[n] \sum_{b=0}^{B-1} x_b[n] \times 2^b \quad (3)$$

Expanding the summations yields:

$$\begin{aligned}
y = \langle c, x \rangle ={} & c[0] \times \left( x_{B-1}[0]2^{B-1} + x_{B-2}[0]2^{B-2} + \dots + x_0[0]2^0 \right) \\
& + c[1] \times \left( x_{B-1}[1]2^{B-1} + x_{B-2}[1]2^{B-2} + \dots + x_0[1]2^0 \right) \\
& \;\vdots \\
& + c[N-1] \times \left( x_{B-1}[N-1]2^{B-1} + x_{B-2}[N-1]2^{B-2} + \dots + x_0[N-1]2^0 \right)
\end{aligned} \quad (4)$$

Redistributing the terms, we have:

$$\begin{aligned}
y = \langle c, x \rangle ={} & \left( c[0]x_{B-1}[0] + c[1]x_{B-1}[1] + \dots + c[N-1]x_{B-1}[N-1] \right) \times 2^{B-1} \\
& + \left( c[0]x_{B-2}[0] + c[1]x_{B-2}[1] + \dots + c[N-1]x_{B-2}[N-1] \right) \times 2^{B-2} \\
& \;\vdots \\
& + \left( c[0]x_0[0] + c[1]x_0[1] + \dots + c[N-1]x_0[N-1] \right) \times 2^0
\end{aligned} \quad (5)$$

In more compact form:

$$y = \langle c, x \rangle = \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} c[n] \times x_b[n] \quad (6)$$

The second (inner) summation in Equation (6) can be mapped to a Look-Up Table (LUT). Since the coefficients c[n] are known and each $x_b[n]$ is either 0 or 1, every inner sum is simply a combination of the c[n] values, for which a truth table can be built. Consider the term

$$\left( c[0]x_{B-2}[0] + c[1]x_{B-2}[1] + \dots + c[N-1]x_{B-2}[N-1] \right) \times 2^{B-2} \quad (7)$$

where each $x_{B-2}$ bit belongs to a different x[n]. Together these bits form an N-bit word that can take $2^N$ values. For example, with N = 7, one possible outcome is

$$\begin{aligned}
& \left( c[0] \times 0 + c[1] \times 1 + c[2] \times 1 + c[3] \times 0 + c[4] \times 1 + c[5] \times 0 + c[6] \times 0 \right) \times 2^{B-2} \\
& \quad = \left( c[1] + c[2] + c[4] \right) \times 2^{B-2}
\end{aligned} \quad (8)$$

Multiplication by a power of 2 is simply a bit shift, so the bits of the different x[n] are sliced and concatenated to form the address into a table built from the known c[n] values.
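The table lookup described above can be sketched in Python as follows. This is a minimal illustration, assuming unsigned B-bit inputs and integer (already quantized) coefficients. The function names `build_da_lut` and `da_sum_of_products` and the sample values are hypothetical and are not taken from the implementation reported in this paper.

```python
def build_da_lut(coeffs):
    """Precompute the inner sum of Equation (6) for every N-bit address.

    Bit n of the address is the current bit of x[n], so entry `addr`
    holds the sum of all c[n] whose address bit is 1.
    """
    n_taps = len(coeffs)
    return [
        sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
        for addr in range(1 << n_taps)
    ]


def da_sum_of_products(lut, samples, n_bits):
    """Unsigned DA: one LUT read and one shift-accumulate per bit plane."""
    acc = 0
    for b in range(n_bits):
        # Slice bit b from every sample and concatenate the bits into an address.
        addr = 0
        for n, x in enumerate(samples):
            addr |= ((x >> b) & 1) << n
        # Weight the looked-up partial sum by 2^b (a left shift in hardware).
        acc += lut[addr] * (1 << b)
    return acc


# Illustrative values only (assumed, not from the paper).
coeffs = [3, -1, 4, 2]
samples = [5, 9, 2, 7]  # unsigned 4-bit inputs
lut = build_da_lut(coeffs)
result = da_sum_of_products(lut, samples, n_bits=4)
direct = sum(c * x for c, x in zip(coeffs, samples))
print(result, direct)
```

For a 7-tap table, address `0b0010110` selects c[1] + c[2] + c[4], which is the case worked out in Equation (8). The table has $2^N$ entries, so it grows exponentially with the number of taps.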

To handle signed DA implementations, a minor modification is introduced. In two's complement, the MSB carries the sign of the number, so a B-bit value is given by:

$$x[n] = -2^{B-1} \times x_{B-1}[n] + \sum_{b=0}^{B-2} x_b[n] \times 2^b \quad (9)$$

Then the output y is given by:

$$y = -2^{B-1} \times \sum_{n=0}^{N-1} c[n] \times x_{B-1}[n] + \sum_{b=0}^{B-2} 2^{b} \times \sum_{n=0}^{N-1} c[n] \times x_{b}[n] \quad (10)$$
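The signed case can be sketched in the same way: the only change from the unsigned accumulation is that the partial sum for the sign bit plane is subtracted instead of added. The sketch below assumes integer coefficients and two's complement inputs of `n_bits` bits; the names `build_da_lut` and `da_signed` and the sample values are hypothetical.

```python
def build_da_lut(coeffs):
    """Table of partial sums: entry `addr` sums every c[n] whose address bit n is 1."""
    return [
        sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_signed(coeffs, samples, n_bits):
    """Two's complement DA: the MSB bit plane carries weight -2^(B-1)."""
    lut = build_da_lut(coeffs)
    mask = (1 << n_bits) - 1
    acc = 0
    for b in range(n_bits):
        addr = 0
        for n, x in enumerate(samples):
            # `x & mask` gives the n_bits-wide two's complement bit pattern of x.
            addr |= (((x & mask) >> b) & 1) << n
        term = lut[addr] * (1 << b)
        # Subtract the sign-bit plane, add all other planes.
        acc += -term if b == n_bits - 1 else term
    return acc


# Illustrative values only (assumed, not from the paper).
coeffs = [3, -1, 4, 2]
samples = [-3, 5, -8, 7]  # signed 4-bit inputs in the range -8..7
result = da_signed(coeffs, samples, n_bits=4)
direct = sum(c * x for c, x in zip(coeffs, samples))
print(result, direct)
```

In hardware the same LUT is reused for every bit plane, so the sign handling costs only an add/subtract control on the accumulator.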

The block diagram for Distributed Arithmetic FIR filter is shown in Figure 1.



Figure 1. DA Block Diagram.

The DA design yields a better trade-off among area, power and speed than the other available filter structures for FPGA implementation. The sum-of-products computation required for an FIR filter is realized using LUTs, shifters and adders, which can be mapped efficiently onto an FPGA<sup>2</sup>. Bit-level models for vector-to-vector multiplication can be designed: every vector word is represented as a binary number in DA, and the multiplications are reordered and merged so that the arithmetic becomes distributed throughout the structure<sup>§</sup>.

DA reduces the size of parallel multiply-accumulate hardware, which suits reconfigurable FPGA designs. To further reduce hardware complexity, a symmetric filter can be used to halve the number of coefficients, at the cost of one extra clock cycle before the output is obtained. The area saving obtained by using DA is roughly one half. In addition, partitioning the DA-based filter splits the single LUT into a desired number of smaller LUTs<sup>3</sup>. The computations are divided into smaller components for parallel execution, as shown in Figure 1. Higher speed can be achieved, particularly for higher-precision computations, at the cost of area: a parallel DA-based filter occupies more area than a serial one<sup>6</sup>.
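The effect of LUT partitioning on table size can be illustrated with a short Python sketch. It assumes unsigned inputs, integer coefficients and an 8-tap filter split into two groups of four taps; these values, and the names `build_da_lut` and `da_partitioned`, are hypothetical choices made for illustration and do not describe the partitioning used in this paper.

```python
def build_da_lut(coeffs):
    """Table of partial sums: entry `addr` sums every c[n] whose address bit n is 1."""
    return [
        sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_partitioned(coeffs, samples, n_bits, group_size):
    """Unsigned DA with the taps split into groups, one small LUT per group.

    Returns the sum of products and the total number of LUT entries used.
    """
    luts = [
        build_da_lut(coeffs[i:i + group_size])
        for i in range(0, len(coeffs), group_size)
    ]
    acc = 0
    for b in range(n_bits):
        partial = 0
        for g, lut in enumerate(luts):
            group = samples[g * group_size:(g + 1) * group_size]
            addr = 0
            for n, x in enumerate(group):
                addr |= ((x >> b) & 1) << n
            # The outputs of the small LUTs are added before the shift-accumulate.
            partial += lut[addr]
        acc += partial * (1 << b)
    return acc, sum(len(lut) for lut in luts)


# Illustrative values only (assumed, not from the paper).
coeffs = [1, -2, 3, 4, -5, 6, 7, -8]
samples = [10, 3, 0, 15, 7, 1, 12, 5]  # unsigned 4-bit inputs
value, entries = da_partitioned(coeffs, samples, n_bits=4, group_size=4)
direct = sum(c * x for c, x in zip(coeffs, samples))
single_lut_entries = 1 << len(coeffs)
print(value, direct, entries, single_lut_entries)
```

With these assumed sizes, a single table would need $2^8 = 256$ entries, whereas two 4-input tables need $2 \times 2^4 = 32$ entries plus one extra adder, which is the kind of trade the partitioned structure makes.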

## 3. Proposed Filter Design

The filter coefficients are obtained from MATLAB; the higher the filter order, the larger the number of coefficients. In a DA-based technique, the coefficients are stored in LUTs for all possible combinations of input bits.



**Figure 2.** Quadrature Multiplexing of two signals – $m_1(t)$ and $m_2(t)$.

The quadrature carrier multiplexing scheme allows two Double-Sideband Suppressed-Carrier (DSB-SC) modulated waves to occupy the same channel bandwidth, and hence saves bandwidth. The corresponding block diagram is shown in Figure 2.

Two modulators are used in the transmitter, with carriers of the same frequency $f_c$ that differ in phase by -90°. The transmitted signal s(t) is given by

$$s(t) = A_c m_1(t) \cos(2\pi f_c t) + A_c m_2(t) \sin(2\pi f_c t)$$

where $m_1(t)$ is the message signal of the first modulator and $m_2(t)$ is the message signal of the second modulator.

Hence s(t) occupies a channel bandwidth of 2W centered at the carrier frequency $f_c$, where W is the message bandwidth.



Figure 3. The Inphase Signal at Transmitter.

The sampling frequency $f_s$ is taken to be 1 MHz and the message frequency $f_m$ to be 10 kHz. The in-phase and quadrature signals obtained in MATLAB are shown in Figures 3 and 4.



Figure 4. The Quadrature Signal at Transmitter.

Figure 5 shows the quadrature multiplexed signal at transmitter output.



**Figure 5.** The Quadrature Multiplexed Signal from Transmitter.

As shown in Figure 6, the receiver uses two separate coherent detectors driven by the same two carriers as the transmitter. After low pass filtering, the outputs of the two detectors are $\tfrac{1}{2} A_c m_1(t)$ and $\tfrac{1}{2} A_c m_2(t)$, respectively.



**Figure 6.** Coherent Detection of Quadrature Multiplexed Signals.

The same carrier frequency $f_c$ is applied to both transmitter and receiver for coherent operation.
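The multiplexing and coherent detection steps can be reproduced numerically with a short Python sketch. The sampling and message frequencies follow the values stated in the text. The carrier frequency (100 kHz), the unit carrier amplitude, the second message (a 5 kHz sine) and the use of two cascaded running means in place of the Parks-McClellan low pass filter are all assumptions made only to keep the example small.

```python
import math

FS = 1_000_000     # sampling frequency f_s from the text (1 MHz)
FM = 10_000        # message frequency f_m from the text (10 kHz)
FC = 100_000       # carrier frequency f_c: assumed, not stated in the text
AC = 1.0           # carrier amplitude A_c: assumed
PERIOD = FS // FC  # samples per carrier period


def carrier(fn, n):
    """Sample n of a unit-amplitude carrier; fn is math.cos or math.sin."""
    return fn(2 * math.pi * FC * n / FS)


def qam_transmit(m1, m2):
    """s(t) = A_c m1(t) cos(2 pi f_c t) + A_c m2(t) sin(2 pi f_c t)."""
    return [
        AC * a * carrier(math.cos, n) + AC * b * carrier(math.sin, n)
        for n, (a, b) in enumerate(zip(m1, m2))
    ]


def moving_average(x, length):
    """Running mean over `length` samples, used here as a crude low pass filter."""
    return [sum(x[k:k + length]) / length for k in range(len(x) - length + 1)]


def coherent_detect(s, fn):
    """Mix with the local carrier, then low pass filter with two cascaded means."""
    mixed = [v * carrier(fn, n) for n, v in enumerate(s)]
    return moving_average(moving_average(mixed, PERIOD), PERIOD)


DELAY = PERIOD - 1  # group delay, in samples, of the two cascaded running means

n_samples = 1000
m1 = [math.cos(2 * math.pi * FM * n / FS) for n in range(n_samples)]
# Second message: assumed 5 kHz sine, chosen only to differ from m1.
m2 = [math.sin(2 * math.pi * (FM / 2) * n / FS) for n in range(n_samples)]

s = qam_transmit(m1, m2)
i_out = coherent_detect(s, math.cos)
q_out = coherent_detect(s, math.sin)

# Compare each detector output with half the (delayed) message it should recover.
max_err_i = max(abs(v - 0.5 * AC * m1[k + DELAY]) for k, v in enumerate(i_out))
max_err_q = max(abs(v - 0.5 * AC * m2[k + DELAY]) for k, v in enumerate(q_out))
print(max_err_i, max_err_q)
```

Averaging over exactly one carrier period places a null at twice the carrier frequency, which is why such a simple filter is enough here. A sharper filter would be needed when the carrier is not an integer divisor of the sampling rate or when adjacent channels are present.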



Figure 7. The Low pass filtered inphase signal at the receiver.



**Figure 8.** The Low Pass filtered quadrature signal at the receiver.

The characteristics of the low pass filter designed with the Parks-McClellan algorithm are shown in Figure 9. Parks-McClellan is an iterative algorithm for designing efficient, optimal FIR filters whose coefficients are found by an indirect method. It minimizes the maximum error in the pass and stop bands by using the Chebyshev approximation. The filter is designed by estimating the order of the optimal Parks-McClellan FIR filter that meets the design specifications and obtaining the corresponding filter coefficients with the remez command.
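For reference, the Chebyshev (minimax) criterion optimized by the Parks-McClellan algorithm can be written in its standard textbook form. The symbols $D(\omega)$, $W(\omega)$, $A(\omega)$, $a_k$, $M$ and $\Omega$ are notation introduced here for illustration only, and a symmetric filter of odd length $2M+1$ is assumed.

$$E(\omega) = W(\omega) \left[ D(\omega) - A(\omega) \right], \qquad A(\omega) = \sum_{k=0}^{M} a_k \cos(k\omega)$$

$$\min_{a_0, \dots, a_M} \; \max_{\omega \in \Omega} \left| E(\omega) \right|$$

Here $D(\omega)$ is the desired response (1 in the pass band and 0 in the stop band for a low pass filter), $W(\omega)$ is a positive weighting function, $A(\omega)$ is the zero-phase amplitude response of the filter, and $\Omega$ is the union of the pass and stop bands, excluding the transition band. By the alternation theorem, the optimal weighted error is equiripple and reaches its maximum magnitude with alternating sign at no fewer than $M+2$ frequencies in $\Omega$; the Remez exchange iteration searches for that set of frequencies.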



**Figure 9.** Low Pass Filter characteristics of Parks-McClellan Algorithm.

The extracted low pass FIR filter coefficients are shown in Figure 10 and the corresponding impulse response is shown in Figure 11.



Figure 10. Estimated Coefficients from Remez Algorithm.



Figure 11. FIR Filter Impulse Response.

As shown in Figure 12, the algorithm starts with the design of the transmitter and receiver, followed by the design of the low pass filter, extraction of the filter coefficients, and conversion of the coefficients and input values from floating point to their equivalent binary values. These values are then stored in LUTs and bit-shift registers respectively, the DA-FIR filter is implemented in the Xilinx tool using VHDL, and the result is verified. The design is further optimized for area and delay by using the built-in embedded multipliers for the base multiplication.
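The floating point to binary conversion step can be sketched in Python as follows. The word length (8 bits), the number of fractional bits (7), the saturating behaviour and the sample coefficient values are assumptions chosen for illustration; the paper does not state the format used, and the names `to_fixed_point` and `to_binary_string` are hypothetical.

```python
def to_fixed_point(value, n_bits, frac_bits):
    """Quantize a float to a signed integer with `frac_bits` fractional bits.

    Values outside the representable range are saturated.
    """
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_binary_string(value, n_bits):
    """Two's complement bit pattern of a signed integer, MSB first."""
    return format(value & ((1 << n_bits) - 1), "0{}b".format(n_bits))


# Illustrative coefficient values only (assumed, not the extracted coefficients).
coeffs = [0.25, -0.5, 0.125]
words = [to_binary_string(to_fixed_point(c, 8, 7), 8) for c in coeffs]
print(words)
```

The resulting bit strings are the kind of constants that would be written into a VHDL LUT or ROM initialization. The quantization error is at most half of one least significant bit for values inside the representable range.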

## 4. Results and Discussion

The quadrature carrier multiplexing scheme is used for the transmitter and receiver design in MATLAB. The low pass filter is designed with the Parks-McClellan algorithm using the remez command; the corresponding filter order and coefficients are extracted in MATLAB and converted there to their equivalent binary values.

The 256 coefficients obtained are stored in the LUT. The design is implemented in hardware on a Spartan-3E FPGA. The device is the XC3S500E-5FG320, a logic-optimized device fabricated in 90 nm CMOS technology. VHDL is the hardware description language used for implementing the design. Simulation is performed with the ISIM simulator, an integrated simulator used for functional verification of the design, and synthesis is carried out in the Xilinx ISE 10.1i tools.

Figure 12. Algorithm for DA-FIR Implementation.



Figure 13. Simulation Results for the DA-FIR Filter.

The simulation results are shown in Figure 13, where the results are verified for the generated input and coefficient combinations.

| Parameters               | Dadda-based DA_FIR           | Fast-multiplier-based DA_FIR | % improvement |
|--------------------------|------------------------------|------------------------------|---------------|
| LUTs                     | 93/4656 (nm)<sup>2</sup>     | 74/4656 (nm)<sup>2</sup>     | 20.43%        |
| Combinational path delay | 6.907 ns                     | 6.644 ns                     | 3.8%          |
| Area-delay product       | 642.351 (nm)<sup>2</sup>-ns  | 491.656 (nm)<sup>2</sup>-ns  | 23.45%        |

#### Table 1. Comparison Table

Table 1 shows that the implemented design is further optimized by using the embedded multipliers of the Spartan-3E FPGA for the multiplication by the positional base value of the coefficients. The area was reduced by nearly 20% and the combinational path delay by 3.8%. Overall, the area-delay product of the design was reduced by 23.45%.
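The derived figures in Table 1 follow directly from the LUT counts and path delays, as the short Python check below shows. It uses only the values reported in the table; the helper name `percent_improvement` is hypothetical.

```python
def percent_improvement(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Values reported in Table 1.
luts_dadda, luts_fast = 93, 74
delay_dadda, delay_fast = 6.907, 6.644  # ns

# Area-delay product = LUT count x combinational path delay.
adp_dadda = luts_dadda * delay_dadda
adp_fast = luts_fast * delay_fast

print(round(adp_dadda, 3), round(adp_fast, 3))
print(round(percent_improvement(luts_dadda, luts_fast), 2))
print(round(percent_improvement(delay_dadda, delay_fast), 2))
print(round(percent_improvement(adp_dadda, adp_fast), 2))
```

The computed values agree with the reported ones to within rounding of the last digit.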

## 5. Conclusion

For computationally intensive real-time applications such as video and image processing in satellites, the design and implementation of narrow-band FIR filters is difficult. If the filter coefficients must also be modified on board the satellite, reconfigurability is the most cost-effective solution, and FPGAs are a promising choice because they allow the coefficients to be modified and implemented in real time. Distributed arithmetic is one of the best solutions for this type of adaptive filter implementation: its multiplierless design reduces cost, computation and delay. This paper presents a way of using the embedded multipliers to multiply the base value of the coefficients by the obtained sum to produce the result. The FIR filter was implemented using distributed arithmetic and was further optimized by using the embedded multipliers of the Spartan-3E FPGA. The design improved by 20.43% in area, 3.8% in combinational path delay and 23.45% in area-delay product.

## 6. Acknowledgements

This paper is dedicated to our parents, colleagues and friends for their support, without which the successful completion of this paper would not have been possible.

## 7. References

- 1. Srividya P, Nataraj KR, Rekha KR. Video Signals Demodulator for Satellite Communication. Emerging Research in Electronics, Computer Science and Technology. Book series Lecture Notes in Electrical Engineering. 2012; 248:99-105.
- 2. Timothy Pratt. Satellite Communication. Wiley publication. 2008.
- 3. Nataraj KR, Ramachandran S, Nagabushana BS. Development of Algorithm for demodulator for processing satellite data communication. IJCSNS. 2009 June; 9(6):233-43.
- 4. Viswanath K, Gunasundari R. Analysis and Implementation of Kidney Stone Detection by Reaction Diffusion Level Set Segmentation Using Xilinx System Generator on FPGA. VLSI Design. USA: Hindawi Publishing Corporation. 2015; (2015), Article ID 581961:10.
- 5. Nataraj KR, Ramachandran S, Nagabushana BS. Development of Algorithm, Architecture and FPGA implementation of demodulator for processing satellite data communication. IJCSNS. 2009 July; 9(7):137-47.
- 6. Wang Sen, Tang Bin, Zhu Jun. Distributed Arithmetic for FIR Filter Design on FPGA. ICCCAS. 2007; p. 620-23.
- 7. Pranav J. Mankar, Ajinkya M. Pund, Kunal P. Ambhore, Shubham C. Anjankar. Design and Verification of low power DA - Adaptive digital FIR filter. (Elsevier) 7th International Conference on Communication, Computing and Virtualization. 2016; p. 367-73.
- 8. Sravanthi P, Srinivasa Rao CH, Madhava Rao S. A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT. IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), e-ISSN: 2278-2834, p-ISSN: 2278-8735. 2013 July-August; 7(2):13-18.
- 9. Daniel J Allred, Walter Huang, Venkatesh Krishnan, Heejong Yoo, David V Anderson. An FPGA implementation for a High Throughput Adaptive Filter using Distributed Arithmetic. 2004 IEEE symposium on Field programmable custom computing machines. 2004; 324-25.
- 10. Viswanath K, Gunasundari R. VLSI Implementation and analysis of kidney stone detection by level set segmentation and ANN classification. Procedia Computer Science. Elsevier. 2015; 48:612-22. DOI: 10.1016/j.procs.2015.04.143.
- 11. Vigneswaran T, Subbaramireddy P. Design of Digital FIR Filter Based on Dynamic Distributed Arithmetic Algorithm. Journal of applied Sciences. 2007; p. 2908-910.
- 12. Viswanath K, Gunasundari R. 3D Ultrasound imaging for automated Kidney Stone Analysis on FPGA. International Journal of Computer Science and Information Security (IJCSIS). 2016; 14:82-90. ISSN 1947-5500.
- 13. Patrick Longa, Ali Miri. Area efficient FIR Filter Design on FPGAs using Distributed Arithmetic. IEEE International Symposium on Signal Processing and Information Technology. 2006; p. 248-52.
- 14. Narendra Singh Pal, Harjit Pal Singh, Sarin RK, Sarabjeet Singh. Implementation of high speed FIR filter using serial and parallel Distributed Algorithm. IJCA. 2011; 7:26-32.
- 15. Chitra E, Vigneswaran T. An Efficient Low Power and High Speed Distributed Arithmetic Design for FIR Filter. Indian Journal of Science and Technology. 2016 January; 9(4):1-5.
- 16. Viswanath K, Gunasundari R. Modified Distance Regularized Level Set Segmentation Based Analysis for Kidney Stone Detection. International Journal of Rough Sets and Data Analysis. USA: IGI Global Publication. 2015 July-December; 2(2):22-39. DOI: 10.4018/ijrsda.2015070102.