# Power Optimization using Label Switching Router and Predictor Technique in 2 Dimensional Network on Chip

#### S. Sridevi<sup>1\*</sup> and G. Indumathi<sup>2</sup>

<sup>1</sup>Department of ECE, CMR Institute of Technology, Bangalore – 560037, Karnataka, India; sridevi.s@cmrit.ac.in <sup>2</sup>Department of ECE, Cambridge Institute of Technology, Bangalore – 560036, Karnataka, India; hod.ece@citech.edu.in

#### Abstract

**Objectives**: To design a label switched-predictor router for a 2D Network on Chip and to compare the power consumption of the label switched router with that of the label switched-predictor router. **Methods/Statistical Analysis**: The label switching technique is implemented with the Maxflow Fulkerson algorithm to find the routing path. The modification introduced in this paper is a predictor that determines whether incoming data is repeated or new; if the data is repeated, the previous output is reused, which reduces power consumption. An 8-bit data word is compared with the previous data word to check for repetition. The design was implemented using Xilinx software and the ISIM simulator. **Findings**: The power consumption of the label switched router is 177.14 mW and that of the label switched-predictor router is 155.85 mW, a 12% improvement. As the size of the network increases, the power consumed for repeated data inputs will be reduced by this implementation. **Application/Improvements**: This work can be extended by applying streaming video inputs to the 3 x 3 mesh network and comparing its performance.

Keywords: Label Switching, Network on Chip, Predictor, Power, 2 Dimension

## 1. Introduction

Currently, a System on Chip (SoC) contains about 2 billion transistors, which is equal to 20,000 lakhs. The number of transistors has been increasing for decades, and this is a bottleneck for SoCs that use a bus-based architecture<sup>1</sup>. As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection architectures may prevent these systems from meeting the performance required by many applications. For systems with intensive parallel communication requirements, buses may not provide the required bandwidth, latency, and power consumption. A solution to this communication bottleneck is an embedded switching network, called a Network-on-Chip (NoC), that interconnects the IP modules in SoCs. It is eminently scalable and flexible. Other design challenges include power management, multiple clock domains, security requirements, error handling and others. Moreover, physical implementation through floor planning and place-and-route is made increasingly difficult by competing and sometimes conflicting area, wiring, frequency, clock domain and voltage domain constraints. A network on chip addresses these difficulties by decoupling computation from communication. The advantages of NoC over a bus-based SoC are reduced wire routing congestion, easy IP change, easy timing closure and higher operating frequencies. Among the various switching techniques, such as circuit switching, packet switching and label switching, label switching reduces memory usage because it uses labels instead of the IP address of the destination<sup>13</sup>.

\*Author for correspondence

In the label switching technique, individual packets carry route information in the form of labels. Routers along the path use the label to identify the next hop, forwarding information, traffic priority, Quality of Service (QoS) guarantees and the next label to be assigned. Label switching inherently supports traffic engineering, as labels can be chosen based on the desired next hop or the required QoS. The advantages of the label switching technique are lower cost, QoS attributes, scalability and traffic routing. Power consumption in existing network on chip technologies is critical. Much of the data may be redundant, and power can be saved by retaining the previous output for redundant data. A predictor is therefore incorporated at the input of the source node to check for redundancy, and the label switching technique contributes a traffic-engineered router with improved power consumption.

## 1.2 Organization of the Paper

Section 2 reviews the related work. Section 3 provides the background and motivation of the LS-pred router design. Section 4 presents the experimental results with an analysis of the power reduction. Section 5 gives the conclusion of this work and discusses future enhancements.

## 2. Related Work

Quality of service in on-chip networks is one of the major research problems in NoCs. Quality of service refers to area, power consumption, latency and cost. Of these, this paper contributes in terms of area, power consumption and latency. Paper<sup>2</sup> discussed the various performance parameters of NoC and its advantages over SoC, and implemented a deadlock-free algorithm that improved throughput and average latency. Paper<sup>3</sup> presented a multi-layer mesh NoC approach to improve the QoS of communication-hungry SoCs. While one mesh layer is fixed in the system for control purposes, other data layers can be configured at runtime to provide the data throughput required by the application. This is accomplished by partially and dynamically reconfiguring the data layer routers. Paper<sup>4</sup> introduced the concept of on-chip networks and discussed their advantages in structure, performance and modularity, along with the challenges in the architecture and design of such networks. The need for network on chip over multiprocessor SoC is surveyed with different topologies, routing protocols, switching techniques and flow control techniques<sup>5</sup>. The NoC design challenges are discussed in general, but there is no implementation to prove the benefit of NoC. The physical interconnection, i.e. the bus, as a limiting factor for performance and energy consumption was discussed in paper<sup>6</sup>. Shared-medium networks use an arbitration mechanism to serve requests based on priority and have limited scalability. Energy inefficiency is another limiting factor, as data has to reach its destination at great energy cost. The paper also discussed direct and indirect networks, where data transfer happens point to point and through a set of switches respectively.

A comparative study of circuit-switched and packet-switched networks is presented in<sup>7</sup>. Its general conclusion is that circuit switching performs better when the packet size is large, whereas packet switching performs better at low load. The Quality of service delivered therefore depends on the packet size. Providing Quality of service guarantees in on-chip communication networks has been identified as one of the major research problems in NoCs<sup>8</sup>. A comparative analysis for an MPEG decoder was implemented using circuit-switched and packet-switched networks, in which the circuit-switched network performed better in terms of power, delay, area and bandwidth<sup>9</sup>. A comparative analysis of power, latency and throughput was implemented using the XY routing algorithm and the Odd-Even (OE) routing algorithm for a 3 x 3 mesh network<sup>10</sup> using the packet switching technique. Network power consumption was higher in the OE algorithm when compared with latency and throughput. Various performance metrics, namely average latency, energy, message throughput and area, are evaluated for different topologies, and a comparative analysis was performed over these metrics<sup>11</sup>. Average latency was evaluated as follows.

The latency $L_i$ for a given message $i$ is given by

$$L_i = \text{sender overhead} + \text{transport latency} + \text{receiver overhead} \tag{1}$$

and the average latency is given by

$$L_{avg} = \frac{\sum_{i=1}^{P} L_{i}}{P} \tag{2}$$

Energy was evaluated as the energy per flit per hop:

$$E_{hop} = E_{switch} + E_{interconnect} \tag{3}$$

The energy dissipated in transporting a packet consisting of n flits over h hops was evaluated by

$$E_{packet} = n \sum_{j=1}^{h} E_{hop,j} \tag{4}$$

Message Throughput was evaluated as

$$TP = \frac{(\text{Total messages completed}) \times (\text{Message length})}{(\text{Number of IP blocks}) \times (\text{Total time})} \tag{5}$$
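The metrics in equations (1) to (5) can be sketched in a few lines of Python. This is only an illustration of the formulas: the function names and the sample numbers are hypothetical and do not come from the cited study.

```python
from typing import Sequence


def message_latency(sender_overhead: float, transport_latency: float,
                    receiver_overhead: float) -> float:
    """Equation (1): latency L_i of a single message."""
    return sender_overhead + transport_latency + receiver_overhead


def average_latency(latencies: Sequence[float]) -> float:
    """Equation (2): mean of the P per-message latencies."""
    if not latencies:
        raise ValueError("at least one message latency is required")
    return sum(latencies) / len(latencies)


def hop_energy(e_switch: float, e_interconnect: float) -> float:
    """Equation (3): energy per flit per hop."""
    return e_switch + e_interconnect


def packet_energy(n_flits: int, hop_energies: Sequence[float]) -> float:
    """Equation (4): n flits, each paying E_hop on every one of the h hops."""
    return n_flits * sum(hop_energies)


def throughput(messages_completed: int, message_length: int,
               num_ip_blocks: int, total_time: float) -> float:
    """Equation (5): delivered message units per IP block per unit time."""
    return (messages_completed * message_length) / (num_ip_blocks * total_time)


# Hypothetical numbers, for illustration only.
latencies = [message_latency(2, 10, 2), message_latency(2, 14, 2)]
print(average_latency(latencies))                    # mean of 14 and 18
print(packet_energy(4, [hop_energy(1.0, 0.5)] * 3))  # 4 flits over 3 equal hops
print(throughput(90, 8, 9, 100.0))                   # 9 IP blocks, as in a 3 x 3 mesh
```

Units are left to the caller: the functions only combine the quantities in the way the equations prescribe.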

## 3. Background

## 3.1 Design of LS-NoC for 3 x 3 Mesh Network

Figure 1 shows an example 3 x 3 LS-NoC in a 2-dimensional mesh topology, in which each block is identified by its coordinates (x, y). Each block consists of an identical processing element (PE) or a customized IP block.



Figure 1. 2D 3 x 3 mesh LS-NoC <sup>12</sup>

| Input Label | Direction bits | New label |
|-------------|----------------|-----------|
| 0000        | 00001          | 0000      |
| 0001        | 00010          | 0001      |
| 0010        | 00100          | 0010      |
| 0011        | 01000          | 0011      |
| 0100        | 10000          | 0100      |
| 0101        | 10001          | 0101      |
| ...         | ...            | ...       |
| 1111        | 01011          | 1111      |

Table 1. Routing table for a single input port in a router with 4-bit labels

The router consists of five ports: local, east, west, north and south. The local port connects to the PE. The router is connected to the processing element by a network interface. The connectivity between blocks is bidirectional.

Each block is identified by labels. Labels are sent with the data through a dedicated set of links. The label on the data bus is used to identify the intended outgoing port through the routing table, Table 1<sup>12</sup>. Labels can be reused across sources. Label reuse may cause collisions when pipes share a link. Label swapping reassigns unused labels to avert label collisions. When data from one PE block has to be communicated to another, it hops from router to router using the labels assigned in the routing table.

#### 3.1.1 Motivation

As technology scales into the nanometer range, power is becoming a dominant factor among the design parameters of a Network on Chip. Input data that has to be transferred from the source node to the destination node may be redundant, and power can be optimized in such scenarios. In this paper we implement a predictor technique in which an 8-bit shift register is included in the input register as a predictor. It indicates whether the data is new or repeated, and on that basis the label and data are concatenated for further communication. This technique optimizes the power consumption in the network.
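The predictor idea can be modelled behaviourally as follows. This is a minimal software sketch, not the Verilog design used in the paper: the class `RepeatPredictor`, the helper `process_stream` and the `compute` callback are hypothetical names, and the model simply compares each 8-bit word with the previous one and reuses the previous output when they match.

```python
from typing import Callable, List, Optional, Tuple


class RepeatPredictor:
    """Remembers the previous 8-bit input and flags repeated data."""

    def __init__(self) -> None:
        self.previous: Optional[int] = None  # nothing seen yet

    def is_repeated(self, data: int) -> bool:
        data &= 0xFF  # the paper uses 8-bit data words
        repeated = self.previous is not None and data == self.previous
        self.previous = data
        return repeated


def process_stream(words: List[int],
                   compute: Callable[[int], int]) -> Tuple[List[int], int]:
    """Return the outputs and the number of times `compute` actually ran."""
    predictor = RepeatPredictor()
    outputs: List[int] = []
    evaluations = 0
    last_output = 0
    for word in words:
        if not predictor.is_repeated(word):
            last_output = compute(word & 0xFF)  # new data: do the work
            evaluations += 1
        outputs.append(last_output)            # repeated data: reuse output
    return outputs, evaluations


# Hypothetical stream with runs of repeated words.
outs, runs = process_stream([0xA5, 0xA5, 0x3C, 0x3C, 0x3C, 0xA5],
                            compute=lambda w: w ^ 0xFF)
print(runs, "evaluations for", len(outs), "inputs")
```

In this model the saving comes from skipping `compute` for repeated words; how that translates into switching activity and power in hardware depends on the actual circuit.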

## 3.2 LS-pred Router Design in 2D Network on Chip

The combinational circuit for a single input port to single output port in a label switched-predictor (LS-pred) router is shown in Figure 2.

It consists of two blocks, an input port and an output port. The input port consists of an input register, a Forward Control Block (FCB), multiplexers, a FIFO block and a routing table. The output port consists of an arbiter, a multiplexer and an output register.

#### 3.2.1 Input Port

**Input register:** When data enters an input port of a router, the router assigns a label to the data. In this paper, the label is 4 bits wide and the data is 8 bits wide. The register is activated on the positive clock edge.

When the valid signal is true, the input register concatenates the data and the label. Based on the destination block, the hops are decided and labels are assigned for each hop in the routing table.
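The concatenation performed by the input register can be illustrated with simple bit operations. The sketch below assumes that the 4-bit label occupies the upper bits of a 12-bit word and the 8-bit data the lower bits; the paper does not state the bit order, and the function names are hypothetical.

```python
from typing import Optional, Tuple

LABEL_BITS = 4  # label width used in the paper
DATA_BITS = 8   # data width used in the paper


def concatenate(label: int, data: int, valid: bool) -> Optional[int]:
    """Return the 12-bit {label, data} word, or None when valid is false."""
    if not valid:
        return None
    if not 0 <= label < (1 << LABEL_BITS):
        raise ValueError("label must fit in 4 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 8 bits")
    return (label << DATA_BITS) | data  # assumed order: label in the upper bits


def split(word: int) -> Tuple[int, int]:
    """Inverse of concatenate: recover (label, data) from a 12-bit word."""
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)


word = concatenate(0b0011, 0xA5, valid=True)
print(f"{word:012b}", split(word))
```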



Figure 2. Label Switched Predictor Router<sup>12</sup>

**FIFO:** The concatenated data from the input register block is written into a FIFO of depth 16, with addresses from 00000 to 01111. The fifth bit of the FIFO address indicates that the FIFO is full, in which case the data is sent to another input port whose FIFO is free.

**FCB:** The forward control block handles the FIFO pointer and controls the flow control signal of the corresponding input port. It has a pause-out signal to indicate that the FIFO is full, so that the data is directed to the next router.

**Multiplexer:** The multiplexer is controlled by a selection line that is an output of the FCB. When FCB\_out is low, data from the input register is considered for the next hop. When FCB\_out is high, data is read from the FIFO and considered for the next hop.
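The interplay of FIFO, FCB and multiplexer can be sketched as a small behavioural model. It assumes that the 5-bit value tracked by the FCB is the FIFO occupancy, so that bit 4 is set exactly when all 16 entries are in use; the paper only states that the fifth address bit flags a full FIFO. The names `InputFifo`, `pause_out` and `select_word` are hypothetical.

```python
from collections import deque


class InputFifo:
    """Depth-16 FIFO whose 5-bit occupancy count flags 'full' in bit 4."""

    DEPTH = 16

    def __init__(self) -> None:
        self._words = deque()

    @property
    def count(self) -> int:
        return len(self._words)  # ranges over 00000..10000

    @property
    def pause_out(self) -> bool:
        # Fifth bit set means all 16 entries (00000..01111) are occupied.
        return bool(self.count & 0b10000)

    def write(self, word: int) -> bool:
        """Store a word; return False when the FIFO is full."""
        if self.pause_out:
            return False  # caller must steer the word elsewhere
        self._words.append(word)
        return True

    def read(self) -> int:
        return self._words.popleft()


def select_word(fcb_out: bool, register_word: int, fifo: InputFifo) -> int:
    """Multiplexer: input register when FCB_out is low, FIFO when it is high."""
    return fifo.read() if fcb_out else register_word


fifo = InputFifo()
accepted = [fifo.write(w) for w in range(17)]  # the 17th write is refused
print(accepted.count(True), fifo.pause_out)
print(select_word(False, 0x3A5, fifo), select_word(True, 0x3A5, fifo))
```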

**Routing Table:** The routing table in the LS-pred router has input label, direction bits and new label fields, as shown in Table 1. The bit corresponding to an output port is set if the labelled data is to exit the router from that output port. Multiple bits can be set, which enables multicast and broadcast. A new label field is maintained to enable label swapping. The size of the routing table at each input port of the LS-pred router is $2^{lw}$ entries, where lw is the label width.
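A routing table lookup of this kind can be sketched as a dictionary keyed by the input label. The entries below are the rows listed in Table 1. The mapping of direction bits to ports assumes the LNSWE order (local, north, south, west, east, most significant bit first) given in the legend of the label swapping table; applying that order to Table 1 is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

PORTS = ("local", "north", "south", "west", "east")  # assumed LNSWE, MSB first

# input label -> (direction bits, new label), rows taken from Table 1
ROUTING_TABLE: Dict[int, Tuple[int, int]] = {
    0b0000: (0b00001, 0b0000),
    0b0001: (0b00010, 0b0001),
    0b0010: (0b00100, 0b0010),
    0b0011: (0b01000, 0b0011),
    0b0100: (0b10000, 0b0100),
    0b0101: (0b10001, 0b0101),
    0b1111: (0b01011, 0b1111),
}


def decode_ports(direction_bits: int) -> List[str]:
    """List every output port whose bit is set (several bits = multicast)."""
    return [name for i, name in enumerate(PORTS)
            if direction_bits & (1 << (len(PORTS) - 1 - i))]


def route(label: int) -> Tuple[List[str], int]:
    """Look up the output ports and the outgoing label for an input label."""
    direction_bits, new_label = ROUTING_TABLE[label]
    return decode_ports(direction_bits), new_label


label_width = 4
print(2 ** label_width, "entries per input port")
print(route(0b0010))  # single output port
print(route(0b0101))  # two bits set: multicast
```

Under the assumed bit order, label 0101 is delivered both to the local port and to the east port, which is the multicast case described above.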

#### 3.2.2 Output Port

**Arbiter:** Its main function is to keep an updated status of the free output ports. It receives the valid signal from the routing table, checks whether the output port is free and, through its selection signal, allows the input data to be transmitted through the multiplexer.

**Output Register:** The output register takes the word from the multiplexer, removes the label and sends only the data out of the output port.
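The behaviour of the output port can be approximated with the sketch below. It assumes a fixed-priority choice among the requesting input ports and a 12-bit word with the label in the upper 4 bits; neither detail is specified in the paper, and the class `OutputPort` is a hypothetical name.

```python
from typing import List, Optional, Tuple


class OutputPort:
    """Arbiter plus output register for one output port."""

    def __init__(self) -> None:
        self.free = True  # status tracked by the arbiter

    def arbitrate(self, requests: List[Tuple[bool, int]]) -> Optional[Tuple[int, int]]:
        """requests holds one (valid, word) pair per input port.

        Returns (granted input index, data without label), or None when the
        port is busy or no input has a valid request.
        """
        if not self.free:
            return None
        for index, (valid, word) in enumerate(requests):
            if valid:                       # assumed policy: lowest index wins
                self.free = False
                return index, word & 0xFF   # output register drops the label
        return None

    def release(self) -> None:
        self.free = True


port = OutputPort()
print(port.arbitrate([(False, 0x1FF), (True, 0x3A5), (True, 0x2C3)]))
print(port.arbitrate([(True, 0x2C3)]))  # port still busy
port.release()
print(port.arbitrate([(True, 0x2C3)]))
```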

#### Maxflow Fulkerson Routing Algorithm:

In this algorithm, a capacity of 3 is provided on the links of each node in all possible directions. The capacity at the source node is compared to choose the path, or pipe: if the capacities are equal, the upper path is chosen; otherwise the other path is chosen, as shown in Figure 3. The network consists of nine nodes, a0 to a8. The input is applied at node a0 and the destination is node a8.



Figure 3. Maxflow Fulkerson Algorithm
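For reference, the textbook Ford-Fulkerson method on such a mesh can be sketched as below. This is the standard algorithm with breadth-first search for augmenting paths (the Edmonds-Karp variant), not necessarily the authors' implementation. It assumes that nodes a0 to a8 are numbered row by row in the 3 x 3 mesh and that every directed link has capacity 3; both are assumptions, as are the function names.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def mesh_capacities(rows: int = 3, cols: int = 3, capacity: int = 3) -> Dict[Edge, int]:
    """Directed links between neighbouring mesh nodes, numbered row by row."""
    cap: Dict[Edge, int] = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cap[(r * cols + c, rr * cols + cc)] = capacity
    return cap


def max_flow(cap: Dict[Edge, int], source: int, sink: int) -> Tuple[int, List[List[int]]]:
    """Return the maximum flow and the augmenting paths that were used."""
    residual = dict(cap)
    flow, paths = 0, []
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:      # BFS in the residual graph
            u = queue.popleft()
            for (a, b), c in residual.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if sink not in parent:
            return flow, paths                   # no augmenting path is left
        path = [sink]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()
        links = list(zip(path, path[1:]))
        bottleneck = min(residual[link] for link in links)
        for u, v in links:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] = residual.get((v, u), 0) + bottleneck
        flow += bottleneck
        paths.append(path)


total, used_paths = max_flow(mesh_capacities(), source=0, sink=8)
print(total, used_paths)
```

Under these assumptions the corner node a0 has two outgoing links of capacity 3, so the flow from a0 to a8 cannot exceed 6, and the sketch reaches that bound with two link-disjoint paths.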

## 3.3 Establishment of a Pipe

A pipe is the link between the source and destination nodes. The routing table in each router is configured according to the path of the pipe to complete the pipe establishment. During pipe establishment, fault tolerance is taken into consideration through the link status.

A pipe is a triplet (S, D, R), where S and D are the source and destination nodes respectively and R is the set of intermediate routers connecting S and D. A source and destination pair can have multiple pipes with varying sets of intermediate routers, maintaining the throughput.

#### 3.3.1 Labels

A label is assigned for each source node S, and it uniquely identifies the pipe and the intended destination D even though the label may change along the pipe. Labels can be reused across sources. Using labels, a pipe can be represented as the sequence $S, l_0, l_1, l_2, ..., l_{h-1}, D$, where the pipe connects source S to destination D through h routers. $l_0$ is the label used at router $R_0$ and $l_1$ is the label used at router $R_1$. Router $R_0$ connects to the source S and router $R_{h-1}$ connects to the destination D. Label switching offers a low-overhead solution by reducing metadata in packets. Smaller labels result in simpler routing tables and less logic in the router. Using labels instead of node IDs decreases routing table sizes.
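One way to hold this representation in software is a small record type. The sketch below is only illustrative: the class `Pipe`, its field names and the example router names are hypothetical, and the label at position k is taken to be the label used at router k.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Pipe:
    """A pipe (S, D, R) together with the label used at each router."""

    source: str
    destination: str
    routers: Tuple[str, ...]  # R0 ... R(h-1), in path order
    labels: Tuple[int, ...]   # l0 ... l(h-1), one label per router

    def __post_init__(self) -> None:
        if len(self.routers) != len(self.labels):
            raise ValueError("exactly one label is needed per router")

    @property
    def hops(self) -> int:
        return len(self.routers)  # h in the text

    def as_sequence(self) -> List[Union[str, int]]:
        """The form S, l0, l1, ..., l(h-1), D used in the text."""
        return [self.source, *self.labels, self.destination]


# Hypothetical pipe through three routers; the label changes at the last one.
pipe = Pipe("S", "D", routers=("R0", "R1", "R2"), labels=(1, 1, 2))
print(pipe.hops, pipe.as_sequence())
```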

#### 3.3.2 Label Swapping

An advantage of the label switching NoC (LS-NoC) is the provision of guaranteed throughput between nodes. As the number of nodes increases, the probability of label collision increases. A routing table entry clash occurs when labels collide on a link. In Figure 4(a), consider two routers, R0 and R1. The pipes at the north and west ports of Router 0 have label 0. Suppose both pipes leave through the south port to reach the north port of Router 1. As each input port has a separate routing table, there is no label conflict within Router 0. If label 1 is already in use, then both pipes carrying label 1 will not be able to pass through the north port of Router 1. Label swapping reassigns labels to the conflicting pipes using the available label space at the next router, as in Figure 4(b). Two different labels, 2 and 3, are now assigned and sent to the free output ports south and east respectively<sup>12</sup>.

Figure 4. Label conflict and label swapping<sup>12</sup>
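Label swapping can be sketched as a search for unused labels in the label space of the next router. The policy of picking the lowest free label is an assumption, as are the function name and the starting state in the example, which is chosen so that two pipes that both carry label 1 end up with labels 2 and 3 as in the text.

```python
from typing import Dict, Iterable, List, Tuple


def swap_labels(incoming: List[Tuple[str, int]],
                in_use: Iterable[int] = (),
                label_width: int = 4) -> Dict[str, int]:
    """Give every incoming pipe a label that is unique at the next router.

    incoming: (pipe name, requested label) pairs sharing one input port.
    in_use:   labels already taken at that port.
    """
    taken = set(in_use)
    assigned: Dict[str, int] = {}
    for pipe, label in incoming:
        if label in taken:  # collision: look for a free label
            free = [l for l in range(2 ** label_width) if l not in taken]
            if not free:
                raise RuntimeError("label space exhausted")
            label = free[0]  # assumed policy: lowest free label
        taken.add(label)
        assigned[pipe] = label
    return assigned


# Two pipes arrive with label 1 while labels 0 and 1 are already in use.
print(swap_labels([("north pipe", 1), ("west pipe", 1)], in_use={0, 1}))
```

The routing table of the upstream router would then be updated with these values in its new label field, which is how the swapped labels reach the next hop.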

## 4. Experimental Results

The LS-NoC router and the LS-pred router for a 3 x 3 mesh network were coded in Verilog, and functional verification of the Verilog design was done using Xilinx software with the ISIM simulator. The power consumption of the label switched (LS-NoC) router is 177.14 mW and that of the LS-pred router is 155.85 mW, as shown in Figure 5(c), an improvement of 12% in power. The time taken by the circuit implementation is 4.839 ns for LS-NoC and 4.249 ns for LS-pred NoC, as in Figure 5(a). The area, in terms of the number of occupied slices, is 94 with 9% utilization for the LS-NoC router and 122 with 12% utilization for the proposed technique, as in Figure 5(b).

Figure 5(a)

Figure 5(b)

Figure 5(c)

Table 2. An example of label conflict and label swapping technique<sup>12</sup>

| Routing table | il | Direction bits (LNSWE) | ol | Routing table | il | Direction bits (LNSWE) | ol |
|---------------|----|------------------------|----|---------------|----|------------------------|----|
| RT0           | 1  | 00100                  | 1  | RT0           | 1  | 00100                  | 2  |
| RT1           | 1  | 00100                  | 1  | RT1           | 1  | 00100                  | 3  |

| Routing table | il | Direction bits (LNSWE) | ol |
|---------------|----|------------------------|----|
| RT2           | 2  | 00100                  | 2  |
| RT2           | 3  | 00001                  | 3  |

where il - input label, ol - output label and Direction bits (LNSWE) - local, north, south, west, east.
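The improvements quoted in this section follow from the reported measurements by simple arithmetic. The worked figures below use only the values stated in the paper (177.14 mW and 155.85 mW, 4.839 ns and 4.249 ns, 94 and 122 slices); the symbols $\Delta P$, $\Delta t$ and $\Delta A$ are shorthand introduced here.

$$\Delta P = 177.14 - 155.85 = 21.29\ \text{mW}, \qquad \frac{21.29}{177.14} \times 100\% \approx 12.0\%$$

$$\Delta t = 4.839 - 4.249 = 0.590\ \text{ns}, \qquad \frac{0.590}{4.839} \times 100\% \approx 12.2\%$$

$$\Delta A = 122 - 94 = 28\ \text{slices}, \qquad 12\% - 9\% = 3\ \text{percentage points of utilization}$$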

## 5. Conclusion and Future Scope

In this paper, a label switched router with the predictor technique is designed and implemented. Power is reduced by about 21 mW (from 177.14 mW to 155.85 mW), latency is reduced by about 0.6 ns and the area overhead is 3% in slice utilization. This work can be extended to improve power further and can be implemented for streaming applications in the Cadence tool with 45 nm technology to optimize area and power.

## 6. References

1. Arteris SA. From "Bus" and "Crossbar" to "Network-On-Chip".
2. Kamal R, Yadav N. NoC and bus architecture: A comparison. International Journal of Engineering Science and Technology. 2012; 4(4):1438–42.
3. Möller L, Fischer P, Moraes F, Indrusiak LS, Glesner M. Improving QoS of multi-layer networks-on-chip with partial and dynamic reconfiguration of routers. International Conference on Field Programmable Logic and Applications 2010, India, IEEE; 2010. p. 229–33. https://doi.org/10.1109/FPL.2010.53.
4. Dally WJ, Towles B. Route packets, not wires: On-chip interconnection networks. Proceedings of the 38th Annual Design Automation Conference; 2001. p. 684–9. https://doi.org/10.1145/378239.379048.
5. Ankur A, Iskander C, Shankar R. Survey of Network on Chip (NoC) architectures and contributions. Journal of Engineering, Computing and Architecture. 2009; 3(1):21–7.
6. Luca B, De Micheli G. Networks on chips: A new SoC paradigm. Computer-IEEE Computer Society-35. EPFL-ARTICLE-165542; 2002. p. 70–8.
7. Liu S, Jantsch A, Lu Z. Analysis and evaluation of circuit switched NoC and packet switched NoC. Digital System Design (DSD), Euromicro Conference, IEEE; 2013. p. 21–8. https://doi.org/10.1109/DSD.2013.13.
8. Marculescu R, Ogras UY, Peh LS, Jerger NE, Hoskote Y. Outstanding research problems in NoC design: System, microarchitecture, and circuit perspectives. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2009; 28(1):3–21. https://doi.org/10.1109/TCAD.2008.2010691.
9. Manfred S-S. Circuit switching versus packet switching. International Journal of Open Information Technologies. 2015; 3(4):27–37.
10. Singh JK, Swain AK, Reddy TN, Mahapatra KK. Performance evaluation of different routing algorithms in Network on Chip. IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia); 2013. p. 180–5. https://doi.org/10.1109/PrimeAsia.2013.6731201.
11. Pande PP, Grecu C, Jones M, Ivanov A, Saleh R. Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions on Computers; 2005. p. 1025–40. https://doi.org/10.1109/TC.2005.134.
12. Talwar B, Amrutur B. Traffic engineered NoC for streaming applications. Microprocessors and Microsystems. 2013; 37(3):333–44. https://doi.org/10.1016/j.micpro.2013.02.003.