Power Optimization using Label Switching Router and Predictor Technique in 2 Dimensional Network on Chip

Objectives: To design a label switched-predictor router for a 2D Network on Chip and to compare the power consumption of the label switched router with that of the label switched-predictor router. Methods/Statistical Analysis: The label switching technique is implemented with the Maxflow Fulkerson algorithm to find the routing path. The modification added in this paper is a predictor that determines whether the incoming data is repeated or new; if it is repeated, the previous output is reused, which reduces power consumption. An 8-bit data input is compared with the previous data to check for repetition. The design was implemented using Xilinx software and the ISIM simulator. Findings: The power consumption is 177.14 mW for the label switched router and 155.85 mW for the label switched-predictor router, a 12% improvement. As the size of the network increases, this implementation reduces the power consumed for repeated data inputs. Application/Improvements: This work can be extended by applying streaming video inputs to the 3 x 3 mesh network and comparing its performance.


Introduction
Currently, the number of transistors in a System on Chip (SoC) is 2 billion, which is equal to 20,000 lakh. The number of transistors has been increasing over the decades, and this is a bottleneck for SoCs that use a bus-based architecture 1. As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection architectures may prevent these systems from meeting the performance required by many applications. For systems with intensive parallel communication requirements, buses may not provide the required bandwidth and latency. The advantages of a Network on Chip (NoC) over a bus-based SoC are reduced wire routing congestion, easy IP change, easy timing closure and higher operating frequencies.
There are various switching techniques, such as circuit switching, packet switching and label switching. Of these, label switching reduces memory usage because labels are used instead of the IP address of the destination 13. In the label switching technique, individual packets carry route information in the form of labels. Routers along the path use the label to identify the next hop, forwarding information, traffic priority, Quality of Service (QoS) guarantees and the next label to be assigned. Label switching inherently supports traffic engineering, as labels can be chosen based on the desired next hop or the required QoS services. The advantages of the label switching technique are lower cost, QoS attributes, scalability and traffic routing.
Power consumption in existing Network on Chip technologies is currently critical. Much of the data may be redundant, and power can be saved by retaining the previous output for redundant data. A predictor is therefore incorporated at the input of the source node to check for redundancy, and the label switching technique provides a traffic-engineered router with improved power consumption.

Organization of the Paper
Section 2 presents the work related to this paper. Section 3 provides the background and motivation of the LS-pred router design. Section 4 presents the experimental results with an analysis of the power reduction. Section 5 gives the conclusion of this work and discusses future enhancements.

Related Work
Quality of service in on-chip networks is one of the major research problems in NoCs. Here, quality of service refers to area, power consumption, latency and cost; of these, this paper contributes in terms of area, power consumption and latency. Paper 2 discussed the various performance parameters of NoC and its advantages over SoC, implemented a deadlock-free algorithm, and improved throughput and average latency.
Paper 3 presented a multi-layer mesh NoC approach to improve the QoS of communication-hungry SoCs. While one mesh layer is fixed in the system for control purposes, other data layers can be configured at runtime to provide the data throughput required by the application. This is accomplished by partially and dynamically reconfiguring the data layer routers. Paper 4 introduced the concept of on-chip networks and discussed the advantages in structure, performance and modularity, as well as the challenges in the architecture and design of such networks. The requirement for a network on chip over a multiprocessor SoC is surveyed with different topologies, routing protocols, switching techniques and flow control techniques 5. The NoC design challenges are discussed in general, but there is no implementation to prove the benefit of NoC. Paper 6 discussed the physical interconnection, i.e. the bus, as a limiting factor for performance and energy consumption. Shared-medium networks use an arbitration mechanism to serve requests based on priority, and they have limited scalability. Energy inefficiency is another limiting factor, as data has to reach the destination at great energy cost. The paper also discussed direct and indirect networks, in which data transfer happens point to point and through a set of switches, respectively. A comparative study of circuit-switched and packet-switched networks is presented in 7. Its general conclusion is that circuit switching performs better when the packet size is large, whereas packet switching performs better at low load. The quality of service delivered therefore depends on the packet size. Providing quality of service guarantees in on-chip communication networks has been identified as one of the major research problems in NoCs 8.
A comparative analysis for an MPEG decoder was implemented using circuit-switched and packet-switched networks, in which the circuit-switched network performed better in terms of power, delay, area and bandwidth 9. A comparative analysis of power, latency and throughput was implemented for a 3 x 3 mesh network using the XY routing algorithm and the odd-even (OE) routing algorithm with the packet switching technique 10. Network power consumption was higher with the OE algorithm, when compared with its effect on latency and throughput. Various performance metrics (average latency, energy, message throughput and area) were evaluated for different topologies, and a comparative analysis was performed over these metrics 11. The latency L_i of a given message i is given by (1) and the average latency by (2). Energy was evaluated as energy per flit per hop; the energy dissipated in transporting a packet consisting of n flits over h hops is given by (4). Message throughput is given by (5).
Figure 1 shows an example 3 x 3 LS-NoC in a 2-dimensional mesh topology, in which each block is identified by its coordinates (x,y). Each block consists of an identical processing element (PE) or a customized IP block.

Design of LS-NoC for 3 x 3 Mesh Network
The router consists of five ports: local, east, west, north and south. The local port connects to the PE, and the router is connected to the processing element through a network interface. The connectivity between blocks is bidirectional. Each block is identified by labels. Labels are sent with the data through a dedicated set of links. The label on the data bus is used to identify the intended outgoing port from the routing table (Table 1) 12. Labels can be reused across sources, but label reuse may cause collisions when pipes share a link. Label swapping reassigns unused labels to avert label collisions. When data from one PE block has to be communicated to another, it hops from router to router using the labels assigned in the routing table.

Motivation
As technology scales into the nanometer regime, power is becoming a dominant factor among the overall parameters of a Network on Chip. Input data that has to be transferred from the source node to the destination node may be redundant, and power can be optimized in such scenarios. In this paper we have implemented a predictor technique in which an 8-bit shift register is included in the input register as a predictor. It indicates whether the data is new or repeated, and on this basis the label and data are concatenated for further communication. With this technique, the power consumption in the network is optimized.
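The predictor idea described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the Verilog design of this paper: the class name `RepeatPredictor`, the `step` method and the `process` callback (standing in for the rest of the router data path) are hypothetical names introduced for illustration.

```python
class RepeatPredictor:
    """Behavioural sketch: flags an 8-bit input as repeated when it equals
    the previous input, and reuses the previous output in that case."""

    def __init__(self):
        self.prev_data = None    # no data has been seen yet
        self.prev_output = None  # last output produced by the data path

    def step(self, data, process):
        data &= 0xFF  # model an 8-bit data input
        repeated = self.prev_data is not None and data == self.prev_data
        if not repeated:
            # New data: exercise the data path and remember its result.
            self.prev_output = process(data)
            self.prev_data = data
        # Repeated data: the stored output is returned without recomputation.
        return repeated, self.prev_output


# Usage: count how often the (hypothetical) data path is actually exercised.
calls = []

def data_path(d):
    calls.append(d)
    return d ^ 0xFF  # placeholder computation, an assumption for the example

predictor = RepeatPredictor()
results = [predictor.step(d, data_path) for d in [0x3C, 0x3C, 0x3C, 0x7E]]
```

In this example the data path is exercised only twice for four inputs, which is the kind of saving the predictor is intended to provide for repeated data.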

LS-pred Router Design in 2D Network on Chip
The combinational circuit from a single input port to a single output port in a label switched-predictor (LS-pred) router is shown in Figure 2. It consists of two blocks, the input port and the output port. The input port consists of an input register, a Forward Control Block (FCB), multiplexers, a FIFO block and a routing table. The output port consists of an arbiter, a multiplexer and an output register.

Input Port
Input register: When data enters an input port of a router, the router assigns a label to the data. In this paper, the label is 4 bits wide and the data is 8 bits wide. The register is activated on the positive clock edge.
When the valid signal is true, the input register concatenates the data and the label. Based on the destination block, the hops are decided and a label is assigned for each hop in the routing table.
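The concatenation performed by the input register can be sketched as follows. The placement of the 4-bit label in the upper bits of the 12-bit word is an assumption made for illustration, and the function names `concat_label_data` and `split_word` are hypothetical.

```python
LABEL_BITS = 4  # label width stated in the text
DATA_BITS = 8   # data width stated in the text

def concat_label_data(label, data, valid=True):
    """Return the 12-bit word {label, data}, or None when valid is false."""
    if not valid:
        return None
    if not 0 <= label < (1 << LABEL_BITS):
        raise ValueError("label does not fit in 4 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in 8 bits")
    # Assumption: the label occupies the upper 4 bits of the word.
    return (label << DATA_BITS) | data

def split_word(word):
    """Recover (label, data) from a 12-bit word."""
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)

word = concat_label_data(0b1010, 0x5C)
```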

FIFO: The concatenated data from the input register block is written into a FIFO of depth 16, with addresses from 00000 to 01111. The fifth bit of the FIFO address indicates that the FIFO is full, in which case the data is sent to another input port whose FIFO is free.
FCB: The forward control block handles the FIFO pointer and controls the flow control signal of the corresponding input port. It has a pause-out signal to indicate that the FIFO is full, so that the data is directed to the next router.
Multiplexer: The multiplexer is controlled by a selection line that is an output of the FCB. When FCB_out is low, data from the input register is considered for the next hop. When FCB_out is high, data is read from the FIFO and considered for the next hop.
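The FIFO full indication and the multiplexer selection described above can be modelled behaviourally as below. This is a sketch under assumptions: the 5-bit address is modelled as the number of stored words, and the names `InputFifo` and `input_mux` are hypothetical.

```python
from collections import deque

class InputFifo:
    """Depth-16 FIFO whose 5-bit fill address signals full through bit 4."""

    def __init__(self):
        self.slots = deque()

    @property
    def address(self):
        # Runs from 0b00000 to 0b01111 while free space remains.
        return len(self.slots)

    @property
    def full(self):
        # The fifth bit (bit 4) is set only when 16 words are stored.
        return bool(self.address & 0b10000)

    def write(self, word):
        if self.full:
            return False  # caller must pause or use another input port
        self.slots.append(word)
        return True

    def read(self):
        return self.slots.popleft() if self.slots else None

def input_mux(fcb_out, register_word, fifo):
    """FCB_out low selects the input register; high reads from the FIFO."""
    return fifo.read() if fcb_out else register_word

fifo = InputFifo()
accepted = [fifo.write(w) for w in range(16)]
was_full = fifo.full
overflow_accepted = fifo.write(0xFFF)
from_register = input_mux(0, 0xABC, fifo)
from_fifo = input_mux(1, 0xABC, fifo)
```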
Routing Table: The routing table in the LS-pred router has an input label, direction bits and a new label, as shown in Table 1. The bit corresponding to an output port is set if the label being routed is to exit the router from that output port. Multiple bits can be set, which enables multicast and broadcast. A new label field is maintained to enable label swapping.
The size of the routing table at each input port of the LS-pred router is 2^lw entries, where lw is the label width.
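A routing table of this form can be sketched as a list indexed by the input label, assuming one entry per possible label value (2^lw entries for label width lw, so 16 entries for a 4-bit label). The bit order of the direction field and the function names are assumptions made for this example.

```python
PORTS = ("local", "east", "west", "north", "south")  # assumed bit order
LABEL_WIDTH = 4

def make_routing_table(label_width=LABEL_WIDTH):
    # One entry per possible input label.
    return [None] * (1 << label_width)

def add_entry(table, in_label, out_ports, new_label):
    direction_bits = 0
    for port in out_ports:
        direction_bits |= 1 << PORTS.index(port)  # one bit per output port
    table[in_label] = (direction_bits, new_label)

def lookup(table, in_label):
    """Return (list of output ports, new label) for an input label."""
    entry = table[in_label]
    if entry is None:
        return [], None
    direction_bits, new_label = entry
    ports = [p for i, p in enumerate(PORTS) if (direction_bits >> i) & 1]
    return ports, new_label

table = make_routing_table()
add_entry(table, in_label=3, out_ports=["east", "south"], new_label=7)
```

Setting two direction bits in one entry, as in the usage lines, corresponds to the multicast case mentioned in the text.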

Output Port
Arbiter: Its main function is to keep an updated status of the free output ports. It receives the valid signal from the routing table, checks whether the output port is free, and allows the input data to be transmitted through the multiplexer by means of the selection signal from the arbiter.
Output Register: The output register receives the word from the multiplexer, removes the label, and sends only the data out of the output port.
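The behaviour of the output port can be sketched as follows. The grant and release interface of the `Arbiter` class is an assumption, since the text only states that the arbiter tracks free output ports; `output_register` assumes the 8-bit data occupies the lower bits of the word.

```python
class Arbiter:
    """Tracks which output ports are free and grants them on valid requests."""

    def __init__(self, ports=("local", "east", "west", "north", "south")):
        self.free = {port: True for port in ports}

    def request(self, port, valid):
        # Grant only when the routing table asserts valid and the port is free.
        if valid and self.free[port]:
            self.free[port] = False
            return True
        return False

    def release(self, port):
        self.free[port] = True

def output_register(word, data_bits=8):
    """Strip the label and keep only the data bits (assumed to be the low bits)."""
    return word & ((1 << data_bits) - 1)

arbiter = Arbiter()
first_grant = arbiter.request("east", valid=True)
second_grant = arbiter.request("east", valid=True)  # port is now busy
arbiter.release("east")
third_grant = arbiter.request("east", valid=True)
```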

Maxflow Fulkerson Routing Algorithm:
In this algorithm, each node is given a link capacity of 3 in every possible direction. The capacities at the source node are compared to choose the path (pipe): if they are equal, the upper path is chosen; otherwise the other path is chosen, as shown in Figure 3. The network consists of nine nodes, a0 to a8. The input is applied at node a0 and the destination is node a8. 6
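For illustration, the Ford-Fulkerson method can be run on a 3 x 3 mesh with a capacity of 3 on every link. The sketch below uses breadth-first search to find augmenting paths (the Edmonds-Karp variant), which is not necessarily the path selection rule used in this paper. The row-major numbering of nodes a0 to a8 as 0 to 8 and the function names are assumptions.

```python
from collections import deque

def mesh_capacities(rows=3, cols=3, cap=3):
    """Bidirectional mesh links, each direction with the same capacity."""
    capacity = {}
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:            # link to the right-hand neighbour
                capacity[(u, u + 1)] = cap
                capacity[(u + 1, u)] = cap
            if r + 1 < rows:            # link to the neighbour below
                capacity[(u, u + cols)] = cap
                capacity[(u + cols, u)] = cap
    return capacity

def max_flow(capacity, source, sink):
    residual = dict(capacity)
    neighbours = {}
    for (u, v) in capacity:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
        residual.setdefault((v, u), 0)  # reverse edge for flow cancellation
    total, paths = 0, []
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in sorted(neighbours[u]):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, paths         # no augmenting path remains
        path, v = [], sink
        while v is not None:
            path.append(v)
            v = parent[v]
        path.reverse()
        hops = list(zip(path, path[1:]))
        bottleneck = min(residual[h] for h in hops)
        for (a, b) in hops:
            residual[(a, b)] -= bottleneck
            residual[(b, a)] += bottleneck
        total += bottleneck
        paths.append((path, bottleneck))

flow, pipes = max_flow(mesh_capacities(), source=0, sink=8)
```

With these assumptions the corner source has only two outgoing links of capacity 3, so the maximum flow from a0 to a8 is bounded by 6, and each augmenting path returned in `pipes` is a candidate pipe between the two nodes.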

Establishment of a Pipe
A pipe is the link between the source and destination nodes. To complete pipe establishment, the routing tables in the routers are configured according to the path of the pipe. During pipe establishment, fault tolerance is taken into account through the link status.
A pipe is a triplet (S, D, R), where S and D are the source and destination nodes respectively and R is the set of intermediate routers connecting S and D. A source and destination can have multiple pipes with different sets of intermediate routers, maintaining the throughput.

Labels
A label is assigned for each source node S; it uniquely identifies the pipe and the intended destination D, even though the label may change along the pipe. Labels can be reused across sources. Using labels, a pipe can be represented as the sequence (S, l_0, l_1, l_2, ..., l_{h-1}, D), where the pipe connects source S to destination D through h routers. l_0 is the label associated with Router R_0 and l_1 is the label associated with Router R_1. Router R_0 connects to the source S and Router R_{h-1} connects to the destination D. Label switching offers a low-overhead solution by reducing the meta-data in packets. Smaller labels result in simpler routing tables and less logic in the router. Using labels instead of node IDs decreases routing table sizes.
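The pipe and label representation described above can be captured in a small data structure. This is a sketch only; the `Pipe` class, its field names and the example node, router and label values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pipe:
    source: str
    destination: str
    routers: list  # R_0 .. R_{h-1}, in order from source to destination
    labels: list   # l_0 .. l_{h-1}; labels[i] is the label used at routers[i]

    def __post_init__(self):
        if len(self.routers) != len(self.labels):
            raise ValueError("one label is needed per router on the pipe")

    def hops(self):
        """Pairs of (router, label) along the pipe."""
        return list(zip(self.routers, self.labels))

    def as_label_sequence(self):
        """The representation (S, l_0, ..., l_{h-1}, D)."""
        return (self.source, *self.labels, self.destination)

# Example values are illustrative only.
pipe = Pipe("a0", "a8", routers=["R0", "R1", "R2"], labels=[1, 4, 2])
```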

Label Swapping
An advantage of the label switching NoC (LS-NoC) is the provision of guaranteed throughput between nodes. As the number of nodes increases, the probability of label collision increases. A routing table entry clash occurs when labels collide on a link. From Figure 3(a), consider two routers, R0 and R1. The pipes entering the north and west ports of Router 0 have label 0. Let us consider that both pipes leave through the south port to reach the north port of Router 1. As each input port has a separate routing table, there is no label collision at Router 0. If label 1 is already used, then both pipes having label 1 will not be able to pass through the north port of Router 1. Label swapping reassigns labels to the conflicting pipes using the available label space at the next router, as in Figure 3(b). Two different labels, 2 and 3, are now assigned, and the pipes are sent to the free output ports south and east respectively 12.
Vol 12 (2) | January 2019 | www.indjst.org
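Label swapping at a router can be sketched as reassigning free labels to pipes whose incoming label is already in use. The function `swap_labels`, the lowest-free-label policy and the pipe names below are assumptions for illustration; the set of labels already in use is chosen so that the example reproduces the labels 2 and 3 mentioned in the text.

```python
def swap_labels(incoming, used_labels, label_width=4):
    """incoming: list of (pipe_id, label) pairs arriving at one input port.
    used_labels: labels already taken at this port.
    Returns a dict mapping pipe_id to its (possibly swapped) label."""
    taken = set(used_labels)
    assigned = {}
    for pipe_id, label in incoming:
        if label in taken:
            # Collision: pick the lowest free label in the label space.
            free = next((l for l in range(1 << label_width) if l not in taken), None)
            if free is None:
                raise RuntimeError("label space exhausted at this port")
            label = free
        taken.add(label)
        assigned[pipe_id] = label
    return assigned

# Two pipes arrive at the north port of Router 1 carrying label 1,
# while labels 0 and 1 are assumed to be in use already.
new_labels = swap_labels([("pipe_north", 1), ("pipe_west", 1)], used_labels={0, 1})
```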

Experimental Results
The LS-NoC router and the LS-pred router for a 3 x 3 mesh network were coded in Verilog, and the functional verification of the Verilog design was done using Xilinx software with the ISIM simulator. The power consumption is 177.14 mW for the LS-NoC router and 155.85 mW for the LS-pred router, as shown in Figure 5(c), an improvement of 12% in power. The time taken by the implemented circuit (latency) is 4.839 ns for LS-NoC and 4.249 ns for LS-pred NoC, as in Figure 5(a). The area, in terms of the number of occupied slices, is 94 slices (9% utilization) for the LS-NoC router and 122 slices (12% utilization) for the proposed technique, as in Figure 5(b).
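The reported improvements follow directly from the measured values, as the short calculation below shows. The variable and function names are hypothetical; the numbers are those reported in this section.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

power_ls, power_pred = 177.14, 155.85   # mW
delay_ls, delay_pred = 4.839, 4.249     # ns
slices_ls, slices_pred = 94, 122        # occupied slices
util_ls, util_pred = 9, 12              # slice utilization in percent

power_saving_mw = round(power_ls - power_pred, 2)
power_saving_pct = round(-percent_change(power_ls, power_pred), 1)
delay_saving_ns = round(delay_ls - delay_pred, 3)
delay_saving_pct = round(-percent_change(delay_ls, delay_pred), 1)
slice_increase_pct = round(percent_change(slices_ls, slices_pred), 1)
util_overhead_points = util_pred - util_ls
```

The power saving is 21.29 mW (about 12.0%), the delay saving is 0.59 ns (about 12.2%), and the slice count grows by about 29.8%, which corresponds to 3 percentage points of device utilization.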

Conclusion and Future Scope
In this paper, a label switched router with the predictor technique is designed and implemented. Power is reduced by about 21 mW (from 177.14 mW to 155.85 mW), latency is reduced by about 0.6 ns, and the area overhead is 3% in slice utilization. This work can be extended to further improve power, and it can be implemented for streaming applications using Cadence tools with 45 nm technology to optimize area and power.