A comprehensive review of parallel concatenation of LDPC code techniques

Objective: In coding theory, parallel concatenation of codes became popular after the introduction of turbo codes. In recent years, Low-Density Parity-Check (LDPC) codes have advanced remarkably and have outperformed turbo codes, especially in terms of error floor and at higher code rates. The main objective of this paper is to review the various techniques for parallel concatenation of LDPC codes. Method/Finding: Various parallel concatenated LDPC coding methods were introduced to reduce the encoding and decoding complexity of longer codes, and their performance was compared with other work. Novelty: When a long block length is used, a fully parallel LDPC decoder suffers from prohibitive implementation complexity. To overcome this issue and to achieve the best performance for longer codes, different methods for parallel concatenation of LDPC codes with reduced complexity were introduced. These methods help to break a long and complex LDPC code into smaller, less complex LDPC codes so that the encoding and decoding load is distributed. They also provide scalability and scope for improving the performance of LDPC codes in practical delay-sensitive and energy-aware applications.


Introduction
Low-density parity-check (LDPC) codes were first introduced by Robert Gallager in his research work in 1962 (1)(2)(3). They belong to the class of linear block codes. Because of their implementation complexity, however, LDPC codes were not considered suitable for many applications. Later, in 1995, these codes were modified by D. MacKay and R. Neal, after the discovery of turbo codes by Berrou in 1993. Their work showed that LDPC codes achieve good performance. After this modification, LDPC codes offered relatively low decoding complexity and a high degree of parallelism, and hence became an attractive solution for error correction. However, the encoding complexity of LDPC codes increases with block length and becomes relatively high for longer block lengths. In addition, a fully parallel LDPC decoder suffers from high implementation complexity at longer block lengths. These drawbacks may limit the use of LDPC codes in practical applications. Therefore, different methods of parallel concatenation of LDPC codes were introduced to obtain better performance for longer codes. Parallel concatenation also reduces the encoding complexity of long LDPC codes.
https://www.indjst.org/
Code concatenation of binary irregular LDPC codes has been studied before (3)(4)(5). The original parallel concatenated LDPC codes, called Parallel Concatenated Gallager Codes (PCGCs), were proposed in (5), where the authors showed how component codes with different parameters affect the overall performance in a Gaussian channel. They restricted their description of PCGCs to rate-1/3 codes constructed by combining two rate-1/2 binary irregular LDPC codes without an interleaver.
The authors in (6) showed that an interleaver is not needed for their proposed code. They also predicted that the conclusions extend easily to the case where three or more codes are used, as in (7), in which serially concatenated irregular LDPC codes without an interleaver are introduced. In (8), a PCGC was modified to use an interleaver. The interleaver permutes the element positions of the code, i.e., it changes the weight distribution of the code, and thus helps to increase the minimum distance of the code. The authors in (9) showed that high performance can be obtained with fewer iterations by serially concatenating two binary irregular LDPC codes with an interleaver. In (8), non-binary LDPC codes were applied in a serial concatenation scheme with an interleaver. The authors in (9) showed that the proposed code scheme outperforms a single non-binary LDPC code. A serial concatenation of non-binary LDPC codes is proposed in (9). In this paper, we present the different methods of parallel concatenation of LDPC codes.
The rest of this paper is organized as follows. The conventional Parallel Concatenated Gallager Code (PCGC) is presented in Section II, followed by the multiple PCGC in Section III and the structured PCGC in Section IV. The PCGC with two sets of source bits is presented in Section V, the single-encoder PCGC in Section VI, the modified PCGC in Section VII, and the PCGC with interleaver in Section VIII. Finally, the conclusion is presented in Section IX.

Conventional PCGC
Parallel Concatenated Gallager Codes (PCGCs) are a class of concatenated codes (4)(5) in which two LDPC encoders are concatenated in parallel without an interleaver between them. The conventional PCGC encoder structure and the concatenated codeword are shown in Figure 1. The idea behind the development of PCGCs was to bring the principle of turbo encoding to LDPC codes. The encoding and decoding complexity of a long LDPC code is divided into lower-complexity encoding and decoding steps, while the flow of information between the component decoders is maintained and the information loss during the decoding steps is reduced. The component codes are selected based on the Mean Column Weight (MCW) of the code's parity-check matrix, i.e., the average column weight over all columns of the parity-check matrix. It has been found that performance improves when a low-MCW code is chosen as the first component code and a high-MCW code as the second component code. Figure 2 shows the effect of MCW on the extrinsic information in the conventional PCGC encoder for various signal-to-noise ratios (SNRs). The PCGC decoding procedure is very similar to turbo decoding, except that there is no interleaver between the component decoders. The conventional PCGC decoder is shown in Figure 3. During decoding, each component LDPC decoder computes the a posteriori probabilities as per (6) using a modified sum-product algorithm (3). LDPC decoding is an iterative process in which information is exchanged between the component decoders. One complete cycle of the sum-product algorithm performed by a component LDPC decoder is referred to as a local iteration. Before any information is passed between the component decoders, each of them must perform its local iterations; this overall process is called a super iteration.
The exchange of information between the component decoders continues until both decoders converge to a valid codeword or the maximum number of super iterations is reached. In the latter case, the output from the second component decoder is declared as the best estimate of the transmitted sequence. Some limitations of the conventional PCGC procedure have been reported. After a component decoder performs a fixed number of local iterations, the reliability, i.e., the log-likelihood ratio magnitudes, of the systematic message bits increases, so it becomes difficult for the next component decoder to correct the maximum number of errors. Hence, the coding gain offered by the concatenation of codes cannot be achieved. Further, while one component decoder performs its local iterations, the other component decoder remains idle, and information is exchanged only after the fixed number of local iterations. Therefore, the decoding delay of the conventional PCGC is high. In addition, an "extensive computer search" was required to select the component codes with different MCWs. As such, the efficacy of the scheme depends mainly on the complementary behaviors of the individual LDPC codes.
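The MCW used above to rank component codes can be computed directly from a parity-check matrix. The following is a minimal sketch in Python, not the selection procedure of (4)(5); the function names `mean_column_weight` and `order_components` and the two toy matrices are hypothetical and serve only as illustration.

```python
def mean_column_weight(H):
    """Average number of ones per column of a binary parity-check matrix.

    H is given as a list of rows, each row a list of 0/1 entries.
    """
    if not H or not H[0]:
        raise ValueError("H must be a non-empty matrix")
    n_cols = len(H[0])
    total_ones = sum(sum(row) for row in H)
    return total_ones / n_cols


def order_components(matrices):
    """Sort candidate component codes from low MCW to high MCW.

    This mirrors the observation in the text that a low-MCW code is used
    as the first component and a high-MCW code as the second.
    """
    return sorted(matrices, key=mean_column_weight)


# Hypothetical toy matrices (far too small for a real LDPC code).
H_LOW = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
H_HIGH = [
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 1],
]

first, second = order_components([H_HIGH, H_LOW])
```

Here `first` is the low-MCW matrix and `second` the high-MCW one; the search over candidate matrices itself is not covered by this sketch.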

Multiple PCGC
Multiple Parallel Concatenated Gallager Codes (MPCGCs) are a generalization of PCGCs (7). The MPCGC encoder is shown in Figure 4. In the MPCGC structure, multiple LDPC encoders are connected in parallel, instead of only two component encoders as in the conventional PCGC. The parity-check matrices of the component LDPC codes are constructed based on the MCW, while ensuring that the row weights of the parity-check matrices are uniform. To reduce the encoding and decoding complexity, the long LDPC code is divided into small subcodes without compromising performance. The encoding complexity of a conventional LDPC code is quadratic in the block length, which is comparatively higher than the sum of the complexities of the individual shorter component codes making up the MPCGC.
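The complexity argument can be made concrete with a small calculation. The sketch below assumes a purely quadratic cost model with unit constant, and assumes that each component encodes the same k message bits into its own parity set; the function names and the lengths used are hypothetical.

```python
def quadratic_cost(block_length):
    """Assumed cost model: encoding cost grows as the square of the block length."""
    return block_length ** 2


def compare_encoding_cost(k, parity_lengths):
    """Return (single_code_cost, sum_of_component_costs).

    The single code has length k + sum(parity_lengths); component i has
    length k + parity_lengths[i], since all components share the k message bits.
    """
    single = quadratic_cost(k + sum(parity_lengths))
    components = sum(quadratic_cost(k + p) for p in parity_lengths)
    return single, components


# Hypothetical example: 1000 message bits and three rate-1/2 components.
single_cost, component_cost = compare_encoding_cost(1000, [1000, 1000, 1000])
```

For these illustrative lengths the single long code costs 16,000,000 units against 12,000,000 for the three components. Under this simple model the saving depends on the chosen lengths, so the numbers should be read as an illustration only.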
In terms of code parameters, the MPCGC architecture is flexible: different code rates can be achieved by using different subsets of the component codes. It is also possible to switch between decoder configurations in order to adapt to different channel conditions. This makes the architecture desirable for various applications.
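The rate flexibility can be illustrated with a short calculation. The sketch below assumes that the systematic bits are transmitted once and that each selected component contributes only its parity bits, which is consistent with the rate-1/3 PCGC built from two rate-1/2 codes mentioned in the Introduction; the function name `overall_rate` is hypothetical.

```python
from fractions import Fraction


def overall_rate(component_rates):
    """Overall rate of a parallel concatenation sharing one set of message bits.

    A component of rate R_i adds k * (1 / R_i - 1) parity bits for k message
    bits, so 1 / R = 1 + sum(1 / R_i - 1) under the stated assumption.
    """
    inverse = 1 + sum(1 / Fraction(r) - 1 for r in component_rates)
    return 1 / inverse


half = Fraction(1, 2)
rate_two = overall_rate([half, half])          # two rate-1/2 components
rate_three = overall_rate([half, half, half])  # three rate-1/2 components
rate_mixed = overall_rate([half, Fraction(2, 3)])
```

Selecting a different subset of components therefore changes the overall rate without redesigning the individual codes.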
It is observed that LDPC codes with low MCWs outperform those with high MCWs at low to moderate SNR, whereas they exhibit poorer performance at high SNR (8). Based on this observation, the MPCGC is designed to combine LDPC codes of different MCWs so that the strength of the low-MCW codes in the low-to-moderate SNR region is exploited. A good MPCGC design depends mainly on the choice of component codes. An "extensive computer search" was used to obtain component LDPC codes with relatively high, moderate, and low MCWs. The serial decoder structure of the MPCGC is shown in Figure 6. Decoding in the serial MPCGC decoder is similar to that of the conventional PCGC, except that a serial combination of M component decoders is used. This serial process leads to a longer delay in producing the required results. Therefore, a parallel decoding process is presented in (9), and the corresponding structure of the MPCGC parallel decoder is shown in Figure 7. The decoding delay is significantly reduced in the parallel decoding process because all M decoders operate in parallel. The analysis found that at low SNR it is beneficial to increase the number of local iterations and decrease the number of super iterations, while at moderate to high SNR a few local iterations and more super iterations are recommended (10). The main drawback of the parallel decoding process is that a combination of high-, medium-, and low-MCW LDPC codes is essential as component codes. Therefore, the efficiency of this scheme rests on the complementary behavior of the individual LDPC codes.

Structured PCGC
Studies show that excellent bit-error-rate (BER) performance can be obtained for large block lengths with randomly constructed LDPC codes. In hardware implementations, however, the memory required to represent the non-zero elements of a random matrix becomes a challenge. Structured LDPC codes provide simpler implementations of encoding and decoding. The effectiveness of the decoding algorithms for LDPC codes is characterized by a parameter called the girth, which determines the number of independent iterations. The girth of an LDPC code is the length of the shortest cycle in the Tanner graph of its parity-check matrix. Short cycles in the Tanner graph, i.e., a low girth, degrade the performance of LDPC decoders because they affect the exchange of extrinsic information during iterative decoding (11). Hence, the design of LDPC codes with large girth is of great interest, and the main objective of structured LDPC construction methods is to keep the girth as high as possible. In (12), the performance of half-rate parallel concatenated codes built from two structured LDPC codes, based on graphical models and Cayley graphs, is evaluated. The performance obtained with the two structured concatenated LDPC codes is compared with that of a conventional PCGC with two randomly created LDPC component codes of the same length.
This method is less attractive for long block lengths because, as the length of the individual codes increases, the SNR value at which they are outperformed by the parallel concatenated code also increases. Therefore, it can be concluded that this method is a good option only for short block lengths and high SNR values. It could also replace a single LDPC code of the same length.
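The girth of a candidate parity-check matrix can be checked numerically. The following is a minimal sketch, not the construction method of (12): it builds the Tanner graph and runs a breadth-first search from every node, which is a standard way to find the shortest cycle in a small undirected graph. The function name `tanner_girth` and the toy matrices in the usage lines are hypothetical.

```python
from collections import deque


def tanner_girth(H):
    """Length of the shortest cycle in the Tanner graph of H, or None if acyclic.

    Bit nodes are numbered 0..n-1 and check nodes n..n+m-1; an edge joins
    bit j and check i whenever H[i][j] == 1.
    """
    m, n = len(H), len(H[0])
    adjacency = [[] for _ in range(n + m)]
    for i, row in enumerate(H):
        for j, value in enumerate(row):
            if value:
                adjacency[j].append(n + i)
                adjacency[n + i].append(j)

    best = None
    for root in range(n + m):
        distance = {root: 0}
        parent = {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:
                    # A non-tree edge closes a cycle through the root's BFS tree.
                    cycle = distance[u] + distance[v] + 1
                    if best is None or cycle < best:
                        best = cycle
    return best


# Two bits sharing two checks form a length-4 cycle.
girth_short = tanner_girth([[1, 1, 0], [1, 1, 1]])
# Three bits and three checks arranged in a ring form a length-6 cycle.
girth_ring = tanner_girth([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
```

The search visits every node as a root, so it is suitable for checking small matrices rather than for large-scale code design.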

PCGC with two sets of source bits
The structure with two sets of source bits performs better than the conventional PCGC because of its concatenation architecture (13). This method does not require the selection of distinct component codes and uses two different parity-check matrices (14)(15)(16). Because two different parity-check matrices are used, data interleaving between the component decoders must be ensured. On the transmitting side, this structure sends two copies of the source message bits; therefore, the channel information need not be removed from the posterior metrics. During the computation of extrinsic information, this information is shared between the two component decoders and the prior information is removed. Two copies of the source message bits are transmitted so that the information received by one component decoder differs from that received by the other; each component decoder thus receives its source-bit information independently.
First, the message bits are duplicated on the transmitting side, and different generator matrices are then used to encode the two copies of the message bits into two codewords, which are concatenated. The structure of the concatenated codeword on the transmitting side is shown in Figure 8. The resulting codeword is modulated using binary phase-shift keying (BPSK) and sent through an additive white Gaussian noise (AWGN) channel. The PCGC decoder architecture with two sets of source bits is shown in Figure 9. On the receiving side, the received vector of the first set is assigned to the first decoder, and the received vector of the second set is assigned to the second decoder.
The extrinsic information on the systematic message bits is shared between the two component decoders.
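The transmit chain described above (duplicate, encode with two generator matrices, concatenate, BPSK, AWGN, split at the receiver) can be sketched as follows. This is only an illustration under stated assumptions: the 2x4 generator matrices `G1` and `G2` are hypothetical toy examples, the BPSK mapping 0 -> +1, 1 -> -1 is an assumed convention, and all function names are introduced here for the example.

```python
import random


def encode(message, G):
    """Multiply a message row vector by generator matrix G over GF(2)."""
    n = len(G[0])
    return [sum(m & G[i][j] for i, m in enumerate(message)) % 2 for j in range(n)]


def bpsk(bits):
    """Assumed mapping: bit 0 -> +1.0, bit 1 -> -1.0."""
    return [1.0 - 2.0 * b for b in bits]


def awgn(symbols, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in symbols]


def transmit_two_sets(message, G1, G2, sigma, seed=0):
    """Encode two copies of the message, concatenate, modulate, and add noise.

    Returns the concatenated codeword and the two received vectors that
    would be assigned to the first and second component decoders.
    """
    rng = random.Random(seed)
    c1 = encode(list(message), G1)  # first copy of the message bits
    c2 = encode(list(message), G2)  # second copy, different generator matrix
    codeword = c1 + c2
    received = awgn(bpsk(codeword), sigma, rng)
    return codeword, received[:len(c1)], received[len(c1):]


# Hypothetical systematic toy generator matrices (not real LDPC codes).
G1 = [[1, 0, 1, 1], [0, 1, 0, 1]]
G2 = [[1, 0, 1, 0], [0, 1, 1, 1]]

codeword, r1, r2 = transmit_two_sets([1, 1], G1, G2, sigma=0.0)
```

With `sigma=0.0` the received vectors are simply the BPSK symbols, which makes the split between the two decoders easy to inspect; a positive `sigma` adds channel noise.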

Single encoder PCGC
A single-encoder PCGC uses only a single component LDPC code with MCW > 2.5 (17). The same parity bits, corresponding to the message bits, are transmitted twice. The single-encoder PCGC encoder structure and the concatenated codeword are shown in Figure 10. In this structure, the encoding complexity is significantly reduced and is lower than that of the other parallel concatenation techniques. The decoder for the single-encoder PCGC is shown in Figure 11. On the decoding side, two component decoders are connected in parallel through check nodes called interconnecting check nodes. Each interconnecting check node corresponds to one parity-check equation of the corresponding parity-check matrix, and an edge joins an interconnecting check node to a bit node of each component decoder if that bit is involved in the corresponding parity-check equation. In the first step of the super iteration, both component decoders independently and simultaneously perform one local iteration of iterative decoding, i.e., the intrinsic information from the bit nodes is transferred to the check nodes of the corresponding decoder, and the bit nodes are updated based on the extrinsic messages available from the check nodes.

Modified PCGC
In the modified PCGC structure, two different sets of parity bits are transmitted for the same information block (18)(19)(20)(21). One set of parity bits satisfies the zero-syndrome criterion and the other set satisfies a non-zero-syndrome criterion. A '1' is added at the start of the message block to generate the non-zero-syndrome parity set from the second encoder. Owing to the increase in minimum distance for a carefully chosen syndrome vector, the performance can be slightly improved compared with the single-encoder PCGC, without any change in the complexity of the decoder.
The modified PCGC structure with concatenated codeword and encoder is shown in Figure 12.
The encoder for the modified PCGC consists of a pair of simple LDPC encoders, one producing a codeword that satisfies the zero-syndrome criterion and the other producing a codeword that satisfies the non-zero-syndrome criterion. For decoding, the message-passing algorithm needs slight modifications to accommodate the non-zero-syndrome criterion. The bit-flipping algorithm is used for decoding. In the first step of the super iteration, the decoder performs a local iteration with the codeword satisfying the zero-syndrome criterion. In the second step, the decoder performs a local iteration with the codeword satisfying the non-zero-syndrome criterion. In the third step, the intrinsic information is updated using the extrinsic information that each bit node received from the check nodes. The bit nodes corresponding to parity bits and those corresponding to systematic message bits are updated separately. This process of super iteration continues until one of the decoders converges to a valid codeword or the maximum number of super iterations is reached.
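The modification needed for a non-zero syndrome can be illustrated with a hard-decision bit-flipping decoder in which a check counts as satisfied when it equals a target syndrome bit rather than zero. The sketch below is an assumption-laden illustration, not the decoder of (18)(19)(20)(21): it flips a single bit per iteration (one simple serial variant), and the toy matrix `H_K4`, the target syndrome, and the function names are hypothetical.

```python
def syndrome(H, x):
    """Syndrome H * x over GF(2) for a hard-decision word x."""
    return [sum(h & b for h, b in zip(row, x)) % 2 for row in H]


def bit_flip_decode(H, received, target=None, max_iters=20):
    """Bit flipping toward a target syndrome (all-zero when target is None).

    Each iteration flips the bit that takes part in the largest number of
    unsatisfied checks, where check i is unsatisfied if its syndrome bit
    differs from target[i]. Returns (word, converged).
    """
    x = list(received)
    target = [0] * len(H) if target is None else list(target)
    for _ in range(max_iters):
        current = syndrome(H, x)
        unsatisfied = [i for i, (s, t) in enumerate(zip(current, target)) if s != t]
        if not unsatisfied:
            return x, True
        counts = [sum(H[i][j] for i in unsatisfied) for j in range(len(x))]
        x[counts.index(max(counts))] ^= 1  # flip the most suspicious bit
    return x, syndrome(H, x) == target


# Hypothetical toy matrix: column weight 2, any two columns share at most one check.
H_K4 = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

# Zero-syndrome case: codeword [0,0,0,1,1,1] with an error in bit 0.
zero_case = bit_flip_decode(H_K4, [1, 0, 0, 1, 1, 1])
# Non-zero-syndrome case: word [1,0,0,1,1,1] has syndrome [1,1,0,0]; error in bit 2.
nonzero_case = bit_flip_decode(H_K4, [1, 0, 1, 1, 1, 1], target=[1, 1, 0, 0])
```

The same received word `[1, 0, 0, 1, 1, 1]` is treated as erroneous under the zero-syndrome criterion but as valid under the target syndrome `[1, 1, 0, 0]`, which is the only change the non-zero-syndrome criterion introduces in this sketch.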
The structure of the modified PCGC also has some limitations. The extrinsic messages received by each bit node during a super iteration are not fully independent but correlated, which worsens the performance of iterative decoding. This is because, if a bit is in error, the messages received by the bit nodes from the check nodes may be contradictory. To improve performance, all the information received by a bit node has to be independent. Other demerits include its applicability only to the Binary Symmetric Channel (BSC) and the random selection of the non-zero syndrome vector (22).

PCGC with Interleaver
An interleaver is used to make the extrinsic messages received by the bit nodes of the PCGC independent. In a PCGC with interleaver, the nodes receive the messages independently because the interleaver makes the parity-check equations entirely different, which improves the performance (23)(24)(25)(26). The encoder structure and concatenated codeword of the PCGC with interleaver are shown in Figure 13. In this structure, the first encoder provides the parity set for the message bits, and the second encoder provides the parity set for the interleaved message bits. Here, both sets of parity bits satisfy the syndrome criterion. The message bits are randomly permuted by a random interleaver. The use of an interleaver increases the complexity of the structure, which is its main drawback. The decoder for the PCGC with interleaver is shown in Figure 14. A simple LDPC decoder is used to decode both codewords. The decoder also includes a register block, a random interleaver, and a de-interleaver.
The decoder for the PCGC with interleaver contains two register sets, Reg1 and Reg2, which store the codewords. These registers also store the intrinsic information and the total extrinsic information for each bit of the corresponding codeword. The decoder outputs the total extrinsic message for each bit after a local iteration. The decoder is connected to the register block both directly and through the interleaver and de-interleaver. The de-interleaver performs the reverse operation of the interleaver. Because of the interleaver and de-interleaver, the complexity and decoding delay are higher than in the other techniques. The decoding delay can be reduced by using a parallel set of decoders.
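The relationship between the random interleaver and the de-interleaver can be sketched in a few lines. This is a generic illustration of a pseudo-random permutation and its inverse, not the specific interleaver of (23)(24)(25)(26); the function names and the seed are hypothetical.

```python
import random


def make_interleaver(length, seed=0):
    """Return a pseudo-random permutation of range(length).

    The fixed seed stands in for the pattern shared by encoder and decoder.
    """
    rng = random.Random(seed)
    permutation = list(range(length))
    rng.shuffle(permutation)
    return permutation


def interleave(values, permutation):
    """Output position i takes the input value at position permutation[i]."""
    return [values[p] for p in permutation]


def deinterleave(values, permutation):
    """Reverse operation: put each value back at its original position."""
    restored = [None] * len(permutation)
    for i, p in enumerate(permutation):
        restored[p] = values[i]
    return restored


message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
pattern = make_interleaver(len(message_bits), seed=1)
permuted = interleave(message_bits, pattern)
recovered = deinterleave(permuted, pattern)
```

The same two functions apply unchanged to lists of extrinsic values, which is how the message-bit information would pass between the register block and the decoder in this sketch.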
The bit-flipping algorithm is used for decoding. In the first step of the super iteration, the decoder performs a local iteration with the first codeword, and the total extrinsic information is sent back to Reg1 of the register block. The intrinsic information of the message bits and parity bits is taken directly to the decoder, and the total extrinsic information of the message bits and parity bits is transferred directly to Reg1. In the second step of the super iteration, the decoder performs a local iteration with the permuted codeword, and the total extrinsic information is sent back to Reg2 of the register block. Here, the intrinsic information of the message bits is taken to the decoder through the interleaver, while that of the parity bits is taken directly. The total extrinsic message corresponding to the parity bits is sent back through the direct path, and the total extrinsic message corresponding to the message bits is sent back through the de-interleaver. In the third step of the super iteration, the intrinsic information in the register block (Reg1 and Reg2) is updated using the total extrinsic information received from the decoder; the parity bits and the systematic message bits are updated separately. This process of super iteration continues until one of the decoders converges to a valid codeword or the maximum number of super iterations is reached.
The performance of the PCGC with interleaver is compared with that of the modified PCGC in the BSC (27)(28)(29)(30), and the results show that the PCGC with interleaver performs better. In an AWGN channel, the performance of the PCGC with interleaver improves because of an indirect increase in column weight: the number of check nodes connected to each bit node increases without changing the number of bit nodes connected to each check node. Hence, this concatenation scheme provides performance as good as that of a dedicated LDPC code, but with increased complexity due to the deployment of an interleaver. The PCGC with interleaver solves the implementation problem associated with a longer block-length iterative decoder without much compromise in performance.

Conclusion
The parallel concatenation of multiple LDPC codes is evaluated based on different decoding techniques. By breaking a long, high-complexity single LDPC code into multiple lower-complexity LDPC codes and appropriately exchanging extrinsic information among the component decoders, it is shown that a competitive BER performance can be achieved while still maintaining the low complexity and flexibility attributes of these strategies. By optimizing the construction of the component LDPC codes, the performance of MPCGC can be significantly improved and the complexity can be further reduced. MPCGC strategies can be attractive in applications where an on-line trade-off of resource vs coding performance is required; these are also advocated for the emerging multicore processing platform.
This study provides the fundamental details of the design and implementation of LDPC codes, considering forward error correction, error floor, and higher code rates for performance analysis. In recent years, LDPC codes have advanced remarkably and have outperformed turbo codes, especially in terms of error floor and at higher code rates. Their performance and other inherent advantages make LDPC codes the best choice for error correction. However, the implementation complexity of longer block-length codes is the main shortcoming of these codes. Although LDPC codes outperform turbo codes in many practical applications, they are not preferred over turbo codes because of this drawback. In terms of performance, parallel concatenation with an interleaver is a better choice than the other existing structures. However, the presence of the interleaver makes this structure more complex.