IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS, VOL. 58, NO. 4, APRIL 2011

A High-Throughput LDPC Decoder Architecture With Rate Compatibility


Kai Zhang, Xinming Huang, Senior Member, IEEE, and Zhongfeng Wang, Senior Member, IEEE
Abstract: This paper presents a high-throughput decoder architecture for rate-compatible (RC) low-density parity-check (LDPC) codes that supports arbitrary code rates between the rate of the mother code and 1. Puncturing techniques are applied to produce different rates for quasi-cyclic (QC) LDPC codes with dual-diagonal parity structure. Simulation results show that the selected puncturing scheme introduces a BER performance degradation of less than 0.2 dB compared with the dedicated codes for different rates specified in the IEEE 802.16e (WiMax) standard. Subsequently, a parallel layered decoding architecture (PLDA) is employed for high-throughput decoder design. While the original PLDA lacks rate flexibility, the problem is solved gracefully by incorporating the puncturing scheme. As a case study, an RC-LDPC decoder based on the rate-1/2 WiMax LDPC code is implemented in a 65-nm CMOS process. The clock frequency is 1.1 GHz, and the synthesis core area is 1.96 mm². The decoder can achieve an input throughput of 1.28 Gb/s at ten iterations and supports any rate between 1/2 and 1.

Index Terms: LDPC decoder, puncturing scheme, rate compatibility, VLSI design, WiMax.

Manuscript received November 11, 2009; revised June 18, 2010; accepted September 26, 2010. Date of publication November 18, 2010; date of current version March 30, 2011. This work was supported in part by the National Science Foundation under Grant ECS-0725522. This paper was recommended by Associate Editor G. M. Maggio. K. Zhang and X. Huang are with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA (e-mail: kzhang@wpi.edu; xhuang@wpi.edu). Z. Wang is with Broadcom Corp., Irvine, CA 92617 USA (e-mail: zfwang@broadcom.com). Digital Object Identifier 10.1109/TCSI.2010.2089551

I. INTRODUCTION

Low-density parity-check (LDPC) codes [1] have been investigated extensively in recent years because of their powerful error performance and inherent parallelism for hardware implementation. They have been adopted as optional forward error correction (FEC) codes in emerging communication standards such as IEEE 802.11n, 802.16e (WiMax), and DVB-S2. Although random LDPC codes [2], [3] achieve error performance near the Shannon limit, they are not practical because of drawbacks such as high encoding complexity, the large memory required for the random parity-check matrix, and the complicated interconnect between check node units (CNUs) and variable node units (VNUs). Structured codes such as quasi-cyclic (QC) LDPC codes [4], [5] have been proposed to use simple and regular connections between VNUs and CNUs while still maintaining error performance comparable to that of random codes. In [5], a special class of QC LDPC codes with dual-diagonal parity structure was proposed, which guarantees linear encoding complexity regardless of the size of the parity-check matrix. This type of structured QC LDPC code has become popular in FEC for wireless communications such as the WiMax systems.

Recently, there has been growing interest in rate-compatible (RC) LDPC codes and their applications. On time-varying channels, it is desirable to adjust the FEC code rate according to the channel state information (CSI). Rate compatibility is also desirable for communications using type-II hybrid automatic-repeat-request (ARQ) protocols [6]. The concept of RC LDPC codes was first proposed in [7]-[9] by puncturing the parity bits of a low-rate randomly constructed LDPC code (called the mother code). The punctured codes showed good error performance, but the complicated puncturing algorithm and large block size were not practical for hardware implementation. Later, finite-length puncturing patterns were proposed in [10] by introducing the definition of the k-step recoverable (k-SR) variable node. Based on the k-SR theory, several studies were presented on puncturing schemes for QC LDPC codes with dual-diagonal parity structure [11], [12]. Nevertheless, the hardware design of RC LDPC decoders has not yet been well studied.

From the hardware implementation perspective, there are many existing studies on VLSI implementation of LDPC decoders for single-rate codes [13]-[17] and multi-rate codes [18]-[20]. With the increasing demand for high-data-rate wireless applications, high-throughput LDPC decoder architectures are needed. In our previous work [21], we proposed a parallel layered decoding architecture (PLDA) that can provide up to 2-Gb/s decoding throughput owing to the concurrent processing of all layers and the split critical path, which results in a much faster clock speed. However, the PLDA architecture has a major drawback. Because it uses fixed connections among layers, the custom-designed decoder fits only one specific code. That provides the motivation for us to conduct further research to solve the flexibility problem.
In this paper, we investigate puncturing schemes for rate-compatible LDPC codes and their hardware implementations. For QC LDPC codes with dual-diagonal parity structure, an efficient puncturing scheme is selected which includes the weight-3 vertical submatrix in the puncturing block. As a case study using the rate-1/2 WiMax LDPC code, we show that the selected puncturing scheme results in a bit-error-rate (BER) performance degradation of less than 0.2 dB compared with dedicated WiMax codes at four different code rates. Subsequently, we incorporate the puncturing scheme into the PLDA-based architecture for the design of an RC LDPC decoder. The decoder is implemented in a 65-nm CMOS process and can achieve an input throughput of 1.28 Gb/s at ten iterations. It supports any rate between the rate of the mother code and 1.

The rest of this paper is organized as follows. Section II introduces the background of RC LDPC codes and a comparison of different puncturing schemes. The high-throughput RC LDPC decoder architecture is presented in Section III. Section IV gives the implementation results and a comparison with some existing WiMax LDPC decoders. Finally, conclusions and future work are presented in Section V.

1549-8328/$26.00 © 2010 IEEE

II. PUNCTURING SCHEMES FOR RATE-COMPATIBLE LDPC CODES

A. Quasi-Cyclic LDPC Codes With Dual-Diagonal Parity Structure

Generally, LDPC codes can be described by an M × N sparse parity-check matrix H, in which most of the elements are 0s and only a few are 1s. M denotes the number of parity-check equations, which is the number of check nodes. N is the block length, which is the number of variable nodes. QC LDPC codes are a special class of LDPC codes with a structured H matrix, which can be generated from a smaller base matrix:

[Equation (1): the base matrix. Its entries are not recoverable from the extracted text.]

Each nonzero element in the base matrix is a square submatrix that can be expanded by circularly right-shifting an identity matrix with the shift value defined by that element. The structure of the parity-check matrix makes it convenient to determine the locations of the nonzero elements. Random connections between CNUs and VNUs now become well regulated and easy to handle.

In [5], a special class of systematic codes is defined based on QC LDPC codes. The base matrix can be partitioned into two parts, the systematic-bits matrix on the left and the parity-bits matrix on the right. The parity-bits matrix can be further partitioned into two sections: the leftmost column is a weight-3 matrix, and the remaining columns form a dual-diagonal matrix:

[Equation (2): the parity-bits matrix, made up of the weight-3 leftmost column followed by the dual-diagonal columns. Its entries are not recoverable from the extracted text.]

We refer to such a structure as the dual-diagonal parity structure. The original purpose of this structure in [5] was fast encoding, because it guarantees linear-time encoding. Consequently, the dual-diagonal parity structure has been adopted in many LDPC codes, including those in the WiMax standard [22]. In WiMax codes, the secondary diagonal elements are also identity matrices. Then the connections between the parity bit nodes and the check nodes become zigzag edges. In this paper, we focus on the puncturing schemes and decoder design for rate-compatible LDPC codes with such dual-diagonal parity structures.

B. Rate-Compatible LDPC Codes

Rate compatibility can be achieved by puncturing the parity bits of the mother code. If the belief-propagation (BP) algorithm [1] is employed for decoding, puncturing can be carried out by simply setting the log-likelihood ratios (LLRs) of the punctured bits to zero. Several puncturing schemes have been proposed for randomly constructed LDPC codes [7]-[9] and for short-length LDPC codes [10]. The punctured codes, though at different code rates, originate from the same mother code. Therefore, only one encoder and one decoder are needed.

A major advantage of using rate-compatible codes is reduced storage memory. For instance, the WiMax standard lists six different LDPC codes for four different rates (1/2, 2/3, 3/4, and 5/6). A typical WiMax LDPC encoder/decoder design [23]-[26] must store the parity-check matrices of all six codes. In addition, it requires a complicated switching network that can change the connections among VNUs and CNUs for each code. Rate-compatible codes avoid the storage overhead as well as the potential latency problem caused by the large switching network. In communication systems, rate-compatible codes are also suitable for hybrid ARQ protocols [6], because the transmitter can add redundancy progressively by puncturing the mother code based on the CSI.

Since rate-compatible codes have the advantages listed above, a prominent question is whether they can provide error performance comparable to that of unpunctured codes. For a fair comparison, the unpunctured codes refer to dedicated codes with the same length as the punctured codes. Before presenting the detailed puncturing schemes, we first introduce the definition of the k-SR variable node [10].

A punctured variable node is called 1-step recoverable (1-SR) if there is at least one connected check node, called the survived check node, such that all other variable nodes connected to that check node are not punctured. It is called 1-SR because such a punctured variable node can be recovered in one iteration on the binary erasure channel (BEC). An example of a 1-SR node is shown in Fig. 1(a). The punctured variable node is connected to two check nodes. One of these two check nodes has four neighboring variable nodes, and all of them except the punctured node itself are not punctured. Therefore, the punctured variable node is a 1-SR node.

The definition of a k-SR node can be extended from that of a 1-SR node: at least one connected check node, called the survived check node, is connected to one or more (k-1)-SR variable nodes while its other variable nodes are m-SR nodes with 0 ≤ m ≤ k-1 [10]. On the BEC, a k-SR node can be recovered in the kth iteration. Obviously, a k1-SR node is more reliable than a k2-SR node if k1 < k2. An illustration of a k-SR node is shown in Fig. 1(b).

Fig. 1. Description of 1-SR node and k-SR node.

C. Selected Puncturing Scheme

Since a k1-SR node is more reliable than a k2-SR node if k1 < k2, we first maximize the size of the 1-SR group in order to minimize the error performance loss of the punctured codes. Then we try to maximize the size of the 2-SR group, and so on. In other words, the puncturing scheme punctures the lowest-indexed group first. The puncturing procedure can be divided into two steps: 1) the punctured block selection and 2) the punctured bits selection inside a block.

First, we investigate how to select punctured blocks. As an example, we present an efficient puncturing scheme by puncturing the rate-1/2 2304-bit mother code from the WiMax standard [22], as shown in Fig. 2. It includes 1152 parity bits, which are partitioned into 12 blocks with a block size of 96 bits. Due to the zigzag pattern of the parity structure, the grouping of k-SR nodes becomes easy to handle [11]. For example, if a rate-2/3 LDPC code is obtained by puncturing the mother code, half of the parity bits, i.e., six blocks (or 576 bits), are punctured. Table I shows three puncturing schemes in which the six punctured blocks are all 1-SR nodes. PBI denotes the punctured block index, and SC denotes the number of survived checks corresponding to each punctured block. Despite having the same number of 1-SR nodes, the three schemes differ in error performance, as shown in Fig. 3. The number of survived checks, as listed in Table I, has an impact on the error performance of the punctured code. Scheme 1 and scheme 3 both have more survived checks than scheme 2. Therefore, the BER performances of scheme 1 and scheme 3 are better than that of scheme 2.

Fig. 2. Parity-check matrix for the selected rate-1/2 LDPC code in WiMax.

TABLE I. THREE PUNCTURING SCHEMES FOR ACHIEVING RATE 2/3 FROM RATE 1/2 MOTHER CODE

Fig. 3. BERs of the three punctured codes and the dedicated code at rate 2/3 over AWGN channels.

Scheme 1 is the puncturing scheme proposed in [12], in which the weight-3 block is not punctured. In contrast, scheme 3 selects the weight-3 block for puncturing. Simulation results in Fig. 3 show that scheme 3 has better error performance than scheme 1 over AWGN channels. This is largely because the punctured weight-3 variable nodes have more neighboring checks, which can provide more information during the decoding iterations. Thus, puncturing the weight-3 block as in scheme 3 is recommended.

Table II shows the selected puncturing scheme when a whole number of blocks is punctured, which corresponds to a group of code rates. Here, we name this group of rates block rates. If the desired rate is between two consecutive block rates, we can puncture a block partially by taking some bits out of a block. In the Punc Blk Idx (PBI) column of Table II, the normal numbers indicate that the entire blocks are punctured, and an italic number indicates the designated block which may be punctured partially if needed. The punctured bits within that block are selected based on the following procedure [12]. In order to disperse the punctured bits within a block, a special sequence is generated recursively as in (3) through (5), which depend on the size of the submatrix:

[Equations (3)-(5) are not recoverable from the extracted text.]

Next, the actual punctured bit sequence is adjusted from this sequence using (6), which involves permutation values and a row number in the weight-3 submatrix from (2):

[Equation (6) is not recoverable from the extracted text.]

Each element of the resulting sequence indicates the column index of a bit to be punctured within the block. In practice, the sequence is computed and stored in a LUT. As soon as the number of punctured bits is calculated from the given rate, the indexes for the punctured blocks and the bits of the partially punctured block can be looked up instantly. More details will be discussed in the decoder implementation.

TABLE II. INDEX OF PUNCTURED BLOCKS AT DIFFERENT DESIRED RATES

III. HIGH-THROUGHPUT RATE-COMPATIBLE LDPC DECODER ARCHITECTURE

A. Summary of the Parallel Layered Decoding Architecture

LDPC codes can be effectively decoded using the belief-propagation (BP) algorithm [1]. Two types of messages, check-to-variable (CTV) messages and variable-to-check (VTC) messages, are transmitted along the edges of the Tanner graph to update each other iteratively. The min-sum algorithm and modified min-sum algorithm [27]-[29] have been introduced to reduce the complexity of CTV message updating. The layered decoding algorithm [13], [23], [24], [26], [30], [31] has been adopted to reduce the number of iterations by a factor of two compared with the standard BP algorithm. Hence, the decoding throughput can be improved without any bit error performance loss.

In the BP algorithm, VTC updates do not start until all of the CTV messages are received, and vice versa. In the horizontal layered decoding algorithm, the updated CTV messages from the current layer are passed vertically to all layers below for the same variable node. In each iteration, the horizontal layers are processed sequentially from the top layer to the bottom layer. The overall decoding procedure for an M × N parity-check matrix with the min-sum and layered decoding algorithms is summarized in (7)-(9):

[Equations (7)-(9): the VTC message calculation (7) and the updates of the CTV and APP messages (8), (9). The formulas are not recoverable from the extracted text.]

Details of the layered decoding algorithm can be found in [13], [23], [24], [26], [30], and [31]. However, the traditional layered decoding algorithm processes layers in sequential order, which results in a large decoding latency per iteration. A method of increasing parallelism inside a layer is proposed in [13], [26], but all layers are still processed in series. Although decoding throughput can be improved, this method introduces crossbar-based interconnection networks that increase the hardware complexity.

In our previous work [21], a parallel layered decoding architecture (PLDA) was proposed which allows all layers to be processed in parallel. Each layer has an individual check node processing unit (CNU) which generates and sends updated messages and at the same time receives the updated messages from other layers. Unlike the method proposed in [13] and [26], PLDA uses parallel processing among all layers and serial processing within each layer.

Fig. 4. Offset-modified parity-check matrix for rate-1/2 LDPC code in 802.16e.

In PLDA, the message passing routes among layers are based on their permutation values in the parity-check matrix. Fig. 4 shows the permutation values in the parity-check matrix of the rate-1/2 WiMax code. For each vertical block, we first sort the permutation values of all layers in descending order. Subsequently, we designate each layer to pass its message to the layer which has the next smaller permutation value in the same column. Finally, the layer with the smallest permutation value loops back and connects to the layer with the largest value. This message passing scheme guarantees that the updated messages among all layers are processed progressively within each iteration. It works well because the rows within each layer are processed sequentially, and the updated messages are passed only to the layer which will process the current column next. It is worth mentioning that we have added an offset to the permutation values for each layer as in Fig. 4, which is equivalent to letting the CNUs start processing from different rows (instead of always starting from row 1).
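The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_routes` and the example permutation values are hypothetical, and the permutation values within one block column are assumed to be distinct.

```python
def build_routes(perm_by_layer):
    """Return {source_layer: destination_layer} for one block column.

    perm_by_layer maps a layer index to its (offset-modified) permutation
    value in this column. Layers with a zero block in the column are
    simply left out of the mapping. Values are assumed to be distinct.
    """
    # Sort the layers by permutation value, largest first.
    order = sorted(perm_by_layer, key=perm_by_layer.get, reverse=True)
    routes = {}
    for pos, layer in enumerate(order):
        # Each layer passes its message to the layer with the next smaller
        # value; the layer with the smallest value wraps to the largest.
        routes[layer] = order[(pos + 1) % len(order)]
    return routes


if __name__ == "__main__":
    # Hypothetical column: layers 0, 3, and 7 hold nonzero blocks.
    print(build_routes({0: 73, 3: 12, 7: 40}))
```

For the hypothetical column above, the layers form the ring 0 to 7 to 3 and back to 0, which mirrors the "loop back" step in the text.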
From the decoding perspective, changing the processing order in each layer affects neither the performance nor the throughput of a decoder. In fact, the offset values are carefully selected such that the difference of the modified permutation values between any two layers is at least 5. This means that each layer (or CNU) has a time span of five clock cycles to read, update, store, and then pass the message to the next connected layer. The main advantage is that we can design the CNUs, which are usually the critical path in decoder design, as a five-stage pipeline architecture. This is called the critical path splitting technique in [21], which reduces the latency of the critical path and thus improves the clock speed and decoding throughput.

The major issue of the aforementioned PLDA design is that the concurrent message passing routes among all layers are fixed and optimized for one specific code. As mentioned in Section I, rate-compatible LDPC codes, or at least multiple rates, are desirable for communication systems on time-varying channels. Thus, we investigate various puncturing schemes and incorporate rate compatibility into the PLDA design to provide the much-needed flexibility without sacrificing its advantage of high throughput.

B. RC LDPC Decoder Design

The overall architecture of the RC LDPC decoder is shown in Fig. 5. It consists of CNUs, an LLR initialization block, a posteriori probability (APP) memory banks, CTV memory banks, and bit decision units. It is similar to a regular PLDA design except for the initialization of the APP memory. Using the rate-1/2 WiMax code as an example, the entire H matrix is divided into 12 layers, and each layer has a dedicated CNU. APP memory banks and CTV memory banks are used to store APP messages and CTV messages. Each nonzero element in the base parity-check matrix corresponds to one APP memory unit and one CTV memory unit. Each APP memory exports APP messages to the CNU, and each CTV memory exports CTV messages to the CNU. The CNU first calculates the VTC messages [as indicated in (7)], then calculates the updated set of CTV and APP messages [as indicated in (8) and (9)], and finally writes the updated CTV messages to the CTV memory banks and the updated APP messages to the APP memory banks. Therefore, the CNU becomes the critical path of the PLDA, which limits the maximum frequency. However, as indicated in Section III-A, an interval of five clock cycles is available for the APP message passing from one layer to another if the offset-modified parity-check matrix in Fig. 4 is used. Considering that the memory write operation costs one clock cycle, a split CNU with four pipeline stages can be designed by inserting registers into the original CNU, which reduces the critical path delay and raises the maximum frequency.
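The CNU data flow described above (VTC from APP and the old CTV, then new CTV and APP values) can be sketched as a sequential software model. The sketch below assumes the standard layered min-sum update, which matches the textual description of (7)-(9) but may differ in notation from the paper; it processes layers one after another rather than in parallel as PLDA does. The function names, the tiny base matrix, and the LLR values are hypothetical, and every check row is assumed to have at least two neighbors.

```python
def circulant_rows(base_row, z):
    """Expand one layer of a base matrix into z check rows.

    base_row[j] is the circular right shift of the z x z identity in
    block column j, or -1 for an all-zero block. Row r of a right-shifted
    identity has its single 1 in column (r + shift) % z.
    """
    return [[j * z + (r + s) % z for j, s in enumerate(base_row) if s >= 0]
            for r in range(z)]


def update_check(app, ctv, cols):
    """Min-sum update of one check row; app and ctv are modified in place."""
    # VTC message: APP value minus the CTV message stored for this edge.
    vtc = {n: app[n] - ctv[n] for n in cols}
    for n in cols:
        others = [vtc[k] for k in cols if k != n]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        new_ctv = sign * min(abs(v) for v in others)
        ctv[n] = new_ctv            # updated CTV message
        app[n] = vtc[n] + new_ctv   # updated APP message


def decode(llr, layers, iterations=10):
    """Layered min-sum decoding; returns hard decisions (1 if APP < 0)."""
    app = list(llr)
    ctv = [[{n: 0.0 for n in cols} for cols in layer] for layer in layers]
    for _ in range(iterations):
        for li, layer in enumerate(layers):      # top layer to bottom layer
            for ri, cols in enumerate(layer):    # rows within a layer
                update_check(app, ctv[li][ri], cols)
    return [1 if v < 0 else 0 for v in app]


if __name__ == "__main__":
    z = 3  # hypothetical submatrix size; the WiMax mother code uses 96
    layers = [circulant_rows([0, 1, 2, -1], z),
              circulant_rows([1, -1, 0, 0], z)]
    llr = [2.0] * 12          # all-zero codeword, positive LLR means bit 0
    llr[4] = -0.5             # one weakly wrong channel value
    llr[10] = 0.0             # one punctured bit (LLR set to zero)
    print(decode(llr, layers))
```

In this toy case the punctured position starts with an LLR of zero and receives its first nonzero APP value from the check row it participates in, which is the mechanism that lets the same decoder serve every punctured rate.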

Fig. 5. Overall architecture of a rate-compatible LDPC decoder.

Fig. 6. Architecture of LLR Initialization Block.

The puncturing scheme is implemented by setting the LLRs of the punctured parity bits to 0, which is called LLR initialization. The length of the actual received codeword is smaller than that of the original mother code because of the punctured bits. For the selected puncturing scheme, the punctured blocks and bits listed in Table II should be initialized to zeros. Fig. 6 shows the architecture of the LLR initialization block, including a group of comparators, a decoder, an address generator, and a 12 × 12 lookup table (LUT). The LLRs from the channel are stored in assigned memory banks, and the others are set to 0. Based on the number of punctured bits (n_punc), the rate of the code can be deduced using a group of comparators and an adder. The thresholds of the comparators are set to the multiples of the submatrix size, which is 96 for the mother code. Each comparator compares n_punc with one threshold and returns 1 if n_punc is greater and 0 otherwise. In total, there are 11 comparison results. These results are sent to a decoder to determine the rate range of the punctured code. Here the decoder is simply composed of a group of adders which add all of the comparison results to get the idx_rate signal. For example, if the number of punctured bits is smaller than 96, then each comparator returns a 0, and idx_rate is therefore 0. This corresponds to row 1 in Table II, and the rate lies between 1/2 and 12/23.

The address generator generates addresses for the memory banks as well as for the 12 × 12 LUT. Signal idx_blk denotes which memory bank is being written with the LLRs at a given time instant. The 12 × 12 LUT stores the enable signals for the memory banks to decide which memory should be written with LLRs at a given time instant. Thus, the contents of the 12 × 12 LUT represent the puncturing bit selection as in Table II.
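The comparator-and-adder logic and the zero-filling of punctured LLRs can be modeled in software as follows. This is a behavioral sketch only: the function names are hypothetical, the bracketing rates are derived from the 24-block-column, 12-information-block structure of the rate-1/2 mother code (which reproduces the 1/2 to 12/23 range quoted above), and the list of punctured positions must be supplied by the caller because Table II is not reproduced here.

```python
from fractions import Fraction

Z = 96          # submatrix size of the 2304-bit rate-1/2 mother code
N_BLOCKS = 24   # block columns of the mother code (2304 / 96)
K_BLOCKS = 12   # information block columns (1152 / 96)


def idx_rate(n_punc):
    """Add up 11 comparator outputs; comparator i fires if n_punc > i * Z."""
    return sum(1 for i in range(1, 12) if n_punc > i * Z)


def rate_range(idx):
    """Block rates that bracket the punctured code for a given idx_rate."""
    low = Fraction(K_BLOCKS, N_BLOCKS - idx)
    high = Fraction(K_BLOCKS, N_BLOCKS - idx - 1)
    return low, high


def init_llrs(channel_llrs, punctured_positions, n=N_BLOCKS * Z):
    """Place received LLRs in order and write 0.0 at punctured positions."""
    punctured = set(punctured_positions)
    received = iter(channel_llrs)
    return [0.0 if pos in punctured else next(received) for pos in range(n)]


if __name__ == "__main__":
    for n_punc in (50, 96, 97, 576):
        idx = idx_rate(n_punc)
        print(n_punc, idx, rate_range(idx))
```

Because each comparator tests for "greater than", a value of exactly 96 punctured bits still gives idx_rate 0 and sits at the upper end of the first range.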

Fig. 7. Architecture of the address generator.

The detailed design of the address generator is shown in Fig. 7. The core component is an accumulator which accumulates every clock cycle and outputs two signals, idx_blk and addr_mem, one for the 12 × 12 LUT to select the enable signal and the other for the memory banks as the write address. However, it can be observed from Table II that for every rate range, only one vertical block is partially punctured, and the rest are entirely punctured. In other words, the clock accumulator accumulates every 96 cycles for a fully punctured block, whereas for a partially punctured block it accumulates over a smaller number of cycles determined by the number of punctured bits in this block. Therefore, an LUT is used here to store the index of the partially punctured block based on Table II, in order to tell the clock accumulator to accumulate at a different step when it meets the partially punctured block. The content of this LUT is exactly the italic numbers in Table II.

IV. EXPERIMENTAL RESULTS

For the experimental study, we implement the selected puncturing scheme for the WiMax LDPC codes. We choose the rate-1/2 LDPC code in the WiMax standard as the mother code. Numerical simulations are performed to verify the BER performance between the punctured codes and the dedicated codes at three different rates. Furthermore, the rate-compatible LDPC decoder is developed based on the PLDA architecture and then implemented using a standard-cell ASIC design flow.

A. Simulation Results for Punctured WiMax Codes

The BER performance of a group of punctured LDPC codes is presented in Fig. 8. The rate of the mother code is 1/2, and the code length is 2304 bits. Five different rates are generated using the selected puncturing scheme, i.e., 3/5, 2/3, 3/4, 5/6, and 6/7. The corresponding numbers of punctured bits are 384, 576, 768, 922, and 960. Thus, the code lengths of the punctured codes are 1920, 1728, 1536, 1382, and 1344, respectively. To verify the selected puncturing scheme, dedicated LDPC codes at rates 2/3, 3/4, and 5/6 from the WiMax standard are simulated to compare their performance with the punctured codes, also shown in Fig. 8. At each rate, a specific mode is selected from the 19 modes of each WiMax code to make the code lengths of the dedicated codes equal or similar to those of the punctured codes. The code lengths of the dedicated codes at rates 2/3, 3/4, and 5/6 are 1728, 1536, and 1344, equal or similar to those of the corresponding punctured codes, whose code lengths are 1728, 1536, and 1382. Fig. 8 shows that the BER of the punctured code is very close to that of the dedicated code, with less than 0.2-dB performance loss.

Fig. 8. BERs of the punctured LDPC codes over AWGN channels.

B. Hardware Implementation Results

In order to demonstrate the combined system performance of the selected rate-compatible LDPC codes and the PLDA architecture, we implement the RC LDPC decoder in TSMC 65-nm technology with eight layers. We complete the synthesis and the core-area place and route using standard Synopsys tools. Implementation results show that the decoder can operate at a maximum frequency of 1.1 GHz after synthesis, which corresponds to a constant input throughput of 1.28 Gb/s for all code rates. Fig. 9 shows the layout view of the decoder with a core area of 1.96 mm² and a logic density of 70%. Read/write memories are generated by the Synopsys DesignWare tool and thus flattened during synthesis and the place and route process. The estimated power consumption of the decoder core is 908 mW at a 1.1-GHz clock frequency.
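The punctured-bit counts and code lengths quoted in Section IV-A follow from simple rate arithmetic on the 2304-bit mother code with 1152 information bits. The short check below is an illustrative sketch; the function name is hypothetical, and rounding to the nearest bit is an assumption that happens to reproduce the quoted figure of 922 bits for rate 5/6.

```python
from fractions import Fraction

N_MOTHER = 2304   # mother code length in bits
K_INFO = 1152     # information bits of the rate-1/2 mother code


def punctured_bits(target_rate):
    """Parity bits to puncture so that K_INFO / (N_MOTHER - p) is close
    to target_rate, rounded to the nearest whole bit (assumption)."""
    exact = N_MOTHER - K_INFO / Fraction(target_rate)
    return round(exact)


if __name__ == "__main__":
    for rate in (Fraction(3, 5), Fraction(2, 3), Fraction(3, 4),
                 Fraction(5, 6), Fraction(6, 7)):
        p = punctured_bits(rate)
        print(rate, p, N_MOTHER - p)
```

Only rate 5/6 needs rounding (the exact value is 921.6 bits), which is why its punctured length of 1382 differs slightly from the 1344-bit dedicated code it is compared against.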

TABLE III OVERALL COMPARISON BETWEEN PROPOSED DECODER AND OTHER EXISTING WIMAX LDPC DECODERS

Fig. 9. Layout of the proposed decoder chip.

We also compare the proposed RC LDPC decoder design with several other existing WiMax LDPC decoder implementations, as listed in Table III. For a fair comparison, we first scale all the decoders to the 65-nm technology node. Then, a metric called the throughput-to-area ratio (TAR) is introduced to show how much throughput a decoder can achieve per unit area. Table III shows that the proposed decoder can provide higher throughput using a smaller chip area. More interestingly, the proposed decoder design can provide any code rate between 1/2 and 1, as opposed to only four selected rates in the existing WiMax LDPC decoders.

V. CONCLUSION

This paper presents the algorithm, design, and implementation of a rate-compatible LDPC decoder. Using the selected puncturing scheme, the BER performance of the punctured codes is comparable to that of the dedicated codes, with less than 0.2-dB performance degradation in simulation results. In addition, rate-compatible LDPC codes provide an ideal solution to the flexibility problem of the parallel layered decoding architecture. Considering the WiMax standard, a rate-compatible LDPC decoder is designed using the rate-1/2 code as the mother code. The hardware implementation shows a maximum input throughput of 1.28 Gb/s. Compared with a multirate LDPC decoder, the rate-compatible design can eliminate the memory needed to store multiple codes and the network needed to switch among them. Therefore, rate-compatible LDPC codes are highly desirable for advanced wireless communication systems.

REFERENCES
[1] R. Gallager, "Low-density parity-check codes," IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[2] M. Luby, M. Mitzenmacher, M. Shokrollahi, and D. Spielman, "Improved low-density parity-check codes using irregular graphs," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 585–598, Feb. 2001.
[3] T. Richardson, M. Shokrollahi, and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
[4] M. Fossorier, "Quasicyclic low-density parity-check codes from circulant permutation matrices," IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1788–1793, Aug. 2004.
[5] S. Myung, K. Yang, and J. Kim, "Quasi-cyclic LDPC codes for fast encoding," IEEE Trans. Inf. Theory, vol. 51, no. 8, pp. 2894–2901, Aug. 2005.
[6] J. Li and K. Narayanan, "Rate-compatible low density parity check codes for capacity-approaching ARQ schemes in packet data communications," in Proc. Int. Conf. Comm., Internet, Info. Tech. (CIIT), Nov. 2002, pp. 201–206.
[7] J. Ha and S. McLaughlin, "Optimal puncturing distributions for rate-compatible low-density parity-check codes," in Proc. Inter. Symp. Inform. Theory (ISIT), Jun. 2003, p. 233.
[8] J. Ha, J. Kim, and S. McLaughlin, "Puncturing for finite length low-density parity-check codes," in Proc. Inter. Symp. Inform. Theory (ISIT), Jun. 2004, p. 151.
[9] J. Ha, J. Kim, and S. McLaughlin, "Rate-compatible puncturing of low-density parity-check codes," IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2824–2836, Nov. 2004.
[10] J. Ha, J. Kim, D. Klinc, and S. McLaughlin, "Rate-compatible punctured low-density parity-check codes with short block lengths," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 728–738, Feb. 2006.
[11] E. Choi, S. Suh, and J. Kim, "Rate-compatible puncturing for low-density parity-check codes with dual-diagonal parity structure," in Proc. IEEE Symp. Person. Indoor Mobile Radio Commun. (PIMRC), Sep. 2005, vol. 4, pp. 2642–2646.
[12] H. Park, K. Kim, D. Kim, and K. Whang, "Structured puncturing for rate-compatible B-LDPC codes with dual-diagonal parity structure," IEEE Trans. Wireless Commun., vol. 7, no. 10, pp. 3692–3696, Oct. 2008.
[13] M. Mansour and N. Shanbhag, "High-throughput LDPC decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 976–996, Dec. 2003.
[14] H. Zhong and T. Zhang, "Block-LDPC: A practical LDPC coding system design approach," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 4, pp. 766–775, Apr. 2005.
[15] Y. Dai, N. Chen, and Z. Yan, "Memory efficient decoder architectures for quasi-cyclic LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 9, pp. 2898–2911, Oct. 2008.
[16] L. Liu and C. Shi, "Sliced message passing: High throughput overlapped decoding of high-rate low-density parity-check codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 11, pp. 3697–3710, Dec. 2008.
[17] O. Daesun and K. Parhi, "Min-sum decoder architectures with reduced word length for LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 1, pp. 105–115, Jan. 2010.
[18] G. Masera, F. Quaglio, and F. Vacca, "Implementation of a flexible LDPC decoder," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 6, pp. 542–546, Jun. 2007.
[19] C. Liu, C. Lin, S. Yen, C. Chen, H. Chang, C. Lee, Y. Hsu, and S. Jou, "Design of a multimode QC-LDPC decoder based on shift-routing network," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 9, pp. 734–738, Sep. 2009.
[20] C. Zhang, Z. Wang, J. Sha, L. Li, and J. Lin, "Flexible LDPC decoder design for multigigabit-per-second applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 1, pp. 116–124, Jan. 2010.


[21] K. Zhang, X. Huang, and Z. Wang, "High-throughput layered decoder implementation for quasi-cyclic LDPC codes," IEEE J. Sel. Areas Commun., vol. 27, no. 6, pp. 985–994, Aug. 2009.
[22] Air Interface for Fixed and Mobile Broadband Wireless Access Systems, IEEE 802.16e, Oct. 2005 [Online]. Available: http://www.ieee802.org/16/tge
[23] T. Brack, M. Alles, F. Kienle, and N. Wehn, "A synthesizable IP core for WiMax 802.16e LDPC code decoding," in Proc. IEEE 17th Int. Symp. Personal, Indoor, Mobile Radio Communications, Sep. 2006, pp. 1–5.
[24] G. Gentile, M. Rovini, and L. Fanucci, "Low-complexity architectures of a decoder for IEEE 802.16e LDPC codes," in Proc. Euromicro Conf. Digital System Design (DSD), Aug. 2007, pp. 369–375.
[25] X.-Y. Shih, C.-Z. Zhan, C.-H. Lin, and A.-Y. Wu, "An 8.29 mm² 52 mW multi-mode LDPC decoder design for mobile WiMax system in 0.13 μm CMOS process," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 672–683, Mar. 2008.
[26] C.-H. Liu, S.-W. Yen, C.-L. Chen, H.-C. Chang, C.-Y. Lee, Y.-S. Hsu, and S.-J. Jou, "An LDPC decoder chip based on self-routing network for IEEE 802.16e applications," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 684–694, Mar. 2008.
[27] M. Fossorier, M. Mihaljevic, and H. Imai, "Reduced complexity iterative decoding of low-density parity-check codes based on belief propagation," IEEE Trans. Commun., vol. 47, no. 5, pp. 673–680, May 1999.
[28] E. B. Guilloud and J. Danger, "λ-min decoding algorithm of regular and irregular LDPC codes," in Proc. 3rd Int. Symp. Turbo Codes Related Topics, Sep. 2003, pp. 451–454.
[29] J. Chen, A. Dholakia, E. Eleftheriou, M. Fossorier, and X.-Y. Hu, "Reduced-complexity decoding of LDPC codes," IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005.
[30] D. Hocevar, "A reduced complexity decoder architecture via layered decoding of LDPC codes," in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Oct. 2004, pp. 107–112.
[31] K. Gunnam, G. Choi, M. Yeary, and M. Atiquzzaman, "VLSI architectures for layered decoding for irregular LDPC codes of WiMax," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2007, pp. 4542–4547.

Kai Zhang received the B.E. degree in information engineering and the M.S. degree in microelectronics, both from Xi'an Jiaotong University, Xi'an, China, in 2004 and 2007, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at Worcester Polytechnic Institute, Worcester, MA. His research interests are VLSI architectures for error correction codes and cooperative communications.

Xinming Huang (M'01–SM'09) received the Ph.D. degree in electrical engineering from Virginia Tech, Blacksburg, VA, in 2001. He was formerly a Member of Technical Staff with the wireless advanced technology laboratory, Bell Labs of Lucent Technologies, from 2001 to 2003. He is currently an Associate Professor in the Department of Electrical and Computer Engineering at the Worcester Polytechnic Institute (WPI), Worcester, MA. His research interests are in the areas of circuit design and system architecture, with emphasis on reconfigurable computing, wireless communications, and networked embedded systems. Dr. Huang was among the recipients of the DARPA Young Faculty Award in 2007, the IBM Faculty Fellowship Award in 2004, and the Central Bell Labs Annual Excellence and Teamwork Award in 2002.

Zhongfeng Wang (M'00–SM'05) received the B.E. and M.S. degrees, both from the Department of Automation at Tsinghua University, Beijing, China, and the Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, in 2000. In the past, he has worked for Beijing Hua-hai New Technology Development Co., Beijing, China; Morphics Technology, Inc. (now a part of Infineon Technology), Campbell, CA; and National Semiconductor Company, Longmont, CO. From 2003 to 2007, he worked as Assistant Professor in the School of Electrical Engineering and Computer Science at Oregon State University (OSU), Corvallis. Since June 2007, he has been working as a Senior Principal Scientist for Broadcom Corporation, Irvine, CA. He has edited one book, VLSI (InTech Publisher, 2010), authored/coauthored over 100 technical papers, and filed numerous U.S. patent applications. His current research interest is in VLSI for very high speed networking. Dr. Wang was the recipient of the IEEE Circuits and Systems (CAS) Society VLSI TRANSACTIONS Best Paper Award in 2007 and the recipient of the Best Student Paper Award (first prize) at the IEEE Workshop on Signal Processing Systems in 1999. He served as Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I (TCAS-I) from 2003 to 2005. He is serving as Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2008 to 2011 and Associate Editor for the IEEE TRANSACTIONS ON VLSI SYSTEMS from 2009 through 2010. He has also served as Technical Program Committee Member for many IEEE and ACM conferences. He is currently on the technical committee of VLSI Systems and Applications (VTA-TC) and Circuits and Systems for Communications (CAS-COM) in the IEEE CAS Society.
