1.2 Increasing The Throughput of Polar Decoders

This document discusses methods to increase the throughput of polar decoders. It presents an improved version of the simplified successive-cancellation (SSC) decoding algorithm that increases decoding throughput without degrading error-correction performance. The proposed algorithm uses resource-constrained maximum-likelihood (ML) nodes to decode some constituent codes of rates R, where 0 < R < 1, in one time step. This allows the decoder to have up to three times the throughput of the simplified SSC decoding algorithm and up to twenty-nine times the throughput of a standard successive-cancellation decoder while using the same number of processing elements.


IEEE COMMUNICATIONS LETTERS, VOL. 17, NO. 4, APRIL 2013

Increasing the Throughput of Polar Decoders


Gabi Sarkis and Warren J. Gross

Abstract—The serial nature of successive-cancellation decoding results in low polar decoder throughput. In this letter we present an improved version of the simplified successive-cancellation decoding algorithm that increases decoding throughput without degrading the error-correction performance. We show that the proposed algorithm has up to three times the throughput of the simplified successive-cancellation decoding algorithm and up to twenty-nine times the throughput of a standard successive-cancellation decoder while using the same number of processing elements.

Index Terms—Polar codes, latency, throughput, simplified successive-cancellation decoding.

[Fig. 1: Standard and tree representations of an SC polar decoder for an (8, 3) code. (a) Standard. (b) Tree.]

I. Introduction

POLAR codes were proved to achieve the channel capacity with decoding complexity O(N log N), where N is the code length, using successive cancellation (SC) decoding [1]. However, two problems hinder the adoption of polar codes: the rate at which they approach capacity increases slowly with code length N, requiring long codes to achieve good error-correction performance; and they have low decoding throughput due to the serial nature of the SC decoding algorithm, approximately 0.5 f R, where f is the decoder clock frequency and R the code rate [2]. In this work we focus on the throughput issue, which is exacerbated by the requirement of long codes, since algorithms that are more parallel in nature, such as belief propagation (BP), are rendered too complex to be practical. For example, a fully parallel BP decoder requires at least N processing elements; and it was shown in [3] that a practical, semi-parallel BP decoder had higher complexity and lower throughput than an SC decoder built for the same code.

Simplified successive-cancellation (SSC) decoding [4] was proposed as a method to improve the latency, and in turn the throughput, of SC decoding while reducing decoding complexity without affecting the error-correction performance. It increases the parallelism of SC decoding by operating at the constituent-code level instead of the bit level: for constituent codes of rate zero, i.e. those containing only frozen bits, no operations are performed, while the output of constituent codes of rate one, i.e. those containing only non-frozen bits, is calculated in one time step using threshold detection. The throughput achieved by SSC decoding varied between two and twenty times that of SC decoding, depending on code length [4].

A method using look-ahead techniques was proposed to increase throughput by pre-computing future bit likelihoods and choosing the appropriate value once the current bit value is estimated [5]. The throughput of this method is twice that of SC decoding. However, applying this method to SSC decoding would not increase the latter's speed significantly.

Since the gains obtained by using SSC decoding [4] are significantly greater than those of [5], we focus on improving SSC. This work begins by reviewing SSC decoding in Section II. In Section III, we introduce resource-constrained maximum-likelihood (ML) nodes to decode some constituent codes of rates R, where 0 < R < 1, in one time step, yielding a decoder with two to three times the speed of the SSC decoder. The error-correction performance resulting from using ML nodes with both the traditional and the min-sum (MS) update rules is investigated in Section IV. Finally, the throughputs of SC, SSC, and our proposed method are compared in Section V, taking resource constraints into account. It should be noted that while we use systematic polar codes [6] in this work, the same throughput results can be obtained using non-systematic codes as well. We highlight the difference in implementation between the two cases in Section II-A.

Manuscript received July 23, 2012. The associate editor coordinating the review of this letter and approving it for publication was L. Dolecek. The authors are with the Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada (e-mail: gabi.sarkis@mail.mcgill.ca, warren.gross@mcgill.ca). Digital Object Identifier 10.1109/LCOMM.2013.021213.121633.

II. Simplified Successive-Cancellation Decoding

A. The SSC Decoder

An SSC decoder graph is built by converting a non-bit-reversed polar code graph into a binary tree of three node types: rate-zero nodes, rate-one nodes, and rate-R nodes, denoted N^0, N^1, and N^R, respectively. Frozen bits in the polar code are N^0 nodes, and non-frozen bits are N^1 nodes. The standard SC decoder graph for an (8, 3) polar code is shown in Fig. 1a with the frozen bits in gray. The u_i bit nodes from Fig. 1a become the leaf nodes in the SC decoder tree shown in Fig. 1b: N^0 nodes are in white and correspond to frozen bits, and N^1 nodes are in black and correspond to non-frozen bits.

The type of any other node in the SC tree is determined by the leaf nodes descending from it: if all its descendant leaf nodes are N^0 nodes, the node is also an N^0 node; if all its descendant leaf nodes are N^1 nodes, it is also an N^1 node; finally, a node whose descendants contain both N^0 and N^1 nodes is an N^R node, colored in gray in Fig. 1b. Groups of nodes in the standard SC graph that form one node each in the SC tree are indicated by a bounding box in Fig. 1a.
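To make this classification concrete, the short Python sketch below (our illustration, not part of the letter) derives the pruned decoder tree from a frozen-bit indicator vector, stopping the recursion exactly where an N^0 or N^1 subtree is found. The function names and the frozen set used for the (8, 3) example are hypothetical.

def node_type(frozen, start, length):
    # Classify the constituent code covering bits [start, start + length).
    leaves = frozen[start:start + length]
    if all(leaves):
        return "N0"   # every descendant leaf is frozen: rate zero
    if not any(leaves):
        return "N1"   # no frozen leaves: rate one
    return "NR"       # mixture of frozen and information bits: rate R

def build_tree(frozen, start=0, length=None):
    # Build the pruned SSC tree; only N^R nodes keep their children.
    if length is None:
        length = len(frozen)
    kind = node_type(frozen, start, length)
    if kind != "NR":
        return {"type": kind, "start": start, "len": length}
    half = length // 2
    return {"type": "NR", "start": start, "len": length,
            "left": build_tree(frozen, start, half),
            "right": build_tree(frozen, start + half, half)}

# Hypothetical frozen set for an (8, 3) code: True marks a frozen bit.
frozen = [True, True, True, True, True, False, False, False]
tree = build_tree(frozen)

With this frozen set the pruned tree contains seven nodes instead of the fifteen of the full SC tree, which is the kind of reduction the SSC decoder exploits.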
A node v in the SC tree has a parent p and two children l and r with which it exchanges real-valued message vectors α_i, which are the soft-valued input to the constituent decoder represented by the node v, and binary-valued message vectors β_i, which are the constituent decoder's decoded output, as indicated in Fig. 1b. The size of the message vectors depends on the tree level in which the node is located. Assuming that leaf nodes are located in level zero and that v is located in level d(v), then α_l, β_l, α_r, and β_r contain 2^(d(v)−1) elements, and α_v and β_v contain 2^(d(v)) elements. Once α_v is available from the parent, α_l is calculated according to:

α_l[i] = α_v[i] ⊞ α_v[2^(d(v)−1) + i],  for 0 ≤ i < 2^(d(v)−1),   (1)

where the ⊞ operator is defined in [4] using the sum-product update rules and log-likelihood ratios (LLR), where the LLR of a bit x is ln(Pr[x = 0]/Pr[x = 1]), as:

a ⊞ b = 2 tanh⁻¹(tanh(a/2) tanh(b/2)).   (2)

The child l then calculates β_l, which is used to compute the value of α_r so that, in the LLR domain:

α_r[i] = (1 − 2β_l[i]) α_v[i] + α_v[2^(d(v)−1) + i],  for 0 ≤ i < 2^(d(v)−1).   (3)

After β_r has been calculated, it is combined with β_l to obtain β_v according to:

β_v[i] = β_l[i] ⊕ β_r[i]  for 0 ≤ i < 2^(d(v)−1);  β_v[i] = β_r[i]  for 2^(d(v)−1) ≤ i < 2^(d(v)),   (4)

where ⊕ is modulo-2 addition (binary XOR).

Since leaf nodes do not have children, they calculate β_v directly from α_v using threshold detection:

β_v = 0 when α_v ≥ 0, and β_v = 1 otherwise.   (5)

In this case, α_v and β_v are one-element vectors. The β_v values are passed to the parent, where they are eventually combined with the output of the sibling node and passed to the grandparent, and so on until the root node is reached.

The root node's α_v input is calculated directly from the received channel information, and its β_v output is the systematic codeword from which the information bits can be extracted. If one were to use non-systematic codes, the output of each N^1 node must be transformed using the inverse of an appropriately sized polar generator matrix, as shown in [4].

In [4], it was noted that the β_v output of N^0 nodes is always a zero vector if the frozen-bit values are chosen to be zero; therefore, the decoder need not calculate α_v values for N^0 nodes and wait for the resulting β_v; instead it can proceed knowing that β_v will be the zero vector. Another simplification introduced in [4] is that, since N^1 nodes correspond to constituent codes of rate one, calculating β_v directly from α_v using threshold detection yields results identical to those obtained if α_v were to be actually decoded recursively. These two simplifications allow the decoder to stop traversing a path in the tree once an N^0 or an N^1 node is encountered, resulting in a significant reduction in the number of graph nodes. Such a decoder is called an SSC decoder, and its tree, shown in Fig. 2b, has seven instead of fifteen nodes to traverse for an (8, 3) polar code.

[Fig. 2: Decoder trees corresponding to the SC, SSC, and ML-SSC decoding algorithms. (a) SC. (b) SSC. (c) ML-SSC.]
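As a concrete reading of (1)-(5), the following Python sketch (ours, not from the letter; the function names are arbitrary) computes the messages exchanged at one node, with α vectors as LLR arrays and β vectors as 0/1 integer arrays.

import numpy as np

def f_spa(a, b):
    # Box-plus operator of (2), sum-product rule.
    return 2.0 * np.arctanh(np.tanh(a / 2.0) * np.tanh(b / 2.0))

def alpha_left(alpha_v):
    # Eq. (1): combine the two halves of alpha_v element-wise.
    half = len(alpha_v) // 2
    return f_spa(alpha_v[:half], alpha_v[half:])

def alpha_right(alpha_v, beta_l):
    # Eq. (3): incorporate the left child's decisions beta_l.
    half = len(alpha_v) // 2
    return (1 - 2 * beta_l) * alpha_v[:half] + alpha_v[half:]

def beta_combine(beta_l, beta_r):
    # Eq. (4): XOR the two halves, then append beta_r.
    return np.concatenate([beta_l ^ beta_r, beta_r])

def beta_threshold(alpha_v):
    # Eq. (5): threshold detection, 0 for non-negative LLRs, 1 otherwise.
    return (alpha_v < 0).astype(int)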
B. Latency of Node Updates under Resource Constraints

In this section we present the latency of the SSC decoder in time steps. However, unlike [4], and in an effort to match each time step with a clock cycle in a hardware implementation of the decoder, we calculate the latency under the constraint that only P processing elements are available, i.e. only P elements of a message vector can be calculated simultaneously when evaluating (1) and (3). Throughout this work, we assume that P is a power of 2.

The number of time steps, referred to as latency, required by an N^R node v varies based on the value of d(v). If 2^(d(v)−1) ≤ P, v can calculate each of its messages in one time step; therefore, the latency of this node will be three time steps. On the other hand, if 2^(d(v)−1) > P, each of α_l and α_r requires 2^(d(v)−1)/P time steps to be computed, for a total of 2^(d(v))/P + 1 time steps of latency; we assume that calculating β_v will only incur one step of latency due to the simplicity of implementing (4).

The threshold detector used to calculate the β_v outputs of N^1 nodes is a sign detector that outputs a 0 when the number is positive and 1 otherwise. As the sign information of the α_v elements is readily available regardless of the number representation used, (5) can be trivially performed for all α_v elements simultaneously. Therefore, N^1 nodes have a latency of one time step. In the case of non-systematic codes, an inverse transform must be applied to β_v. The result of the transform is only used to provide the final decoder output and is not used in subsequent calculations; therefore, the transform can be performed in parallel with the next decoding step and does not increase latency.

N^0 nodes do not incur any latency; since their β_v outputs are already known, their parent nodes can proceed to the next message calculation immediately. Moreover, since they do not require any input, their parent nodes need not calculate α_v for them. In effect, not only do N^0 nodes not increase the latency, they actually decrease it.

Combining the latencies of all the nodes of the (8, 3) SSC decoder in Fig. 2b indicates that nine time steps are required until the output of the decoder is available if P ≥ 4, whereas a standard SC decoder would require fourteen steps [2]. If we reduce P to two, the SSC decoder will require eleven time steps to finish.
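The per-node cost model above can be restated as a small helper (our sketch; the argument names d_v and P follow the definitions in the text), giving the intrinsic latency of a node excluding the latency of its descendants.

def node_latency(kind, d_v, P):
    # Intrinsic latency in time steps of a node at level d(v) under P
    # processing elements, per Section II-B: N^0 costs nothing, N^1 costs
    # one step (threshold detection), and an N^R node costs three steps
    # when 2^(d(v)-1) <= P and 2^d(v)/P + 1 steps otherwise.
    if kind == "N0":
        return 0
    if kind == "N1":
        return 1
    half = 2 ** (d_v - 1)
    return 3 if half <= P else (2 ** d_v) // P + 1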
III. Maximum-Likelihood Nodes

N^R nodes add the most latency to the decoding process, for two reasons: their intrinsic latency described in Section II-B, and the latency of all nodes in the subtrees rooted in them. We are interested in a replacement for these N^R nodes that has a smaller intrinsic latency and is able to decode the constituent code, calculating the output β_v directly from the input α_v without the need to traverse a subtree. We propose resource-constrained exhaustive-search maximum-likelihood (ML) decoding of the constituent code corresponding to an N^R node as an alternative that decodes a subtree in a single clock cycle. In this work, a processing element is equivalent to the low-complexity processing element of [3].

An ML decoder for a binary block code C estimates the transmitted codeword x̂ according to:

x̂ = argmax_{x ∈ C} Pr(y|x),   (6)

where y is the decoder's soft-valued input. In the case of a constituent polar code of length n_v = 2^(d(v)), where the input to the decoder, α_v, is composed of LLR values, (6) becomes:

β_v = argmax_{x ∈ C} Σ_{i=0}^{n_v−1} (1 − 2x[i]) α_v[i].   (7)

The complexity of (7) grows exponentially in the code dimension k_v and linearly in the code length n_v. Therefore, ML nodes N^ML can only be used to replace N^R nodes when computational resource constraints are met: calculating the sum Σ_i (1 − 2x[i]) α_v[i] for each of the 2^(k_v) candidate codewords requires n_v − 1 additions, where the signs of the operands are determined by the value of x[i], and finding the codeword with the maximum likelihood requires 2^(k_v) − 1 comparisons. To formalize, an N^ML node can replace an N^R node v when

(2^(k_v) + 1)(n_v − 1) ≤ P.   (8)

Performing (7) when (8) is satisfied requires one time step. Thus, using ML-SSC with n_v = 2 and k_v = 1, the (8, 3) polar code can be decoded in seven time steps with P = 4, the minimum value required for SC and SSC to achieve the aforementioned latencies. This results in the decoder graph of Fig. 2c, where one N^ML node, indicated by the striped pattern, replaces a subtree of three nodes.
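A minimal sketch of an N^ML node follows (ours, not the letter's hardware implementation; the function names and the explicit codebook argument are illustrative). It checks the resource constraint (8) and evaluates the correlation metric of (7) over all candidate codewords; in the proposed decoder the 2^k_v sums and the comparisons are carried out in parallel within a single time step.

import numpy as np

def ml_node_feasible(k_v, n_v, P):
    # Constraint (8): an N^ML node may replace an N^R node only if
    # (2^k_v + 1)(n_v - 1) <= P.
    return (2 ** k_v + 1) * (n_v - 1) <= P

def ml_node_decode(alpha_v, codebook):
    # Eq. (7): exhaustive-search ML decoding of a constituent code.
    # alpha_v holds the n_v input LLRs; codebook lists all 2^k_v codewords
    # of the constituent code as 0/1 numpy arrays of length n_v.
    best, best_metric = None, -np.inf
    for x in codebook:
        metric = np.sum((1 - 2 * x) * alpha_v)
        if metric > best_metric:
            best, best_metric = x, metric
    return best

For example, with k_v = 4 and n_v = 16 the check gives (2^4 + 1)(16 − 1) = 255 ≤ 256, so such nodes fit within P = 256 processing elements.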
The operations required to perform (7) are already provided by the processing elements of [3], which contain comparators and adders. In addition, N^ML nodes can use the same processing elements already required by the N^R nodes. Therefore, an ML-SSC decoder does not require any additional processing elements beyond those already present in an SSC decoder. Since the aim of ML-SSC is to decrease decoding latency, not to improve error-correction performance, we chose (7) over other ML decoding methods that have lower computational complexity but also lower parallelism.

[Fig. 3: FER vs. E_b/N_0 (dB) of SSC and ML-SSC decoding using SPA and MS update rules, for the (2048, 1024) and (32768, 29492) codes.]

IV. Min-Sum Update Rules and Error-Correction Performance

The ⊞ operator definition in (1) uses the sum-product algorithm (SPA) check-node update rule. To reduce its implementation complexity and replace the hyperbolic functions with simpler approximations, one could use the min-sum (MS) simplification as in [2]:

a ⊞ b = sign(a) sign(b) min(|a|, |b|).   (9)

Simulations of (2048, 1024) and (32768, 29491) polar codes, transmitting random codewords over the additive white Gaussian noise (AWGN) channel using binary phase-shift keying (BPSK), show that using the min-sum update rules with SSC and ML-SSC decoding has a negligible effect on error-correction performance. These results are shown in Fig. 3, where the ML-SSC decoder was subject to the constraints k_v = 4 and n_v = 16.
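For illustration, the min-sum rule (9) amounts to a one-line replacement of the SPA operator (our sketch, with hypothetical names):

import numpy as np

def f_ms(a, b):
    # Eq. (9): min-sum approximation of the box-plus operator.
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

# Element-wise comparison against the SPA rule of (2) on a few sample LLRs.
a = np.array([1.5, -0.3, 4.0])
b = np.array([2.0, 0.8, -0.5])
spa = 2.0 * np.arctanh(np.tanh(a / 2.0) * np.tanh(b / 2.0))
ms = f_ms(a, b)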
V. Throughput

It was observed that two factors significantly affect the decoding speed of SSC and ML-SSC: code rate R and code length N. To quantify this effect, we use the information throughput per clock cycle, i.e. the number of information bits decoded divided by the number of clock cycles required to complete the decoding operation. Unless otherwise stated, we used P = 256, k_v = 4, and n_v = 16. All the polar codes were created for an AWGN channel with a noise variance σ² = 0.25 according to [7].

The change in information throughput of SSC and ML-SSC decoding with respect to code rate shows a clear trend where the throughput is proportional to R. Fig. 4 illustrates this trend for a polar code of length N = 2^15 = 32768, where it can be observed that SSC is 2.5 to 12.6 times faster than a semi-parallel SC (SP-SC) decoder [3], and that ML-SSC is 5.3 to 20.5 times faster than SC. That is, ML-SSC is 1.6 to 3.4 times faster than SSC.

To investigate the relation between code length and throughput, the information throughput of nine polar codes of lengths varying from 2^11 to 2^19 is shown in Fig. 5 for code rates 0.5, 0.7, and 0.9. The information throughput of both the SSC and
ML-SSC decoders increases in a logarithmic manner in code length. The rate of this increase is proportional to the code rate. ML-SSC was 4.9 to 29.3 times faster than SP-SC and 1.5 to 2.3 times faster than SSC. The throughput improvement resulting from using N^ML nodes for these codes is significant enough that the ML-SSC decoder with codes of rate 0.5 has a higher throughput than the SSC decoder with codes of rate 0.7, enabling the use of the lower-rate codes, to benefit from their better error-correction capabilities, without lowering the system's throughput.

[Fig. 4: Throughput (info. bit/s/Hz) of the ML-SSC, SSC, and SP-SC decoders for a code of length N = 2^15 at different rates.]

[Fig. 5: Throughput (info. bit/s/Hz) of the ML-SSC, SSC, and SP-SC decoders for different codes of rates 0.5, 0.7, and 0.9.]

When P is increased for the (32768, 29491) code, the throughput of SSC increases noticeably until P = 256. Past this point, the improvement diminishes and is barely present when increasing P from 512 to 1024. ML-SSC shows continued improvement as P increases. This is expected since, in the extreme and impractical case where P = (2^k + 1)(n − 1), the decoder would have sufficient resources to decode a received vector in one time step. The effect of P on information throughput is shown in Fig. 6.

[Fig. 6: Throughput (info. bit/s/Hz) of the SSC and ML-SSC decoders, for different (k_v, n_v) constraints, as the number of available computational resources P changes.]

It was noted that the structure of the polar code, which is affected by the target channel quality, has a significant impact on the speed of both the SSC and ML-SSC decoders. This is due to the distribution of frozen bits, which affects the number of N^0, N^1, and N^R nodes and the resulting decoder-tree size.

These results indicate that for the (32768, 29491) code, an ML-SSC decoder with P = 256, an information throughput of 9.1 bit/s/Hz, and a conservative clock frequency of 200 MHz will achieve an information throughput of 9.1 × 200 = 1.82 Gbit/s.

VI. Conclusion

In this work, we presented an improved version of SSC decoding that uses resource-constrained maximum-likelihood nodes to increase decoding throughput by up to three times, and by up to 25 times compared to semi-parallel SC decoding using the same number of processing elements, paving the way for multi-Gbit/s polar decoders.

Acknowledgment

The authors would like to thank Prof. Alexander Vardy and Dr. Ido Tal at the University of California, San Diego, for helpful discussions.

References

[1] E. Arıkan, "Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, "Hardware architectures for successive cancellation decoding of polar codes," in Proc. 2011 IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 1665–1668.
[3] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, "A semi-parallel successive-cancellation decoder for polar codes," IEEE Trans. Signal Process., vol. 61, no. 2, pp. 289–299, Jan. 15, 2013.
[4] A. Alamdar-Yazdi and F. R. Kschischang, "A simplified successive-cancellation decoder for polar codes," IEEE Commun. Lett., vol. 15, no. 12, pp. 1378–1380, 2011.
[5] C. Zhang, B. Yuan, and K. K. Parhi, "Reduced-latency SC polar decoder architectures," in Proc. 2012 IEEE International Conference on Communications, pp. 1–5.
[6] E. Arıkan, "Systematic polar coding," IEEE Commun. Lett., vol. 15, no. 8, pp. 860–862, 2011.
[7] I. Tal and A. Vardy, "How to construct polar codes," arXiv:1105.6164 [cs.IT], May 2011. Available: http://arxiv.org/abs/1105.6164
