On Deep Learning-Based Channel Decoding
Tobias Gruber∗ , Sebastian Cammerer∗ , Jakob Hoydis†, and Stephan ten Brink∗
∗ Institute of Telecommunications, Pfaffenwaldring 47, University of Stuttgart, 70659 Stuttgart, Germany
{gruber,cammerer,tenbrink}@inue.uni-stuttgart.de
† Nokia Bell Labs, Route de Villejust, 91620 Nozay, France
jakob.hoydis@nokia-bell-labs.com
Abstract—We revisit the idea of using deep neural networks for one-shot decoding of random and structured codes, such as polar codes. Although it is possible to achieve maximum a posteriori (MAP) bit error rate (BER) performance for both code families and for short codeword lengths, we observe that (i) structured codes are easier to learn and (ii) the neural network is able to generalize to codewords that it has never seen during training for structured, but not for random codes. These results provide some evidence that neural networks can learn a form of decoding algorithm, rather than only a simple classifier. We introduce the metric normalized validation error (NVE) in order to further investigate the potential and limitations of deep learning-based decoding with respect to performance and complexity.

I. INTRODUCTION

Deep learning-based channel decoding is doomed by the curse of dimensionality [1]: for a short code of length N = 100 and rate r = 0.5, 2^50 different codewords exist, which are far too many to fully train any neural network (NN) in practice. The only way that a NN can be trained for practical blocklengths is if it learns some form of decoding algorithm which can infer the full codebook from training on a small fraction of codewords. However, to be able to learn a decoding algorithm, the code itself must have some structure which is based on a simple encoding rule, like in the case of convolutional or algebraic codes. The goal of this paper is to shed some light on the question whether structured codes are easier to “learn” than random codes, and whether a NN can decode codewords that it has never seen during training.

We want to emphasize that this work is based on very short blocklengths, i.e., N ≤ 64, which enables the comparison with maximum a posteriori (MAP) decoding, but also has an independent interest for practical applications such as the internet of things (IoT). We are currently restricted to short codes because of the exponential training complexity [1]. Thus, the neural network decoding (NND) concept is currently not competitive with state-of-the-art decoding algorithms which have been highly optimized over the last decades and scale to arbitrary blocklengths.

Yet, there may be certain code structures which facilitate the learning process. One of our key findings is that structured codes are indeed easier to learn than random codes, i.e., fewer training epochs are required. Additionally, our results indicate that NNs may generalize or “interpolate” to the full codebook after having seen only a subset of examples, whenever the code has structure.

A. Related Work

In 1943, McCulloch and Pitts published the idea of a NN that models the architecture of the human brain in order to solve problems [2]. But it took about 45 years until the backpropagation algorithm [3] made useful applications such as handwritten ZIP code recognition possible [4]. One early form of a NN is the Hopfield net [5]. This concept was shown to be similar to maximum likelihood decoding (MLD) of linear block error-correcting codes (ECCs) [6]: an erroneous codeword will converge to the nearest stable state of the Hopfield net, which represents the most likely codeword. A naive implementation of MLD means correlating the received vector of modulated symbols with all possible codewords, which makes it infeasible for most practical codeword lengths, as the decoding complexity is O(2^k) with k denoting the number of information bits in the codeword. The parallel computing capabilities of NNs allow us to solve or, at least, approximate the MLD problem in polynomial time [7]. Moreover, the weights of the NN are precomputed during training and the decoding step itself is then relatively simple.

Due to their low storage capacity, Hopfield nets were soon replaced by feed-forward NNs which can learn an appropriate mapping between noisy input patterns and codewords. No assumption has to be made about the statistics of the channel noise because the NN is able to learn the mapping or to extract the channel statistics during the learning process [8].

Different ideas around the use of NNs for decoding emerged in the 90s. While in [8] the output nodes represent the bits of the codeword, it is also possible to use one output node per codeword (one-hot coding) [9]. For Hamming codes, another variation is to use only the syndrome as input of the NN in order to find the most likely error pattern [10]. Subsequently, NND for convolutional codes arose in 1996 when Wang and Wicker showed that NND matches the performance of an ideal Viterbi decoder [1]. But they also mentioned a very important drawback of NND: decoding problems have far more possibilities than conventional pattern recognition problems. This limits NND to short codes. However, the NN decoder for convolutional codes was further improved by using recurrent neural nets [11].
NND did not achieve any big breakthrough for either block or convolutional codes. With the standard training techniques of that time it was not possible to work with NNs employing a large number of neurons and layers, which rendered them unsuited for longer codewords. Hence, the interest in NNs dwindled, not only for machine learning applications but also for decoding purposes. Some slight improvements were made in the following years, e.g., by using random neural nets [12] or by reducing the number of weights [13].

In 2006, a new training technique, called layer-by-layer unsupervised pre-training followed by gradient descent fine-tuning [14], led to the renaissance of NNs because it made training of NNs with more layers feasible. NNs with many hidden layers are called deep. Nowadays, powerful new hardware such as graphical processing units (GPUs) is available to speed up learning as well as inference. In this renaissance of NNs, new NND ideas emerge. Yet, compared to previous work, the NN learning techniques are only used to optimize well-known decoding schemes, which we denote as the introduction of expert knowledge. For instance, in [15], weights are assigned to the Tanner graph of the belief propagation (BP) algorithm and learned by NN techniques in order to improve the BP algorithm. It still seems that the recent advances in the machine learning community have not yet been adapted to the pure idea of learning to decode.

[Fig. 1: Deep learning setup for channel decoding: an encoder maps the information bits b_1, ..., b_k to a codeword x_i ∈ X of length N ({0,1}^k → {0,1}^N), a noise layer models the channel, and the NND with its hidden layers outputs the estimates b̂_1, ..., b̂_k.]

II. DEEP LEARNING FOR CHANNEL CODING

The theory of deep learning is comprehensively described in [16]. Nevertheless, for completeness, we will briefly explain the main ideas and concepts in order to introduce a NN for channel (de-)coding and its terminology. A NN consists of many connected neurons. In such a neuron, all of its weighted inputs are added up, a bias is optionally added, and the result is propagated through a nonlinear activation function, e.g., a sigmoid function or a rectified linear unit (ReLU), which are respectively defined as

g_{\text{sigmoid}}(z) = \frac{1}{1 + e^{-z}}, \qquad g_{\text{relu}}(z) = \max\{0, z\}.   (1)

If the neurons are arranged in layers without feedback connections, we speak of a feedforward NN because information flows through the net from the left to the right without feedback (see Fig. 1). Each layer i with n_i inputs and m_i outputs performs the mapping f^(i): R^{n_i} → R^{m_i} with the weights and biases of the neurons as parameters. Denoting v as input and w as output of the NN, an input-output mapping is defined by a chain of functions depending on the set of parameters θ by

w = f(v; \theta) = f^{(L-1)}\left(f^{(L-2)}\left(\cdots f^{(0)}(v)\right)\right)   (2)

where L gives the number of layers and is also called depth. It was shown in [17] that such a multilayer NN with L = 2 and nonlinear activation functions can theoretically approximate any continuous function on a bounded region arbitrarily closely, provided that the number of neurons is large enough.
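To make (1) and (2) concrete, the following NumPy sketch (our illustration, not part of the paper) evaluates such a chain of layers with ReLU hidden activations and a sigmoid output; the layer sizes and the random weights are placeholders.

```python
import numpy as np

def g_sigmoid(z):
    # Eq. (1): squashes each output to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def g_relu(z):
    # Eq. (1): rectified linear unit
    return np.maximum(0.0, z)

def nn_forward(v, layers):
    """Eq. (2): w = f(v; theta) = f^(L-1)( ... f^(0)(v)).
    `layers` is a list of (W, b, activation) tuples, i.e. the parameter set theta."""
    w = v
    for W, b, activation in layers:
        w = activation(w @ W + b)
    return w

# Toy example: 8 inputs -> 16 ReLU units -> 4 sigmoid outputs (random placeholder weights)
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 16)), np.zeros(16), g_relu),
          (rng.standard_normal((16, 4)), np.zeros(4), g_sigmoid)]
print(nn_forward(rng.standard_normal(8), layers))
```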
In order to find the optimal weights of the NN, a training set of known input-output mappings is required and a specific loss function has to be defined. By the use of gradient descent optimization methods and the backpropagation algorithm [3], weights of the NN can be found which minimize the loss function over the training set. The goal of training is to enable the NN to find the correct outputs for unseen inputs. This is called generalization. In order to quantify the generalization ability, the loss can be determined for a data set that has not been used for training, the so-called validation set.

In this work, we want to use a NN for decoding of noisy codewords. At the transmitter, k information bits are encoded into a codeword of length N. The coded bits are modulated and transmitted over a noisy channel. At the receiver, a noisy version of the codeword is received and the task of the decoder is to recover the corresponding information bits. In comparison to iterative decoding, the NN finds its estimate by passing through each layer only once. As this principle enables low-latency implementations, we term it one-shot decoding.

Obtaining labeled training data is usually a very hard and expensive task in the field of machine learning. But using NNs for channel coding is special because we deal with man-made signals. Therefore, we are able to generate as many training samples as we like. Moreover, the desired NN output, also denoted as label, is obtained for free because, if noisy codewords are generated, the transmitted information bits are obviously known. For the sake of simplicity, binary phase shift keying (BPSK) modulation and an additive white Gaussian noise (AWGN) channel are used. Other channels can be adopted straightforwardly, and it is this flexibility that may be a particular advantage of NN-based decoding.
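The following sketch illustrates how such labeled data can be generated on the fly; it is our own example and assumes a generic encode() function that maps k information bits to an N-bit codeword.

```python
import numpy as np

def generate_batch(encode, k, n_samples, ebn0_db, rate):
    """Generate (noisy BPSK channel values, information bits) pairs.
    `encode` is assumed to map a length-k bit vector to a codeword in {0,1}^N."""
    rng = np.random.default_rng()
    b = rng.integers(0, 2, size=(n_samples, k))              # information bits = labels, for free
    x = np.array([encode(row) for row in b])                 # codewords
    s = 1.0 - 2.0 * x                                        # BPSK: 0 -> +1, 1 -> -1
    sigma2 = 1.0 / (2.0 * rate * 10.0 ** (ebn0_db / 10.0))   # AWGN noise power for the given Eb/N0
    y = s + rng.normal(scale=np.sqrt(sigma2), size=s.shape)  # received channel values
    return y, b
```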
In order to keep the training set small, it is possible to extend the NN with additional layers for modulating and adding noise (see Fig. 1). These additional layers have no trainable parameters, i.e., they perform a certain action such as adding noise and propagate this value only to the node of the next layer with the same index. Instead of creating, and thus storing, many noisy versions of the same codeword, working on the noiseless codeword is sufficient. Thus, the training set X consists of all possible codewords x_i ∈ F_2^N with F_2 = {0, 1} (the labels being the corresponding information bits) and is given by X = {x_0, ..., x_{2^k−1}}.

As recommended in [16], each hidden layer employs a ReLU activation function because it is nonlinear and at the same time very close to linear, which helps during optimization. Since the output layer represents the information bits, a sigmoid function forces the output neurons to be in between zero and one, which can be interpreted as the probability that a “1” was transmitted. If this probability is close to the bit of the label, the loss should be incremented only slightly, whereas large errors should result in a very large loss. Examples for such loss functions are the mean squared error (MSE) and the binary cross-entropy (BCE), defined respectively as

L_{\text{MSE}} = \frac{1}{k} \sum_i \left(b_i - \hat{b}_i\right)^2   (3)

L_{\text{BCE}} = -\frac{1}{k} \sum_i \left[ b_i \ln \hat{b}_i + (1 - b_i) \ln\left(1 - \hat{b}_i\right) \right]   (4)

where b_i ∈ {0, 1} is the ith target information bit (label) and b̂_i ∈ [0, 1] the NN soft estimate.
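Written out in NumPy, the two loss functions (3) and (4) read as follows (a generic sketch of ours, not the Keras implementation used later):

```python
import numpy as np

def loss_mse(b, b_hat):
    # Eq. (3): mean squared error over the k information bits
    return np.mean((b - b_hat) ** 2)

def loss_bce(b, b_hat, eps=1e-12):
    # Eq. (4): binary cross-entropy; eps avoids log(0) for saturated outputs
    b_hat = np.clip(b_hat, eps, 1.0 - eps)
    return -np.mean(b * np.log(b_hat) + (1.0 - b) * np.log(1.0 - b_hat))
```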
There are some alternatives for this setup. First, log-likelihood ratio (LLR) values could be used instead of channel values. For BPSK modulation over an AWGN channel, these are obtained by

\text{LLR}(y) = \ln \frac{P(x = 0 \mid y)}{P(x = 1 \mid y)} = \frac{2}{\sigma^2} y   (5)

where σ^2 is the noise power and y the received channel value. This processing step can also be implemented as an additional layer without any trainable parameters. Note that the noise variance must be known in this case and provided as an additional input to the NN.^1 Representing the information bits in the output layer as a one-hot-coded vector of length 2^k is another variant. However, we refrain from this idea since it does not scale to large values of k. Freely available open-source machine learning libraries, such as Theano^2, help to implement and train complex NN models on fast concurrent GPU architectures. We use Keras^3 as a convenient high-level abstraction front-end for Theano. It allows one to quickly deploy NNs from a very abstract point of view in the Python programming language that hides away a lot of the underlying complexity. As we support reproducible research, we have made parts of the source code of this paper available.^4
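Such a non-trainable LLR layer can be sketched, for example, with a Keras Lambda layer; this is only an illustration of the idea in (5), not the authors' implementation, and it assumes the noise variance sigma2 is known when the layer is built.

```python
from keras.layers import Lambda

def llr_layer(sigma2):
    # Eq. (5): LLR(y) = 2*y / sigma^2 -- a fixed scaling without trainable parameters
    return Lambda(lambda y: 2.0 * y / sigma2)
```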
[Fig. 2: NVE versus training E_b/N_0 for 16 bit-length codes for a 128-64-32 NN trained with M_ep = 2^16 training epochs; (a) polar code, (b) random code.]

III. LEARN TO DECODE

In the sequel, we will consider two different code families: random codes and structured codes, namely polar codes [19]. Both have codeword length N = 16 and code rate r = 0.5. While random codes are generated by randomly picking codewords from the codeword space with a Hamming distance larger than two, the generator matrix of polar codes of block size N = 2^n is given by

\mathbf{G}_N = \mathbf{F}^{\otimes n}, \qquad \mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}   (6)

where F^{⊗n} denotes the nth Kronecker power of F. The codewords are now obtained by x = u G_N, where u contains k information bits and N − k frozen positions; for details we refer to [19]. This way, polar codes are inherently structured.
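As a small illustration of (6), the sketch below builds G_N via repeated Kronecker products and enumerates all 2^k codewords of the training set; the choice of information positions is a placeholder for illustration and not necessarily the polar-code design used in the paper.

```python
import numpy as np
from itertools import product

def polar_generator(n):
    # Eq. (6): G_N = F^{(Kronecker) n} with F = [[1, 0], [1, 1]]
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)
    return G

def polar_codebook(n, info_positions):
    """All 2^k codewords x = u G_N (mod 2); frozen positions are kept at zero."""
    N, G = 2 ** n, polar_generator(n)
    codebook = []
    for bits in product([0, 1], repeat=len(info_positions)):
        u = np.zeros(N, dtype=int)
        u[list(info_positions)] = bits
        codebook.append(u @ G % 2)
    return np.array(codebook)

# Example: N = 16, k = 8 (the information positions here are illustrative only)
X = polar_codebook(n=4, info_positions=range(8, 16))
print(X.shape)   # (256, 16)
```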
A. Design parameters of NND

Our starting point is a NN as described before (see Fig. 1). We introduce the notation 128-64-32 which describes the design of the NN decoder employing three hidden layers with 128, 64, and 32 nodes, respectively. However, there are other design parameters with a non-negligible performance impact:
1) What is the best training signal-to-noise ratio (SNR)?
2) How many training samples are necessary?
3) Is it easier to learn from LLR channel output values rather than from the direct channel output?
4) What is an appropriate loss function?
5) How many layers and nodes should the NN employ?
6) Which type of regularization^5 should be used?
The area of research dealing with the optimization of these parameters is called hyperparameter optimization [20]. In this work, we do not further consider this optimization and restrict ourselves to a fixed set of hyperparameters which we have found to achieve good results. Our focus is on the differences between random and structured codes.

^1 Inspired by the idea of spatial transformer networks [18], one could alternatively use a second NN to estimate σ^2 from the input and provide this estimate as an additional parameter to the LLR layer.
^2 https://github.com/Theano/Theano
^3 https://github.com/fchollet/keras
^4 https://github.com/gruberto/DL-ChannelDecoding
^5 Regularization is any method that trades off a larger training error against a smaller validation error. An overview of such techniques is provided in [16, Ch. 7]. We do not use any regularization techniques in this work, but leave it as an interesting future investigation.
Since the performance of NND depends not only on the SNR of the validation data set (for which the bit error rate (BER) is computed) but also on the SNR of the training data set^6, we define below a new performance metric, the normalized validation error (NVE). Denote by ρ_t and ρ_v the SNR (measured as E_b/N_0) of the training and validation data sets, respectively, and let BER_NND(ρ_t, ρ_v) be the BER achieved by a NN trained at ρ_t on data with SNR ρ_v. Similarly, let BER_MAP(ρ_v) be the BER of MAP decoding at SNR ρ_v. For a set of S different validation data sets with SNRs ρ_{v,1}, ..., ρ_{v,S}, the NVE is defined as

\text{NVE}(\rho_t) = \frac{1}{S} \sum_{s=1}^{S} \frac{\text{BER}_{\text{NND}}(\rho_t, \rho_{v,s})}{\text{BER}_{\text{MAP}}(\rho_{v,s})}.   (7)

In this work, we compute the NVE over S = 20 different SNR points from 0 dB to 5 dB with a validation set size of 20,000 examples for each SNR.

^6 It would also be possible to have a training data set which contains a mix of different SNR values, but we have not investigated this option here. Recently, the authors in [21] observed that starting at a high training SNR and then gradually reducing the SNR works well.
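Given measured BER values, (7) amounts to a single averaged ratio; a minimal sketch (ours, with placeholder inputs):

```python
import numpy as np

def nve(ber_nnd, ber_map):
    """Eq. (7): mean ratio of NND BER to MAP BER over S validation SNR points,
    e.g. S = 20 points between 0 dB and 5 dB."""
    return float(np.mean(np.asarray(ber_nnd, dtype=float) /
                         np.asarray(ber_map, dtype=float)))
```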
We train our NN decoder in so-called “epochs”. In each epoch, the gradient of the loss function is calculated over the entire training set X using Adam, a method for stochastic gradient descent optimization [22]. Since the noise layer in our architecture generates a new noise realization each time it is used, the NN decoder will never see the same input twice. For this reason, although the training set has a limited size of 2^k codewords, we can train on an essentially unlimited training set by simply increasing the number of epochs M_ep. However, this makes it impossible to distinguish whether the NN is improved by a larger amount of training samples or more optimization iterations.
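To make this training setup concrete, the following rough Keras sketch wraps a 128-64-32 decoder with non-trainable modulation and noise layers; it is our own reconstruction of the described setup (the use of the GaussianNoise layer and the exact API calls are assumptions, not the authors' released code).

```python
from keras.models import Sequential
from keras.layers import Lambda, GaussianNoise, Dense

def build_decoder(N, k):
    # 128-64-32 decoder: ReLU hidden layers, sigmoid output (see Section III-A)
    decoder = Sequential()
    decoder.add(Dense(128, activation='relu', input_shape=(N,)))
    decoder.add(Dense(64, activation='relu'))
    decoder.add(Dense(32, activation='relu'))
    decoder.add(Dense(k, activation='sigmoid'))
    return decoder

def build_training_model(decoder, N, sigma):
    # Non-trainable channel layers in front of the decoder; new noise is drawn in
    # every pass, so the same input is never seen twice during training.
    model = Sequential()
    model.add(Lambda(lambda x: 1.0 - 2.0 * x, input_shape=(N,)))  # BPSK: 0 -> +1, 1 -> -1
    model.add(GaussianNoise(sigma))                               # AWGN, active only while training
    model.add(decoder)
    model.compile(optimizer='adam', loss='mse')
    return model

# Usage sketch: X holds all 2^k codewords, B the corresponding information bits;
# sigma follows from the training Eb/N0, e.g. r = 0.5 and 1 dB give sigma ~ 0.89.
# decoder = build_decoder(N=16, k=8)
# model = build_training_model(decoder, N=16, sigma=0.89)
# model.fit(X, B, batch_size=256, epochs=M_ep, verbose=0)
```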
Starting with a NN decoder architecture of 128-64-32 and M_ep = 2^22 learning epochs, we train the NN with datasets of different training SNRs and evaluate the resulting NVE. The result is shown in Fig. 2, from which it can be seen that there is an “optimal” training E_b/N_0. The occurrence of an optimum can be explained by the two limiting cases:
1) E_b/N_0 → ∞: training without noise, so the NN is not trained to handle noise.
2) E_b/N_0 → 0: training with noise only, so the NN cannot learn the code structure.
This clearly indicates an optimum somewhere in between these two cases. From now on, a training E_b/N_0 of 1 dB and 4 dB is chosen for polar and random codes, respectively.
Fig. 3 shows the BER achieved by a very small NN of dimensions 128-64-32 as a function of the number of training epochs ranging from M_ep = 2^10, ..., 2^18. For BER simulations, we use 1 million codewords per SNR point. For both code families, the larger the number of training epochs, the smaller the gap between MAP and NND performance becomes.
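Such a BER curve can be estimated with a simple Monte-Carlo loop; the sketch below is our illustration and reuses the generate_batch helper and the decoder-only network sketched above (both our own constructions, not the paper's code).

```python
import numpy as np

def estimate_ber(decoder, encode, k, rate, ebn0_db_list, n_words=1_000_000, batch=10_000):
    """Monte-Carlo BER estimate: decode noisy codewords and count wrong information bits."""
    ber = []
    for ebn0_db in ebn0_db_list:
        errors, total = 0, 0
        for _ in range(n_words // batch):
            y, b = generate_batch(encode, k, batch, ebn0_db, rate)     # noisy channel values + labels
            b_hat = (decoder.predict(y, verbose=0) > 0.5).astype(int)  # hard decision on sigmoid outputs
            errors += int(np.sum(b_hat != b))
            total += b.size
        ber.append(errors / total)
    return ber
```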
[Fig. 3: Influence of the number of epochs M_ep on the BER of a 128-64-32 NN for 16 bit-length codes with code rate r = 0.5; (a) polar code, (b) random code, with curves for M_ep = 2^10, ..., 2^18 and MAP.]

However, for polar codes, close to MAP performance is already achieved for M_ep = 2^18 epochs, while we may need a larger NN or more training epochs for random codes.

In Fig. 4, we illustrate the influence of direct channel values versus channel LLR values as decoder input in combination with two loss functions, MSE and BCE. The NVE for all combinations is plotted as a function of the number of training epochs. Such a curve is also called a “learning curve” since it shows the process of learning. Although it is usually recommended to normalize the NN inputs to have zero mean and unit variance, we train the NN without any normalization, which seems to be sufficient for our setup. For a few training epochs, the LLR input improves the learning process; however, this advantage disappears for a larger M_ep. The same holds for BCE against MSE. For polar codes with LLR values and BCE, the learning appears not to converge for the applied number of epochs. In summary, for training the NN with a large number of training epochs it does not matter whether LLR or channel values are used as inputs and which loss function is employed. Moreover, normalization is not required.

In order to answer the question how large the NN should be, we trained NNs with different sizes and structures. From Fig. 5, we can conclude that, for both polar and random codes, it is possible to achieve MAP performance. Moreover, and somewhat surprisingly, the larger the net, the fewer training epochs are necessary. In general, the larger the number of layers and neurons, the larger is the expressive power or capacity of the NN [16]. Contrary to what is common in classic machine learning tasks, increasing the network size does not lead to overfitting since the network never sees the same input twice.
[Fig. 4: Learning curve (NVE versus training epochs M_ep) for 16 bit-length codes with code rate r = 0.5 for a 128-64-32 NN, comparing direct channel values and channel LLRs with MSE and BCE loss.]

[Fig. 5: Learning curve for different NN sizes (128-64-32, 256-128-64, 512-256-128, 1024-512-256) for 16 bit-length codes with code rate r = 0.5, for the polar and the random code.]

B. Scalability

Up to now, we have only considered 16 bit-length codes which are of little practical importance. Therefore, the scalability of the NN decoder is investigated in Fig. 6. One can see that the length N is not crucial to learn a code by deep learning techniques. What matters, however, is the number of information bits k that determines the number of different classes (2^k) which the NN has to distinguish. For this reason, the NVE increases exponentially for larger values of k for a NN of fixed size and fixed number of training epochs. If a NN decoder is supposed to scale, it must be able to generalize from a few training examples. In other words, rather than learning to classify 2^k different codewords, the NN decoder should learn a decoding algorithm which provides the correct output for any possible codeword. In the next section, we investigate whether structure allows for some form of generalization.

[Fig. 6: Scalability shown by NVE for a 1024-512-256 NN for 16/32/64 bit-length codes with different code rates and M_ep = 2^16 training epochs, plotted over the number of information bits k.]

IV. CAPABILITY OF GENERALIZATION

As Fig. 2–6 show, NNDs for polar codes always perform better than for random codes for a fixed NN design and number of training epochs. This provides a first indication that structured codes, such as polar codes, are easier to learn than random codes. In order to confirm this hypothesis, we train the NN based on a subset X_p which covers only p % of the entire set of valid codewords. Then, the NN decoder is evaluated with the complementary set X̄_p that covers the remaining (100 − p) % of X. As a benchmark, we evaluate the NN decoder also for the set of all codewords X. Instead of BER as in Fig. 3, we now use the block error rate (BLER) for evaluation (see Fig. 7). This way, we only consider whether an entire codeword is correctly detected or not, excluding side-effects of similarities between codewords which might lead to partially correct decoding.
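The subset experiment can be sketched as follows; this is our own illustration, and the way the p % training subset is drawn (a random split) is an assumption, since the paper does not spell out the exact selection.

```python
import numpy as np

def split_codebook(X, B, p, seed=0):
    """Training subset X_p with p percent of all codewords; the rest is the unseen set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(len(X) * p / 100.0))
    return (X[idx[:n_train]], B[idx[:n_train]]), (X[idx[n_train:]], B[idx[n_train:]])

# Example for p = 80: train only on X_80, then measure the BLER separately on the
# unseen 20 % and on the full codebook X as a benchmark.
# (X_p, B_p), (X_unseen, B_unseen) = split_codebook(X, B, p=80)
```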
While for polar codes the NN is able to decode codewords that were not seen during training, the NN cannot decode any unseen codeword for random codes. Fig. 8 emphasizes this observation by showing the single-word BLER for the codewords x_i ∈ X̄_80 which were not used for training. Obviously, the NN fails for almost every unseen random codeword, which is plausible. But for a structured code, such as a polar code, the NN is able to generalize even to unseen codewords. Unfortunately, the NN architecture considered here is not able to achieve MAP performance if it is not trained on the entire codebook. However, finding a network architecture that generalizes best is a topic of our current investigations.

In summary, we can distinguish two forms of generalization. First, as described in Section III, the NN can generalize from input channel values with a certain training SNR to input channel values with arbitrary SNR. Second, the NN is able to generalize from a subset X_p of codewords to an unseen subset X̄_p. However, we observed that for larger NNs the capability of the second form of generalization vanishes.

V. OUTLOOK AND CONCLUSION

For small block lengths, we were able to decode both random codes and polar codes with MAP performance. But learning is limited by the exponential complexity as the number of information bits in the codewords increases. The

[Fig. 7: BLER versus E_b/N_0 when training on a subset X_p with p = 70, 80, 90, 100 %, evaluated on X_p and on the full codebook X, with MAP as reference; (a) 16 bit-length polar code (r = 0.5).]

[Fig. 8: Single-word BLER for x_i ∈ X̄_80 at E_b/N_0 = 4.16 dB and M_ep = 2^18 learning epochs, plotted over the codeword index i for the random and the polar code.]

[4] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, Dec. 1989.
[5] J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci., vol. 79, pp. 2554–2558, 1982.
[6] J. Bruck and M. Blaum, "Neural networks, error-correcting codes, and polynomials over the binary n-cube," IEEE Trans. Inform. Theory, vol. 35, no. 5, pp. 976–987, Sept. 1989.