
Bitwise Neural Networks

Minje Kim    MINJE@ILLINOIS.EDU


Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
Paris Smaragdis    PARIS@ILLINOIS.EDU
University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
Adobe Research, Adobe Systems Inc., San Francisco, CA 94103, USA
arXiv:1601.06071v1 [cs.LG] 22 Jan 2016

Abstract

Based on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, inputs, and intermediate hidden layer output signals are all binary-valued, and require only basic bit logic for the feedforward pass. The proposed Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments, since it replaces either floating- or fixed-point arithmetic with significantly more efficient bitwise operations. Hence, the BNN requires less spatial complexity, less memory bandwidth, and less power consumption in hardware. In order to design such networks, we propose a few training schemes, such as weight compression and noisy backpropagation, which result in a bitwise network that performs almost as well as its corresponding real-valued network. We test the proposed network on the MNIST dataset, represented using binary features, and show that BNNs result in competitive performance while offering dramatic computational savings.

Proceedings of the 31st International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

1. Introduction

According to the universal approximation theorem, a single hidden layer with a finite number of units can approximate a continuous function under some mild assumptions (Cybenko, 1989; Hornik, 1991). While this theorem implies that a shallow network may need a potentially intractable number of hidden units when it comes to modeling a complicated function, Deep Neural Networks (DNN) achieve the goal by learning a hierarchy of features in their multiple layers (Hinton et al., 2006; Bengio, 2009).

Although DNNs are extending the state-of-the-art results for various tasks, such as image classification (Goodfellow et al., 2013), speech recognition (Hinton et al., 2012), and speech enhancement (Xu et al., 2014), it is also the case that the relatively bigger networks with more parameters than before call for more resources (processing power, memory, battery time, etc.), which are sometimes critically constrained in applications running on embedded devices. Examples of those applications span from context-aware computing, collecting and analysing a variety of sensor signals on the device (Baldauf et al., 2007), to always-on computer vision applications (e.g. Google Glass), to speech-driven personal assistant services, such as "Hey, Siri." A primary concern that hinders those applications from being more successful is that they assume an always-on pattern recognition engine on the device, which will drain the battery quickly unless it is carefully implemented to minimize the use of resources. Additionally, even in an environment where the necessary resources are available, speeding up a DNN can greatly improve the user experience when it comes to tasks like searching big databases (Salakhutdinov & Hinton, 2009). In either case, a more compact yet still well-performing DNN is a welcome improvement.

Efficient computational structures for deploying artificial neural networks have long been studied in the literature. Most of the effort is focused on training networks whose weights can be transformed into some quantized representation with a minimal loss of performance (Fiesler et al., 1990; Hwang & Sung, 2014). These approaches typically use the quantized weights in the feedforward step at every training iteration, so that the trained weights are robust to the known quantization noise caused by the limited precision. It was also shown that 10 bits for the gradients and 12 bits for storing the weights are enough to implement, and even train, the state-of-the-art maxout networks (Courbariaux et al., 2014). However, in those quantized networks one still needs to employ arithmetic operations, such as multiplication and addition, on fixed-point values. Even though these are faster than floating-point operations, they still require relatively complex logic and can consume a lot of power.
With the proposed Bitwise Neural Networks (BNN), we take a more extreme view: every input node, output node, and weight is represented by a single bit. For example, a weight matrix between two hidden layers of 1024 units is a 1024 × 1025 matrix of binary values rather than of quantized real values (including the bias). Although learning those bitwise weights as a Boolean concept is an NP-complete problem (Pitt & Valiant, 1988), bitwise networks have been studied in limited settings, such as µ-perceptron networks, where an input node is allowed to be connected to one and only one hidden node and the final layer is a union of those hidden nodes (Golea et al., 1992). A more practical network was recently proposed in (Soudry et al., 2014), where the posterior probabilities of the binary weights are sought using the Expectation Back Propagation (EBP) scheme, which is similar to backpropagation in its form, but has some advantages, such as parameter-free learning and a straightforward discretization of the weights. Its promising results on binary text classification tasks, however, rely on real-valued bias terms and on averaging the predictions from differently sampled parameters.

This paper presents a completely bitwise network where all participating variables are bipolar binaries. Therefore, its feedforward pass uses only XNOR and bit-counting operations instead of multiplication, addition, and a nonlinear activation on floating- or fixed-point variables. For training, we propose a two-stage approach, whose first part is typical network training with a weight compression technique that helps the real-valued model be converted easily into a BNN. To train the actual BNN, we use those compressed weights to initialize the BNN parameters, and do noisy backpropagation based on the tentative bitwise parameters. To binarize the input signals, we can adopt any binarization technique, e.g. fixed-point representations or hash codes. Regardless of the binarization scheme, each input node is given only a single bit at a time, as opposed to a bit packet representing a fixed-point number. This is significantly different from networks with quantized inputs, where a real-valued signal is quantized into a set of bits, and then all those bits are fed to one input node in place of their corresponding single real value. Lastly, we apply the sign function as our activation function instead of a sigmoid, to make sure the input to the next layer is bipolar binary as well. We compare the performance of the proposed BNN with that of corresponding ordinary real-valued networks on hand-written digit recognition tasks, and show that the bitwise operations can do the job with a very small performance loss, while providing a large margin of improvement in terms of the necessary computational resources.

2. Feedforward in Bitwise Neural Networks

It has long been known that any Boolean function, which takes binary values as input and produces binary outputs as well, can be represented as a bitwise network with one hidden layer (McCulloch & Pitts, 1943), for example by merely memorizing all the possible mappings between input and output patterns. We define the forward propagation procedure as follows, based on the assumption that we have trained such a network with bipolar binary parameters:

    a_i^l = b_i^l + \sum_{j}^{K^{l-1}} w_{i,j}^l \otimes z_j^{l-1},        (1)
    z_i^l = \mathrm{sign}(a_i^l),                                         (2)
    z^l \in \mathbb{B}^{K^l}, \quad W^l \in \mathbb{B}^{K^l \times K^{l-1}}, \quad b^l \in \mathbb{B}^{K^l},   (3)

where \mathbb{B} is the set of bipolar binaries, i.e. ±1 (in the bipolar binary representation, +1 stands for the "TRUE" status, while −1 is for "FALSE"), and ⊗ stands for the bitwise XNOR operation (see Figure 1 (a)). l, j, and i indicate a layer and the input and output units of that layer, respectively. We use bold characters for a vector (or a matrix if capitalized). K^l is the number of input units of the l-th layer. Therefore, z^0 equals an input vector, where we omit the sample index for notational convenience. We use the sign activation function to generate the bipolar outputs.

We can check the prediction error E by measuring the bitwise agreement between the target vector t and the output units of the L-th layer, using XNOR as a multiplication operator,

    E = \sum_{i}^{K^{L+1}} \big(1 - t_i \otimes z_i^{L+1}\big) / 2,        (4)

but this error function can be tentatively replaced by a softmax layer during the training phase.
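To make the bipolar arithmetic concrete, here is a minimal NumPy sketch of the feedforward pass (1)-(2) and the bit error (4). It assumes inputs, weights, biases, and targets are stored as ±1 integer arrays, in which case the XNOR in (1) reduces to an elementwise product; the function and variable names are ours, not the paper's.

```python
import numpy as np

def bnn_feedforward(x, weights, biases):
    """Bipolar bitwise feedforward, Eqs. (1)-(3).

    x       : +/-1 input vector (z^0)
    weights : list of +/-1 matrices W^l, each of shape (K^l, K^{l-1})
    biases  : list of +/-1 vectors b^l, each of shape (K^l,)

    For +/-1 values XNOR(a, b) == a * b, so the bitwise accumulation in
    Eq. (1) is emulated here with an ordinary matrix product.
    """
    z = x
    for W, b in zip(weights, biases):
        a = b + W @ z                 # Eq. (1): agreements minus disagreements
        z = np.where(a >= 0, 1, -1)   # Eq. (2): sign activation (ties mapped to +1)
    return z

def bitwise_error(t, z):
    """Eq. (4): number of output bits that disagree with the target t."""
    return int(np.sum((1 - t * z) // 2))
```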
The XNOR operation is a faster substitute for binary multiplication. Therefore, (1) and (2) can be seen as a special version of the ordinary feedforward step that only works when the inputs, weights, and bias are all bipolar binaries. Note that these bipolar bits will in practice be implemented using 0/1 binary values, where the activation in (2) is equivalent to counting the number of 1's and then checking whether that count is bigger than half of the number of input units plus 1. With no loss of generality, in this paper we use the ±1 bipolar representation, since it is more flexible for defining hyperplanes and for examining the network behavior.
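The 0/1 view above can be sketched for a single unit as follows, assuming (as an illustration of ours, not the paper's implementation) that a weight row and the incoming layer are packed into k-bit integers: an XNOR of the packed words followed by a population count replaces the summation in (1), and the unit fires when at least half of the k + 1 contributing bits agree.

```python
def bnn_unit_0_1(w_bits, z_bits, b_bit, k):
    """One unit of Eqs. (1)-(2) on 0/1 bits packed into integers.

    w_bits, z_bits : k-bit integers holding a weight row and the input layer
    b_bit          : the 0/1 bias bit
    k              : number of input units (word length)
    """
    agree = ~(w_bits ^ z_bits) & ((1 << k) - 1)   # bitwise XNOR, masked to k bits
    ones = bin(agree).count("1") + b_bit          # bit counting replaces the summation
    # fire (+1 in the bipolar view) when at least half of the k+1 contributing
    # bits (weights plus bias) agree, matching sign(a) >= 0 in the sketch above
    return 1 if 2 * ones >= k + 1 else 0
```

On actual hardware the packed words would be spread across machine-sized registers, but the cost stays at one XNOR and one popcount per word.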
their corresponding single real value. Lastly, we apply the
sign function as our activation function instead of a sig- Sometimes a BNN can solve the same problem as a real-
moid to make sure the input to the next layer is bipolar bi- valued network without any size modifications, but in gen-
nary as well. We compare the performance of the proposed eral we should expect that a BNN could require larger net-
BNN with its corresponding ordinary real-valued networks work structures than a real-valued one. For example, the
on hand-written digit recognition tasks, and show that the XOR problem in Figure 1 (b) can have an infinite num-
bitwise operations can do the job with a very small perfor- ber of solutions with real-valued parameters once a pair
mance loss, while providing a large margin of improvement 1
In the bipolar binary representation, +1 stands for the
in terms of the necessary computational resources. “TRUE” status, while −1 is for “FALSE.”
Among all the possible solutions, we can see that binary weights and biases are enough to define the hyperplanes, x1 − x2 + 1 > 0 and −x1 + x2 + 1 > 0 (the dashed lines). Likewise, the particular BNN defined in Figure 1 (c) has the same classification power once the inputs are binary as well.

Figure 1. (a) An XNOR table. (b) The XOR problem that needs two hyperplanes. (c) A multi-layer perceptron that solves the XOR problem. (d) A linearly separable problem for which bitwise networks nevertheless need two hyperplanes (y = x2). (e) A bitwise network with zero weights that solves the y = x2 problem.
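As a quick sanity check on this example, the short script below wires the two binary hyperplanes x1 − x2 + 1 > 0 and −x1 + x2 + 1 > 0 into a hidden layer and combines them with a bipolar output unit of our own choosing (the exact parameters drawn in Figure 1 (c) may differ); it separates (1, 1) and (−1, −1) from (1, −1) and (−1, 1) using only ±1 parameters.

```python
import numpy as np
from itertools import product

def forward(x, layers):
    """Bipolar feedforward as in Eqs. (1)-(2); layers = [(W, b), ...]."""
    z = x
    for W, b in layers:
        z = np.where(b + W @ z >= 0, 1, -1)
    return z

# Hidden layer: the two binary hyperplanes from the text,
#   h1 = sign( x1 - x2 + 1),   h2 = sign(-x1 + x2 + 1).
W1, b1 = np.array([[1, -1], [-1, 1]]), np.array([1, 1])
# Output layer: fires only when both hyperplanes agree (our own choice of
# bipolar values; Figure 1 (c) may use different ones).
W2, b2 = np.array([[1, 1]]), np.array([-1])

for x in product([-1, 1], repeat=2):
    y = forward(np.array(x), [(W1, b1), (W2, b2)])
    print(x, int(y[0]))   # +1 for (1, 1) and (-1, -1), -1 for the mixed pairs
```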
Figure 1 (d) shows another example where a BNN requires more hyperplanes than a real-valued network. This linearly separable problem is solvable with only one hyperplane, such as −0.1x1 + x2 + 0.5 > 0, but it is impossible to describe such a hyperplane with binary coefficients. We can instead come up with a solution by combining multiple binary hyperplanes, which will eventually increase the perceived complexity of the model. However, even with a larger number of nodes, the BNN is not necessarily more complex than the smaller real-valued network. This is because a parameter or a node of a BNN requires only one bit to represent, while a real-valued one generally requires more than that, up to 64 bits. Moreover, the simple XNOR and bit-counting operations of the BNN bypass the computational complications of a real-valued system, such as the power consumption of multipliers and adders for floating-point operations, the various dynamic ranges of fixed-point representations, erroneous flips of the most significant bits, etc. Note that if the bitwise parameters are sparse, we can further reduce the number of hyperplanes. For example, for an inactive element in the weight matrix W due to the sparsity, we can simply skip its computation, similarly to operations on sparse representations. Conceptually, we can say that those inactive weights serve as zero weights, so that a BNN can solve the problem in Figure 1 (d) by using only one hyperplane, as in (e). From now on, we will use this extended version of the BNN with inactive weights, yet there are still cases where the BNN needs more hyperplanes than a real-valued network even with the sparsity.

3. Training Bitwise Neural Networks

We first train compressed network parameters, and then retrain them using noisy backpropagation for BNNs.

3.1. Real-valued Networks with Weight Compression

First, we train a real-valued network that takes either bitwise inputs or real-valued inputs ranged between −1 and +1. A special part of this network is that we constrain the weights to have values between −1 and +1 as well, by wrapping them with tanh. Similarly, if we choose tanh for the activation, we can say that the network is a relaxed version of the corresponding bipolar BNN. With this weight compression technique, the relaxed forward pass during training is defined as follows:

    a_i^l = \tanh(\bar{b}_i^l) + \sum_{j}^{K^{l-1}} \tanh(\bar{w}_{i,j}^l) \, \bar{z}_j^{l-1},   (5)
    \bar{z}_i^l = \tanh(a_i^l),                                                                 (6)

where all the binary values in (1) and (2) are real for the time being: \bar{W}^l \in \mathbb{R}^{K^l \times K^{l-1}}, \bar{b}^l \in \mathbb{R}^{K^l}, and \bar{z}^l \in \mathbb{R}^{K^l}. The bars on top of the notation mark this distinction.

Weight compression needs some changes in the backpropagation procedure. In a hidden layer we calculate the error as

    \delta_j^l(n) = \Big( \sum_{i}^{K^{l+1}} \tanh(\bar{w}_{i,j}^{l+1}) \, \delta_i^{l+1}(n) \Big) \cdot \big( 1 - \tanh^2(a_j^l) \big).

Note that the errors from the next layer are multiplied with the compressed versions of the weights. Hence, the gradients of the parameters in the case of batch learning are

    \nabla \bar{w}_{i,j}^l = \sum_{n} \delta_i^l(n) \, \bar{z}_j^{l-1} \cdot \big( 1 - \tanh^2(\bar{w}_{i,j}^l) \big),
    \nabla \bar{b}_i^l = \sum_{n} \delta_i^l(n) \cdot \big( 1 - \tanh^2(\bar{b}_i^l) \big),

with the additional term from the chain rule on the compressed weights.
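Below is a minimal sketch of the relaxed forward pass (5)-(6) together with the batch gradients carrying the extra (1 − tanh²) chain-rule factor; the per-layer errors `delta` are assumed to have been backpropagated already as in the equation above, and all function and variable names are ours.

```python
import numpy as np

def compressed_forward(x, Wbar, bbar):
    """Relaxed forward pass, Eqs. (5)-(6): weights and biases are squashed
    into (-1, 1) by tanh, and tanh also serves as the activation."""
    zs = [x]
    for W, b in zip(Wbar, bbar):
        a = np.tanh(b) + np.tanh(W) @ zs[-1]   # Eq. (5)
        zs.append(np.tanh(a))                  # Eq. (6)
    return zs

def compressed_gradients(delta, z_prev, Wbar_l, bbar_l):
    """Batch gradients for layer l with the additional (1 - tanh^2) factor
    from differentiating through the compressed parameters.

    delta  : (N, K^l) errors delta_i^l(n) for this layer
    z_prev : (N, K^{l-1}) activations z^{l-1} from the layer below
    """
    grad_W = (delta.T @ z_prev) * (1.0 - np.tanh(Wbar_l) ** 2)
    grad_b = delta.sum(axis=0) * (1.0 - np.tanh(bbar_l) ** 2)
    return grad_W, grad_b
```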
3.2. Training BNN with Noisy Backpropagation

Since we have trained a real-valued network with a proper range of weights, what we do next is to train the actual bitwise network. The training procedure is similar to the ones with quantized weights (Fiesler et al., 1990; Hwang & Sung, 2014), except that the values we deal with are all bits, and the operations on them are bitwise. To this end, we first initialize all the real-valued parameters, \bar{W} and \bar{b}, with the ones learned in the previous section. Then, we set a sparsity parameter λ, which specifies the proportion of zeros after the binarization, and divide the parameters into three groups: +1, 0, or −1. Therefore, λ decides the boundary β, e.g. w_{i,j}^l = −1 if \bar{w}_{i,j}^l < −β. Note that the number of zero weights, i.e. those with |\bar{w}_{i,j}^l| < β, equals λ K^l K^{l-1}.

The main idea of this second training phase is to feedforward using the binarized weights and the bit operations as in (1) and (2). Then, during noisy backpropagation, the errors and gradients are calculated using those binarized weights and signals as well:

    \delta_j^l(n) = \sum_{i}^{K^{l+1}} w_{i,j}^{l+1} \, \delta_i^{l+1}(n),
    \nabla \bar{w}_{i,j}^l = \sum_{n} \delta_i^l(n) \, z_j^{l-1}, \quad \nabla \bar{b}_i^l = \sum_{n} \delta_i^l(n).   (7)

In this way, the gradients and errors properly take the binarization of the weights and the signals into account. Since the gradients can get too small to update the binary parameters W and b, we instead update their corresponding real-valued parameters,

    \bar{w}_{i,j}^l \leftarrow \bar{w}_{i,j}^l - \eta \nabla \bar{w}_{i,j}^l, \quad \bar{b}_i^l \leftarrow \bar{b}_i^l - \eta \nabla \bar{b}_i^l,   (8)

with η as a learning rate parameter. Finally, at the end of each update we binarize the parameters again with β. We repeat this procedure at every epoch.
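The whole second stage can be summarized by the sketch below, which is our reading of the procedure rather than the authors' code: binarize the compressed parameters into {−1, 0, +1} using the β threshold implied by the sparsity λ, compute errors and gradients with the binarized values as in (7), update the real-valued copies as in (8), and re-binarize. The `grad_fn` routine standing in for the bitwise forward pass and Eq. (7) is hypothetical, and re-binarizing before every mini-batch (rather than exactly once per update cycle) is a simplification.

```python
import numpy as np

def binarize(Wbar, sparsity):
    """Map compressed weights to {-1, 0, +1}: the threshold beta is chosen so
    that roughly a `sparsity` fraction of the entries becomes zero (inactive)."""
    beta = np.quantile(np.abs(Wbar), sparsity)
    W = np.sign(Wbar)
    W[np.abs(Wbar) < beta] = 0
    return W

def noisy_backprop_epoch(Wbar, bbar, batches, grad_fn, lr=0.001, sparsity=0.5):
    """One epoch of the second training stage (Section 3.2), as we read it.

    Feedforward and gradients use the binarized parameters, but the updates
    of Eq. (8) are applied to the real-valued copies Wbar, bbar, which are
    then re-binarized.  `grad_fn` is a hypothetical routine that runs the
    bitwise forward pass and returns per-layer (grad_W, grad_b) via Eq. (7).
    """
    for x, t in batches:
        W = [binarize(w, sparsity) for w in Wbar]   # tentative bitwise weights
        b = [np.sign(v) for v in bbar]              # biases kept bipolar here
        grads_W, grads_b = grad_fn(W, b, x, t)      # errors/gradients from Eq. (7)
        for l in range(len(Wbar)):
            Wbar[l] -= lr * grads_W[l]              # Eq. (8): update real values
            bbar[l] -= lr * grads_b[l]
    return Wbar, bbar
```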
4. Experiments

In this section we go over the details and results of the hand-written digit recognition task on the MNIST data set (LeCun et al., 1998) using the proposed BNN system. Throughout the training, we adopt a softmax output layer for these multiclass classification cases. All the networks have three hidden layers with 1024 units per layer.

Table 1. Classification errors for real-valued and bitwise networks on different types of bitwise features.

    Networks                             Bipolar    0 or 1    Fixed-point (2 bits)
    Floating-point networks (64 bits)    1.17%      1.32%     1.36%
    BNN                                  1.33%      1.36%     1.47%

From the first round of training, we get a regular dropout network with the same settings suggested in (Srivastava et al., 2014), except that we used the hyperbolic tangent for both weight compression and activation, to make the network suitable for initializing the following bipolar bitwise network. Between 500 and 1,000 training iterations were enough to build a baseline. The first row of Table 1 shows the performance of the baseline real-valued network with 64-bit floating-point parameters. As for the input to the real-valued networks, we rescale the pixel intensities into the bipolar range, i.e. from −1 to +1, for the bipolar case (the first column). In the second column, we use the original input between 0 and 1 as it is. For the third column, we encode the four equally spaced regions between 0 and 1 into two bits, and feed each bit into its own input node. Hence, the baseline network for the third input type has 1,568 binary input nodes rather than 784 as in the other cases.

Once we have learned the real-valued parameters, we train the BNN, but with binarized inputs. For instance, instead of real values between −1 and +1 in the bipolar case, we take their signs as the bipolar binary features. As for the 0/1 binaries, we simply round the pixel intensity. Fixed-point inputs are already binarized. We then train the new BNN with the noisy backpropagation technique described in Section 3.2. The second row of Table 1 shows the BNN results. We see that the bitwise networks perform well, with only a very small additional error. Note that the performance of the original real-valued dropout network with a similar network topology (logistic units without the max-norm constraint) is 1.35%.
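For concreteness, the sketch below spells out the three input binarizations described above. The exact layout of the two-bit fixed-point code (which bit of the region index goes to which input node) is not specified in the text, so the msb/lsb split here is only one plausible choice, and the resulting 0/1 bits would be mapped to ±1 for the bipolar BNN.

```python
import numpy as np

def binarize_inputs(pixels, scheme="bipolar"):
    """The three input binarizations used in the experiments.

    `pixels` holds MNIST intensities already scaled to [0, 1], with the
    784 pixels on the last axis.
    """
    if scheme == "bipolar":
        # rescale to [-1, +1] and keep only the sign (ties mapped to +1)
        return np.where(2.0 * pixels - 1.0 >= 0, 1, -1)
    if scheme == "zero_one":
        # simply round each intensity to 0 or 1
        return np.round(pixels).astype(int)
    if scheme == "fixed_point":
        # encode the four equally spaced regions of [0, 1] with two bits per
        # pixel, turning 784 input nodes into 1,568
        level = np.minimum((pixels * 4).astype(int), 3)
        msb, lsb = level // 2, level % 2
        return np.stack([msb, lsb], axis=-1).reshape(*pixels.shape[:-1], -1)
    raise ValueError(f"unknown scheme: {scheme}")
```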
for resource-constrained situations, particularly in cases
From the first round of training, we get a regular dropout where floating-point / fixed-point variables and operations
network with the same setting suggested in (Srivastava are prohibitively expensive. In the future we plan to in-
et al., 2014), except the fact that we used the hyperbolic vestigate a bitwise version of convolutive neural networks,
tangent for both weight compression and activation to make where efficient computing is more desirable.

References

Baldauf, M., Dustdar, S., and Rosenberg, F. A survey on context-aware systems. International Journal of Ad Hoc and Ubiquitous Computing, 2(4):263–277, January 2007.

Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.

Courbariaux, M., Bengio, Y., and David, J.-P. Low precision arithmetic for deep learning. arXiv preprint arXiv:1412.7024, 2014.

Cybenko, G. Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.

Fiesler, E., Choudry, A., and Caulfield, H. J. Weight discretization paradigm for optical neural networks. In The Hague '90, 12–16 April, pp. 164–173. International Society for Optics and Photonics, 1990.

Golea, M., Marchand, M., and Hancock, T. R. On learning µ-perceptron networks with binary weights. In Advances in Neural Information Processing Systems (NIPS), pp. 591–598, 1992.

Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. Maxout networks. In Proceedings of the International Conference on Machine Learning (ICML), 2013.

Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

Hinton, G. E., Osindero, S., and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.

Hwang, K. and Sung, W. Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In 2014 IEEE Workshop on Signal Processing Systems (SiPS), Oct 2014.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.

McCulloch, W. S. and Pitts, W. H. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Pitt, L. and Valiant, L. G. Computational limitations on learning from examples. Journal of the Association for Computing Machinery, 35:965–984, 1988.

Salakhutdinov, R. and Hinton, G. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009.

Soudry, D., Hubara, I., and Meir, R. Expectation backpropagation: Parameter-free training of multilayer neural networks with continuous or discrete weights. In Advances in Neural Information Processing Systems (NIPS), 2014.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, January 2014.

Xu, Y., Du, J., Dai, L.-R., and Lee, C.-H. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters, 21(1):65–68, 2014.
