
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions

Zechun Liu1,2⋆, Zhiqiang Shen2†, Marios Savvides2, and Kwang-Ting Cheng1


1 Hong Kong University of Science and Technology, 2 Carnegie Mellon University
zliubq@connect.ust.hk, {zhiqians,marioss}@andrew.cmu.edu, timcheng@ust.hk

Abstract. In this paper, we propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without
incurring any additional computational cost. We first construct a base-
line network by modifying and binarizing a compact real-valued network
with parameter-free shortcuts, bypassing all the intermediate convolu-
tional layers including the downsampling layers. This baseline network
strikes a good trade-off between accuracy and efficiency, achieving superior performance to most existing binary networks at approxi-
mately half of the computational cost. Through extensive experiments
and analysis, we observed that the performance of binary networks is
sensitive to activation distribution variations. Based on this important
observation, we propose to generalize the traditional Sign and PReLU
functions, denoted as RSign and RPReLU for the respective general-
ized functions, to enable explicit learning of the distribution reshape
and shift at near-zero extra cost. Lastly, we adopt a distributional loss
to further enforce the binary network to learn similar output distribu-
tions as those of a real-valued network. We show that after incorporating
all these ideas, the proposed ReActNet outperforms all the state-of-the-
arts by a large margin. Specifically, it outperforms Real-to-Binary Net
and MeliusNet29 by 4.0% and 3.6% respectively for the top-1 accuracy
and also reduces the gap to its real-valued counterpart to within 3.0%
top-1 accuracy on ImageNet dataset. Code and models are available at:
https://github.com/liuzechun/ReActNet.

1 Introduction
The 1-bit convolutional neural network (1-bit CNN, also known as binary neu-
ral network) [7,30], of which both weights and activations are binary, has been
recognized as one of the most promising neural network compression methods
for deploying models onto resource-limited devices. It enjoys a 32× memory
compression ratio, and up to 58× practical computational reduction on CPU, as
demonstrated in [30]. Moreover, with its pure logical computation (i.e., XNOR
operations between binary weights and binary activations), 1-bit CNN is both
highly energy-efficient for embedded devices [8,40], and possesses the potential
of being directly deployed on next generation memristor-based hardware [17].
⋆ Work done while visiting CMU. † Corresponding author.
Methods                       OPs (×10^8)   Top-1 Acc(%)
Real-to-Binary Baseline [3]   1.63          60.9
Our ReAct Baseline Net        0.87          61.1
XNOR-Net [30]                 1.67          51.2
Bi-RealNet [23]               1.63          56.4
Real-to-Binary [3]            1.65          65.4
MeliusNet29 [2]               2.14          65.8
Our ReActNet-A                0.87          69.4
MeliusNet59 [2]               5.32          70.7
Our ReActNet-C                2.14          71.4

Fig. 1. Computational cost vs. ImageNet accuracy. The proposed ReActNets significantly outperform other binary neural networks. In particular, ReActNet-C achieves a state-of-the-art result of 71.4% top-1 accuracy while being 2.5× more efficient than MeliusNet59. ReActNet-A exceeds Real-to-Binary Net and MeliusNet29 by 4.0% and 3.6% top-1 accuracy, respectively, with more than 1.9× computational reduction. Details are described in Section 5.2.

Despite these attractive characteristics of 1-bit CNNs, the severe accuracy degradation prevents them from being broadly deployed. For example, a representative binary network, XNOR-Net [30], only achieves 51.2% accuracy on the ImageNet classification dataset, leaving a ∼18% accuracy gap from the real-valued ResNet-18. Some preeminent binary networks [8,37] show good performance on small datasets such as CIFAR10 and MNIST, but still encounter a severe accuracy drop when applied to a large dataset such as ImageNet.

In this study, our motivation is to further close the performance gap between binary neural networks and real-valued networks on challenging large-scale datasets. We start with designing a high-performance baseline network. Inspired by the recent advances in real-valued compact neural network design, we choose the MobileNetV1 [15] structure as our binarization backbone, which we believe is of greater practical value than binarizing non-compact models. Following the insights highlighted in [23], we adopt blocks with identity shortcuts which bypass

1-bit vanilla convolutions to replace the convolutions in MobileNetV1. Moreover,


we propose to use a concatenation of two such blocks to handle the channel
number mismatch in the downsampling layers, as shown in Fig. 2(a). This base-
line network design not only helps avoid real-valued convolutions in shortcuts,
which effectively reduces the computation to near half of that needed in preva-
lent binary neural networks [30,23,3], but also achieves a high top-1 accuracy of
61.1% on ImageNet.
To further enhance the accuracy, we investigate another aspect which has not
been studied in previous binarization or quantization works: activation distribu-
tion reshaping and shifting via non-linearity function design. We observed that
the overall activation value distribution affects the feature representation, and
this effect will be exaggerated by the activation binarization. A small distribu-
tion value shift near zero will cause the binarized feature map to have a disparate
appearance and in turn will influence the final accuracy. This observation will be
elaborated in Section 4.2. Enlightened by this observation, we propose a new gen-
eralization of Sign function and PReLU function to explicitly shift and reshape
the activation distribution, denoted as ReAct-Sign (RSign) and ReAct-PReLU
(RPReLU) respectively. These activation functions adaptively learn the param-
eters for distributional reshaping, which enhance the accuracy of the baseline
network by ∼ 7% with negligible extra computational cost.
Furthermore, we propose a distributional loss to enforce the output distribu-
tion similarity between the binary and real-valued networks, which further boosts
the accuracy by ∼ 1%. After integrating all these ideas, the proposed network,
dubbed as ReActNet, achieves 69.4% top-1 accuracy on ImageNet with only 87M
OPs, surpassing all previously published works on binary networks and reducing the accuracy gap from its real-valued counterpart to only 3.0%, as shown in Fig. 1.
We summarize our contributions as follows:

– We design a baseline binary network by modifying MobileNetV1, whose performance already surpasses most of the previously published work on binary
networks while incurring only half of the computational cost.
– We propose a simple channel-wise reshaping and shifting operation on the
activation distribution, which helps binary convolutions spare the compu-
tational power in adjusting the distribution to learn more representative
features.
– We further adopt a distributional loss between binary and real-valued net-
work outputs, replacing the original loss, which facilitates the binary network
to mimic the distribution of a real-valued network.
– We demonstrate that our proposed ReActNet, which integrates the above
mentioned contributions, achieves 69.4% top-1 accuracy on ImageNet, for
the first time, exceeding the benchmarking ResNet-level accuracy (69.3%)
while achieving more than 22× reduction in computational complexity. This
result also outperforms the state-of-the-art binary network [3] by 4.0% top-1
accuracy while incurring only half the OPs1 .

2 Related Work

There have been extensive studies on neural network compression and accelera-
tion, including quantization [46,39,43], pruning [9,12,24,22], knowledge distilla-
tion [14,33,6] and compact network design [15,32,25,41]. A comprehensive survey
can be found in [35]. The proposed method falls into the category of quantiza-
tion, specifically the extreme case of quantizing both weights and activations to
only 1-bit, which is so-called network binarization or 1-bit CNNs.
Neural network binarization originates from EBP [34] and BNN [7], which
establish an end-to-end gradient back-propagation framework for training the
discrete binary weights and activations. As an initial attempt, BNN [7] demon-
strated its success on small classification datasets including CIFAR10 [16] and
MNIST [27], but encountered severe accuracy drop on a larger dataset such as
1 OPs is a sum of binary OPs and floating-point OPs, i.e., OPs = BOPs/64 + FLOPs.
ImageNet [31], only achieving 42.2% top-1 accuracy compared to 69.3% of the
real-valued version of the ResNet-18.
Many follow-up studies focused on enhancing the accuracy. XNOR-Net [30],
which proposed real-valued scaling factors to multiply with each of binary weight
kernels, has become a representative binarization method and enhanced the top-
1 accuracy to 51.2%, narrowing the gap to the real-valued ResNet-18 to ∼18%.
Based on the XNOR-Net design, Bi-Real Net [23] proposed to add shortcuts
to propagate real-values along the feature maps, which further boost the top-1
accuracy to 56.4%.
Several recent studies attempted to improve the binary network performance
via expanding the channel width [26], increasing the network depth [21] or using
multiple binary weight bases [19]. Despite improvement to the final accuracy, the
additional computational cost offsets the BNNs high compression advantage.
For network compression, the real-valued network design used as the starting
point for binarization should be compact. Therefore, we chose MobileNetV1 as
the backbone network for development of our baseline binary network, which
combined with several improvements in implementation achieves ∼ 2× further
reduction in the computational cost compared to XNOR-Net and Bi-Real Net,
and a top-1 accuracy of 61.1%, as shown in Fig. 1.
In addition to architectural design [2,23,28], studies on 1-bit CNNs expand
from training algorithms [36,1,46,3], binary optimizer design [13], regulariza-
tion loss design [8,29], to better approximation of binary weights and activa-
tions [30,11,37]. Different from these studies, this paper focuses on a new aspect
that is seldom investigated before but surprisingly crucial for 1-bit CNNs ac-
curacy, i.e. activation distribution reshaping and shifting. For this aspect, we
propose novel ReAct operations, which are further combined with a proposed
distributional loss. These enhancements improve the accuracy to 69.4%, further
shrinking the accuracy gap to its real-valued counterpart to only 3.0%. The base-
line network design and ReAct operations, as well as the proposed loss function
are detailed in Section 4.

3 Revisit: 1-bit Convolution


In a 1-bit convolutional layer, both weights and activations are binarized to -1
and +1, such that the computationally heavy operations of floating-point matrix
multiplication can be replaced by light-weighted bitwise XNOR operations and
popcount operations [4], as:
Xb ∗ Wb = popcount(XNOR(Xb , Wb )), (1)
where Wb and Xb indicate the matrices of binary weights and binary activations.
Specifically, weights and activations are binarized through a sign function:
x_b = \mathrm{Sign}(x_r) = \begin{cases} +1, & \text{if } x_r > 0 \\ -1, & \text{if } x_r \le 0 \end{cases}, \qquad w_b = \frac{\|W_r\|_{l1}}{n}\,\mathrm{Sign}(w_r) = \begin{cases} +\frac{\|W_r\|_{l1}}{n}, & \text{if } w_r > 0 \\ -\frac{\|W_r\|_{l1}}{n}, & \text{if } w_r \le 0 \end{cases} \qquad (2)
(a) Proposed baseline network block (b) Proposed ReActNet block

Fig. 2. The proposed baseline network modified from MobileNetV1 [15], which replaces
the original (3×3 depth-wise and 1×1 point-wise) convolutional pairs by the proposed
blocks. (a) The baseline's configuration in terms of channel and layer numbers is iden-
tical to that of MobileNetV1. If the input and output channel numbers are equal in a
dw-pw-conv pair in the original network, a normal block is used, otherwise a reduction
block is adopted. For the reduction block, we duplicate the input activation and con-
catenate the outputs to increase the channel number. As a result, all 1-bit convolutions
have the same input and output channel numbers and are bypassed by identity short-
cuts. (b) In the proposed ReActNet block, ReAct-Sign and ReAct-PReLU are added
to the baseline network.

The subscripts b and r denote binary and real-valued, respectively. The weight
binarization method is inherited from [30], in which ‖W_r‖_l1/n is the average of absolute weight values, used as a scaling factor to minimize the difference between binary and real-valued weights. XNOR-Net [30] also applied a similar real-valued scaling factor to binary activations. Note that with the introduction of the proposed
ReAct operations, to be described in Section 4.2, this scaling factor for activa-
tions becomes unnecessary and can be eliminated.
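To make Eqs. (1) and (2) concrete, below is a minimal PyTorch-style sketch (our illustration, not the authors' released code): activations are binarized with a sign function trained through a straight-through estimator, and binary weights are scaled by the per-filter mean absolute value ‖W_r‖_l1/n. The helper names BinaryActivation, binarize_weights and binary_conv2d are hypothetical.

```python
import torch
import torch.nn.functional as F


class BinaryActivation(torch.autograd.Function):
    """Sign binarization of activations (Eq. 2, left) with a straight-through estimator."""

    @staticmethod
    def forward(ctx, x_r):
        ctx.save_for_backward(x_r)
        return torch.where(x_r > 0, torch.ones_like(x_r), -torch.ones_like(x_r))

    @staticmethod
    def backward(ctx, grad_output):
        (x_r,) = ctx.saved_tensors
        # Clipped straight-through estimator: pass gradients only where |x_r| <= 1.
        return grad_output * (x_r.abs() <= 1).float()


def binarize_weights(w_r):
    """Eq. (2), right: w_b = (||W_r||_l1 / n) * Sign(w_r), scaled per output filter."""
    n = w_r[0].numel()  # number of weights in one filter
    scale = w_r.abs().sum(dim=(1, 2, 3), keepdim=True) / n
    sign_w = torch.where(w_r > 0, torch.ones_like(w_r), -torch.ones_like(w_r))
    # In this simplified sketch gradients reach w_r only through the scaling factor;
    # practical implementations usually add an STE for the weight sign as well.
    return scale * sign_w


def binary_conv2d(x_r, w_r, stride=1, padding=1):
    """1-bit convolution: on deployment the underlying +/-1 products reduce to
    XNOR and popcount on packed bits (Eq. 1, up to an affine rescaling)."""
    x_b = BinaryActivation.apply(x_r)
    w_b = binarize_weights(w_r)
    return F.conv2d(x_b, w_b, stride=stride, padding=padding)
```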

4 Methodology

In this section, we first introduce our proposed baseline network in Section 4.1.
Then we analyze how the variation in activation distribution affects the feature
quality and in turn influences the final performance. Based on this analysis, we
introduce ReActNet which explicitly reshapes and shifts the activation distri-
bution using ReAct-PReLU and ReAct-Sign functions described in Section 4.2
and matches the outputs via a distributional loss defined between binary and
real-valued networks detailed in Section 4.3.
4.1 Baseline Network

Most studies on binary neural networks have focused on binarizing the ResNet struc-
ture. However, further compressing compact networks, such as the MobileNets,
would be more logical and of greater interest for practical applications. Thus, we
chose MobileNetV1 [15] structure for constructing our baseline binary network.
Inspired by Bi-Real Net [23], we add a shortcut to bypass every 1-bit convo-
lutional layer that has the same number of input and output channels. The 3×3
depth-wise and the 1×1 point-wise convolutional blocks in the MobileNetV1 [15]
are replaced by the 3×3 and 1×1 vanilla convolutions in parallel with shortcuts,
respectively, as shown in Fig. 2.
Moreover, we propose a new structure design to handle the downsampling
layers. For the downsampling layers whose input and output feature map sizes
differ, previous works [23,37,3] adopt real-valued convolutional layers to match
their dimension and to make sure the real-valued feature map propagating along
the shortcut will not be “cut off” by the activation binarization. However, this
strategy increases the computational cost. Instead, our proposal is to make sure
that all convolutional layers have the same input and output dimensions so that
we can safely binarize them and use a simple identity shortcut for activation
propagation without additional real-valued matrix multiplications.
As shown in Fig. 2(a), we duplicate input channels and concatenate two
blocks with the same inputs to address the channel number difference and also
use average pooling in the shortcut to match spatial downsampling. All layers
in our baseline network are binarized, except the first input convolutional layer
and the output fully-connected layer. Such a structure is hardware-friendly.
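As an illustration of this downsampling design, here is a simplified sketch of the reduction block in Fig. 2(a). It reuses the hypothetical binary_conv2d helper sketched in Section 3; the exact layer ordering and initialization in the released code may differ.

```python
import torch
import torch.nn as nn

# binary_conv2d is the hypothetical 1-bit convolution sketched in Section 3.


class ReductionBlockSketch(nn.Module):
    """Simplified reduction block of Fig. 2(a): a stride-2 1-bit 3x3 convolution with
    an average-pooling shortcut, followed by two duplicated 1-bit 1x1 branches with
    identity shortcuts whose outputs are concatenated to double the channel number."""

    def __init__(self, channels):
        super().__init__()
        self.w3x3 = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        self.w1x1_a = nn.Parameter(torch.randn(channels, channels, 1, 1) * 0.01)
        self.w1x1_b = nn.Parameter(torch.randn(channels, channels, 1, 1) * 0.01)
        self.bn3x3 = nn.BatchNorm2d(channels)
        self.bn_a = nn.BatchNorm2d(channels)
        self.bn_b = nn.BatchNorm2d(channels)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        # Spatial downsampling: 1-bit 3x3 conv (stride 2) bypassed by a 2x2 avg-pool shortcut.
        y = self.bn3x3(binary_conv2d(x, self.w3x3, stride=2, padding=1)) + self.pool(x)
        # Duplicate the activation and feed two 1-bit 1x1 branches, each with an identity shortcut.
        a = self.bn_a(binary_conv2d(y, self.w1x1_a, stride=1, padding=0)) + y
        b = self.bn_b(binary_conv2d(y, self.w1x1_b, stride=1, padding=0)) + y
        # Concatenation doubles the channel count without any real-valued convolution.
        return torch.cat([a, b], dim=1)
```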

4.2 ReActNet

The intrinsic property of an image classification neural network is to learn a


mapping from input images to the output logits. A logical deduction is that a
good performing binary neural network should learn similar logits distribution as
a real-valued network. However, the discrete values of variables limit binary neu-
ral networks from learning as rich distributional representations as real-valued
ones. To address it, XNOR-Net [30] proposed to calculate analytical real-valued
scaling factors and multiply them with the activations. Its follow-up works [38,4]
further proposed to learn these factors through back-propagation.
In contrast to these previous works, this paper focuses on a different aspect:
the activation distribution. We observed that small variations to activation dis-
tributions can greatly affect the semantic feature representations in 1-bit CNNs,
which in turn will influence the final performance. However, 1-bit CNNs have
limited capacity to learn appropriate activation distributions. To address this
dilemma, we introduce generalized activation functions with learnable coeffi-
cients to increase the flexibility of 1-bit CNNs for learning semantically-optimized
distributions.
Distribution Matters in 1-bit CNNs The importance of distribution has not
been investigated much in training a real-valued network, because with weights

(a) Negatively shifted input (b) Original input (c) Positively shifted input

Fig. 3. An illustration of how distribution shift affects feature learning in binary neural
networks. An ill-shifted distribution will introduce (a) too much background noise or
(c) too few useful features, which harms feature learning.

and activations being continuous real values, reshaping or moving distributions


would be effortless.
However, for 1-bit CNNs, learning distribution is both crucial and difficult.
Because the activations in a binary convolution can only choose values from
{−1, +1}, a small distributional shift in the input real-valued feature map before the sign function can possibly result in completely different output binary activations, which will directly affect the informativeness of the features
and significantly impact the final accuracy. For illustration, we plot the output
binary feature maps of real-valued inputs with the original (Fig. 3(b)), negatively-shifted (Fig. 3(a)), and positively-shifted (Fig. 3(c)) activation distributions.
Real-valued feature maps are robust to such shifts, with the legibility of their semantic information largely maintained, while binary feature maps
are sensitive to these shifts as illustrated in Fig. 3(a) and Fig. 3(c).
Explicit Distribution Reshape and Shift via Generalized Activation
Functions Based on the aforementioned observation, we propose a simple yet
effective operation to explicitly reshape and shift the activation distributions,
dubbed as ReAct, which generalizes the traditional Sign and PReLU func-
tions to ReAct-Sign (abbreviated as RSign) and ReAct-PReLU (abbreviated
as RPReLU) respectively.
Definition
Essentially, RSign is defined as a sign function with channel-wisely learnable
thresholds:
x_i^b = h(x_i^r) = \begin{cases} +1, & \text{if } x_i^r > \alpha_i \\ -1, & \text{if } x_i^r \le \alpha_i \end{cases}. \qquad (3)

Here, x_i^r is the real-valued input of the RSign function h on the i-th channel, x_i^b is the binary output and α_i is a learnable coefficient controlling the threshold. The subscript i in α_i indicates that the threshold can vary for different channels. The superscripts b and r refer to binary and real values. Fig. 4(a) shows the shapes
of RSign and Sign.
(a) Sign vs. RSign (b) PReLU vs. RPReLU

Fig. 4. Proposed activation functions, RSign and RPReLU, with learnable coefficients
and the traditional activation functions, Sign and PReLU.

Similarly, RPReLU is defined as

f(x_i) = \begin{cases} x_i - \gamma_i + \zeta_i, & \text{if } x_i > \gamma_i \\ \beta_i (x_i - \gamma_i) + \zeta_i, & \text{if } x_i \le \gamma_i \end{cases}, \qquad (4)

where x_i is the input of the RPReLU function f on the i-th channel, γ_i and ζ_i are learnable shifts for moving the distribution, and β_i is a learnable coefficient
controlling the slope of the negative part. All the coefficients are allowed to be
different across channels. Fig. 4(b) compares the shapes of RPReLU and PReLU.
Intrinsically, RSign is learning the best channel-wise threshold (α) for bina-
rizing the input feature map, or equivalently, shifting the input distribution to
obtain the best distribution for taking a sign. From the latter angle, RPReLU can be interpreted as follows: γ shifts the input distribution to find the best point at which β “folds” the negative part of the distribution, and ζ then shifts the output distribution, as illustrated
in Fig. 5. These learned coefficients automatically adjust activation distributions
for obtaining good binary features, which enhances the 1-bit CNNs’ performance.
With the introduction of these functions, the aforementioned difficulty in dis-
tributional learning can be greatly alleviated, and the 1-bit convolutions can
effectively focus on learning more meaningful patterns. We will show later in
the results section that this enhancement can boost the baseline network's top-1
accuracy substantially.
The number of extra parameters introduced by RSign and RPReLU is only
4 × number of channels in the network, which is negligible considering the large
size of the weight matrices. The computational overhead approximates a typical
non-linear layer, which is also trivial compared to the computationally intensive
convolutional operations.
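A minimal PyTorch-style sketch of RSign (Eq. 3) and RPReLU (Eq. 4) with channel-wise learnable coefficients is given below. It is our illustration rather than the released implementation; the initial values of α, β, γ, ζ and the straight-through estimator used for the input gradient are assumptions.

```python
import torch
import torch.nn as nn


class RSign(nn.Module):
    """ReAct-Sign (Eq. 3): sign function with a learnable per-channel threshold alpha."""

    def __init__(self, channels):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1, channels, 1, 1))  # assumed zero init

    def forward(self, x):
        shifted = x - self.alpha
        binary = torch.where(shifted > 0, torch.ones_like(shifted), -torch.ones_like(shifted))
        # Forward outputs the hard sign; backward uses an identity straight-through
        # estimator, so the gradient w.r.t. alpha is -1, matching Eq. (6).
        return binary.detach() + shifted - shifted.detach()


class RPReLU(nn.Module):
    """ReAct-PReLU (Eq. 4): shift by gamma, per-channel negative slope beta, shift by zeta."""

    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))  # assumed zero init
        self.zeta = nn.Parameter(torch.zeros(1, channels, 1, 1))   # assumed zero init
        self.beta = nn.Parameter(0.25 * torch.ones(1, channels, 1, 1))  # PReLU-style init

    def forward(self, x):
        shifted = x - self.gamma
        return torch.where(shifted > 0, shifted, self.beta * shifted) + self.zeta
```

In the ReActNet block of Fig. 2(b), RSign replaces the Sign before each 1-bit convolution and RPReLU is placed after the BatchNorm; the four per-channel coefficients (α, β, γ, ζ) account for the "4 × number of channels" extra parameters mentioned above.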
Optimization
Parameters in RSign and RPReLU can be optimized end-to-end with other
parameters in the network. The gradient of αi in RSign can be simply derived
by the chain rule as:

\frac{\partial L}{\partial \alpha_i} = \sum_{x_i^r} \frac{\partial L}{\partial h(x_i^r)} \frac{\partial h(x_i^r)}{\partial \alpha_i}, \qquad (5)

Fig. 5. An explanation of how proposed RPReLU operates. It first moves the input
distribution by −γ, then reshapes the negative part by multiplying it with β and lastly
moves the output distribution by ζ.

where L represents the loss function and ∂L/∂h(x_i^r) denotes the gradients from deeper layers. The summation Σ_{x_i^r} is applied to all entries in the i-th channel. The derivative ∂h(x_i^r)/∂α_i can be easily computed as

\frac{\partial h(x_i^r)}{\partial \alpha_i} = -1. \qquad (6)
Similarly, for each parameter in RPReLU, the gradients are computed with the following formulas:

\frac{\partial f(x_i)}{\partial \beta_i} = \mathbf{I}_{\{x_i \le \gamma_i\}} \cdot (x_i - \gamma_i), \qquad (7)

\frac{\partial f(x_i)}{\partial \gamma_i} = -\mathbf{I}_{\{x_i \le \gamma_i\}} \cdot \beta_i - \mathbf{I}_{\{x_i > \gamma_i\}}, \qquad (8)

\frac{\partial f(x_i)}{\partial \zeta_i} = 1. \qquad (9)
Here, I denotes the indicator function: I{·} = 1 when the inequality inside the braces holds, and I{·} = 0 otherwise.
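As a sanity check of Eqs. (7)-(9) (our own, not from the paper), the closed-form gradients can be compared against automatic differentiation on a scalar RPReLU with arbitrarily chosen values:

```python
import torch

# Scalar RPReLU with hypothetical values; x <= gamma exercises the negative branch.
x = torch.tensor(-1.5)
beta = torch.tensor(0.25, requires_grad=True)
gamma = torch.tensor(0.5, requires_grad=True)
zeta = torch.tensor(-0.3, requires_grad=True)

shifted = x - gamma
f = torch.where(shifted > 0, shifted, beta * shifted) + zeta  # Eq. (4)
f.backward()

ind_neg = float(x <= gamma)  # indicator I{x <= gamma}
ind_pos = float(x > gamma)   # indicator I{x > gamma}
assert torch.isclose(beta.grad, torch.tensor(ind_neg * (x.item() - gamma.item())))  # Eq. (7)
assert torch.isclose(gamma.grad, torch.tensor(-ind_neg * beta.item() - ind_pos))    # Eq. (8)
assert torch.isclose(zeta.grad, torch.tensor(1.0))                                  # Eq. (9)
```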

4.3 Distributional Loss


Based on the insight that if the binary neural networks can learn similar dis-
tributions as real-valued networks, the performance can be enhanced, we use a
distributional loss to enforce this similarity, formulated as:
L_{\mathrm{Distribution}} = -\frac{1}{n} \sum_{c} \sum_{i=1}^{n} p_c^{R_\theta}(X_i) \log\left(\frac{p_c^{B_\theta}(X_i)}{p_c^{R_\theta}(X_i)}\right), \qquad (10)

where the distributional loss L_Distribution is defined as the KL divergence between the softmax output p_c of a real-valued network R_θ and a binary network B_θ. The subscript c denotes classes and n is the batch size.
Different from prior work [46] that needs to match the outputs from every in-
termediate block, or further using multi-step progressive structural transition [3],
we found that our distributional loss, while much simpler, can yield competitive
results. Moreover, without block-wise constraints, our approach enjoys the flexi-
bility in choosing the real-valued network without the requirement of architecture
similarity between real and binary networks.
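A minimal sketch of Eq. (10), assuming binary_logits and real_logits are the pre-softmax outputs of the binary network B_θ and the real-valued reference network R_θ for a batch of images; it reduces to the standard batch-averaged KL divergence with the real-valued network's softmax output as the target. The function and variable names are our own.

```python
import torch
import torch.nn.functional as F


def distributional_loss(binary_logits, real_logits):
    """Eq. (10): KL divergence between the real-valued network's softmax output
    (treated as the target) and the binary network's softmax output, batch-averaged."""
    log_p_binary = F.log_softmax(binary_logits, dim=1)   # log p^{B_theta}
    p_real = F.softmax(real_logits, dim=1).detach()      # p^{R_theta}, no gradient to the teacher
    return F.kl_div(log_p_binary, p_real, reduction="batchmean")


# Usage sketch (real_valued_net and binary_net are placeholders):
# with torch.no_grad():
#     real_logits = real_valued_net(images)
# loss = distributional_loss(binary_net(images), real_logits)
```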

5 Experiments

To investigate the performance of the proposed methods, we conduct experi-


ments on ImageNet dataset. We first introduce the dataset and training strategy
in Section 5.1, followed by comparison between the proposed networks and state-
of-the-arts in terms of both accuracy and computational cost in Section 5.2. We
then analyze the effects of the distributional loss, concatenated downsampling
layer and the RSign and the RPReLU in detail in the ablation study described
in Section 5.3. Visualization results on how RSign and RPReLU help the binary
network capture the fine-grained underlying distribution are presented in Sec-
tion 5.4.

5.1 Experimental Settings

Dataset The experiments are carried out on the ILSVRC12 ImageNet clas-
sification dataset [31], which is more challenging than small datasets such as
CIFAR [16] and MNIST [27]. In our experiments, we use the classic data aug-
mentation method described in [15].
Training Strategy We followed the standard binarization method in [23] and
adopted the two-step training strategy as [3]. In the first step, we train a network
with binary activations and real-valued weights from scratch. In the second step,
we inherit the weights from the first step as the initial value and fine-tune the
network with weights and activations both being binary. For both steps, Adam
optimizer with a linear learning rate decay scheduler is used, and the initial
learning rate is set to 5e-4. We train for 600k iterations with a batch size of
256. The weight decay is set to 1e-5 for the first step and 0 for the second step.
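A schematic of this two-step schedule (our sketch of the stated hyper-parameters, not the released training script) is shown below; make_optimizer_and_scheduler is a hypothetical helper, and the scheduler is assumed to be stepped once per iteration.

```python
import torch


def make_optimizer_and_scheduler(model, step, total_iters=600_000, base_lr=5e-4):
    """Step 1: binary activations, real-valued weights, weight decay 1e-5.
    Step 2: binary weights and activations, weight decay 0.
    Both steps: Adam with a linearly decaying learning rate over 600k iterations."""
    weight_decay = 1e-5 if step == 1 else 0.0
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda it: max(0.0, 1.0 - it / total_iters))
    return optimizer, scheduler  # call scheduler.step() once per training iteration
```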
Distributional Loss In both steps, we use the proposed distributional loss as the
objective function for optimization, replacing the original cross-entropy loss be-
tween the binary network output and the label.
OPs Calculation Following the calculation method in [3], we count the binary operations (BOPs) and floating-point operations (FLOPs) separately. The total operations (OPs) are calculated as OPs = BOPs/64 + FLOPs, following [30,23].
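As a worked example of this formula, the ReActNet-A entries of Table 1 (4.82×10^9 BOPs and 0.12×10^8 FLOPs) combine to the reported 0.87×10^8 OPs:

```python
# OPs = BOPs / 64 + FLOPs, following [30,23]; numbers for ReActNet-A from Table 1.
bops = 4.82e9    # binary operations
flops = 0.12e8   # floating-point operations
ops = bops / 64 + flops
print(f"OPs = {ops:.2e}")  # ~8.73e+07, i.e. about 0.87 x 10^8 (87M) OPs
```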

5.2 Comparison with State-of-the-art

We compare ReActNet with state-of-the-art quantization and binarization meth-


ods. Table 1 shows that ReActNet-A already outperforms all the quantizing
methods in the left part, and also achieves 4.0% higher accuracy than the state-
of-the-art Real-to-Binary Network [3] with only approximately half of the OPs.
Moreover, in contrast to [3] which computes channel re-scaling for each block
Left part — quantization methods (ResNet-18 backbone):

Methods                Bitwidth (W/A)   Top-1 Acc(%)
BWN [7]                1/32             60.8
TWN [18]               2/32             61.8
INQ [42]               2/32             66.0
TTQ [44]               2/32             66.6

SYQ [10]               1/2              55.4
HWGQ [5]               1/2              59.6
LQ-Nets [39]           1/2              62.6
DoReFa-Net [43]        1/4              59.2

Ensemble BNN [45]      (1/1) × 6        61.1
Circulant CNN [20]     (1/1) × 4        61.4
Structured BNN [47]    (1/1) × 4        64.2
Structured BNN* [47]   (1/1) × 4        66.3
ABC-Net [19]           (1/1) × 5        65.0

Right part — binarization methods:

Binary Methods           BOPs (×10^9)   FLOPs (×10^8)   OPs (×10^8)   Top-1 Acc(%)
BNNs [7]                 1.70           1.20            1.47          42.2
CI-BCNN [37]             –              –               1.63          59.9
Binary MobileNet [28]    –              –               1.54          60.9
PCNN [11]                –              –               1.63          57.3
XNOR-Net [30]            1.70           1.41            1.67          51.2
Trained Bin [38]         –              –               –             54.2
Bi-RealNet-18 [23]       1.68           1.39            1.63          56.4
Bi-RealNet-34 [23]       3.53           1.39            1.93          62.2
Bi-RealNet-152 [21]      10.7           4.48            6.15          64.5
Real-to-Binary Net [3]   1.68           1.56            1.83          65.4
MeliusNet29 [2]          5.47           1.29            2.14          65.8
MeliusNet42 [2]          9.69           1.74            3.25          69.2
MeliusNet59 [2]          18.3           2.45            5.32          70.7
Our ReActNet-A (1/1)     4.82           0.12            0.87          69.4
Our ReActNet-B (1/1)     4.69           0.44            1.63          70.1
Our ReActNet-C (1/1)     4.69           1.40            2.14          71.4

Table 1. Comparison of the top-1 accuracy with state-of-the-art methods. The left
part presents quantization methods applied on ResNet-18 structure and the right part
are binarization methods with varied structures (ResNet-18 if not specified). Quan-
tization methods include weight quantization (upper left block), low-bit weight and
activation quantization (middle left block) and the weight and activation binarization
with expanded network capacity (lower left block), where the number multiplying (1/1) indicates the multiplicative factor. (W/A) represents the number of bits used in weight
or activation quantization.

with real-valued fully-connected layers, ReActNet-A has pure 1-bit convolutions


except the first and the last layers, which is more hardware-friendly.
To make further comparison with previous approaches that use real-valued
convolution to enhance binary networks' accuracy [23,3,2], we constructed Re-
ActNet-B and ReActNet-C, which replace the 1-bit 1×1 convolution with real-
valued 1×1 convolution in the downsampling layers, as shown in Fig. 6(c).
ReActNet-B defines the real-valued convolutions to be group convolutions with
4 groups, while ReActNet-C uses full real-valued convolution. We show that
ReActNet-B achieves 13.7% higher accuracy than Bi-RealNet-18 with the same
number of OPs and ReActNet-C outperforms MeliusNet59 by 0.7% with less
than half of the OPs.
Moreover, we applied the ReAct operations to Bi-RealNet-18, and obtained
65.5% Top-1 accuracy, increasing the accuracy of Bi-RealNet-18 by 9.1% without
changing the network structure.
Considering the challenges in previous attempts to enhance 1-bit CNNs' performance, the accuracy leap achieved by ReActNets is significant. It requires an ingenious use of binary networks' special properties to effectively utilize every
precious bit and strike a delicate balance between binary and real-valued in-
formation. For example, ReActNet-A, with 69.4% top-1 accuracy at 87M OPs,
outperforms the real-valued 0.5× MobileNetV1 by 5.7% in accuracy with
Network Top-1 Acc(%)


Baseline network † * 58.2
Baseline network † 59.6
Proposed baseline network * 61.1
Proposed baseline network 62.5
Proposed baseline network + PReLU 65.5
Proposed baseline network + RSign 66.1
Proposed baseline network + RPReLU 67.4
ReActNet-A (RSign and RPReLU) 69.4
Corresponding real-valued network 72.4

Table 2. The effects of different components in ReActNet on the final accuracy. († denotes the network not using the concatenated blocks, but directly binarizing the
downsampling layers instead. * indicates not using the proposed distributional loss
during training.)

41.6% fewer OPs. These results demonstrate the potential of 1-bit CNNs and
the effectiveness of our ReActNet design.

5.3 Ablation Study

We conduct ablation studies to analyze the individual effect of the following


proposed techniques:
Block Duplication and Concatenation Real-valued shortcuts are crucial for
binary neural network accuracy [23]. However, the input channels of the down-
sampling layers are twice the output channels, which violates the requirement for
adding the shortcuts that demands an equal number of input and output chan-
nels. In the proposed baseline network, we duplicate the downsampling blocks
and concatenate the outputs (Fig. 6(a)), enabling the use of identity shortcuts
to bypass 1-bit convolutions in the downsampling layers. This idea alone results
in a 2.9% accuracy enhancement compared to the network without concatena-
tion (Fig. 6(b)). The enhancement can be observed by comparing the 2nd and
4th rows of Table 2. With the proposed downsampling layer design, our baseline
network achieves both high accuracy and high compression ratio. Because it no
longer requires real-valued matrix multiplications in the downsampling layers,
the computational cost is greatly reduced. As a result, even without using the
distributional loss in training, our proposed baseline network has already sur-
passed the Strong Baseline in [3] by 0.1% for top-1 accuracy at only half of the
OPs. With this strong performance, this simple baseline network serves well as
a new baseline for future studies on compact binary neural networks.
Distributional Loss The results in the first section of Table 2 also validate that
the distributional loss designed for matching the output distribution between
binary and real-valued neural networks is effective for enhancing the performance
of the proposed baseline network, improving the accuracy by 1.4%, which is achieved
independent of the network architecture design.
(a) Concatenated 1-bit convolutional block in Proposed Baseline Network (b) Normal 1-bit convolutional block for downsampling in baseline network (c) Real-valued convolutional block for downsampling in ReActNet-B and C

Fig. 6. Variations in the downsampling layer design.

Fig. 7. Comparing validation accuracy curves between baseline networks and ReAct-
Net. Using the proposed RSign and RPReLU (red curve) achieves higher accuracy and
is more robust than using Sign and PReLU (green curve).

ReAct Operations The introduction of RSign and RPReLU improves the ac-
curacy by 4.9% and 3.6% respectively over the proposed baseline network, as
shown in the second section of Table 2. By adding both RSign and RPReLU,
ReActNet-A achieves 6.9% higher accuracy than the baseline, narrowing the
accuracy gap to the corresponding real-valued network to within 3.0%. Com-
pared to merely using the Sign and PReLU, the use of the generalized activation
functions, RSign and RPReLU, with simple learnable parameters boosts the ac-
curacy by 3.9%, which is very significant for the ImageNet classification task.
As shown in Fig. 7, the validation curve of the network using original Sign +
PReLU oscillates vigorously, which is suspected to be triggered by the slope
coefficient β in PReLU changing its sign, which in turn affects the later layers with an avalanche effect. This also indirectly confirms our assumption that 1-bit CNNs are vulnerable to distributional changes. In comparison, the proposed
RSign and RPReLU functions are effective for stabilizing training in addition to
improving the accuracy.

5.4 Visualization

To help gain better insights, we visualize the learned coefficients as well as the
intermediate activation distributions.
(a) Baseline Network + PReLU (b) Corresponding real-valued network with PReLU (c) ReActNet-A

Fig. 8. The color bar of the learned coefficients. Blue color denotes the positive values
while red denotes the negative, and the darkness in color reflects the absolute value.
We also mark coefficients that have extreme values.

(a) Proposed Baseline Network: Sign → 1-bit 3×3 Conv → BatchNorm
(b) ReActNet: RSign → 1-bit 3×3 Conv → BatchNorm → RPReLU

Fig. 9. Histogram of the activation distribution

Learned Coefficients For clarity, we present the learned coefficients of each


layer in the form of color bars in Fig. 8. Compared to the binary network using
traditional PReLU whose learned slopes β are positive only (Fig. 8(a)), ReAct-
Net using RPReLU learns both positive and negative slopes (Fig. 8(c)), which
are closer to the distributions of PReLU coefficients in a real-valued network we
trained (Fig. 8(b)). The learned distribution shifting coefficients also have large
absolute values as shown in Rows 1-3 of Fig. 8(c), indicating the necessity of
their explicit shift for high-performance 1-bit CNNs.

Activation Distribution In Fig. 9, we show the histograms of activation dis-


tributions inside the trained baseline network and ReActNet. Compared to the
baseline network without RSign and RPReLU, ReActNet's distributions are more enriched and subtle, as shown in the fourth sub-figure in Fig. 9(b). Also, in Re-
ActNet, the distribution of -1 and +1 after the sign function is more balanced,
as illustrated in the second sub-figure in Fig. 9(b), suggesting better utilization
of black and white pixels in representing the binary features.
6 Conclusions

In this paper, we present several new ideas to optimize a 1-bit CNN for higher
accuracy. We first design parameter-free shortcuts based on MobileNetV1 to
propagate real-valued feature maps in both normal convolutional layers as well
as the downsampling layers. This yields a baseline binary network with 61.1%
top-1 accuracy at only 87M OPs for the ImageNet dataset. Then, based on our
observation that 1-bit CNNs' performance is highly sensitive to distributional variations, we propose ReAct-Sign and ReAct-PReLU to shift and reshape the distributions in a learnable fashion and demonstrate their dramatic improvements in top-1 accuracy. We also propose to incorporate a distri-
butional loss, which is defined between the outputs of the binary network and
the real-valued reference network, to replace the original cross-entropy loss for
training. With contributions jointly achieved by these ideas, the proposed Re-
ActNet achieves 69.4% top-1 accuracy on ImageNet, which is just 3% shy of its
real-valued counterpart, at a substantially lower computational cost.

References

1. Alizadeh, M., Fernández-Marqués, J., Lane, N.D., Gal, Y.: An empirical study of
binary neural networks’ optimisation (2018) 4
2. Bethge, J., Bartz, C., Yang, H., Chen, Y., Meinel, C.: Meliusnet: Can binary neural
networks achieve mobilenet-level accuracy? arXiv preprint arXiv:2001.05936 (2020)
2, 4, 11
3. Martinez, B., Yang, J., Bulat, A., Tzimiropoulos, G.: Training binary neural networks with real-to-
binary convolutions. International Conference on Learning Representations (2020)
2, 3, 4, 6, 9, 10, 11, 12
4. Bulat, A., Tzimiropoulos, G.: Xnor-net++: Improved binary neural networks.
British Machine Vision Conference (2019) 4, 6
5. Cai, Z., He, X., Sun, J., Vasconcelos, N.: Deep learning with low precision by half-
wave gaussian quantization. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 5918–5926 (2017) 11
6. Chen, G., Choi, W., Yu, X., Han, T., Chandraker, M.: Learning efficient object
detection models with knowledge distillation. In: Advances in Neural Information
Processing Systems. pp. 742–751 (2017) 3
7. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural
networks: Training deep neural networks with weights and activations constrained
to +1 or −1. arXiv preprint arXiv:1602.02830 (2016) 1, 3, 11
8. Ding, R., Chin, T.W., Liu, Z., Marculescu, D.: Regularizing activation distribution
for training binarized deep networks. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. pp. 11408–11417 (2019) 1, 2, 4
9. Ding, X., Zhou, X., Guo, Y., Han, J., Liu, J., et al.: Global sparse momentum
sgd for pruning very deep neural networks. In: Advances in Neural Information
Processing Systems. pp. 6379–6391 (2019) 3
10. Faraone, J., Fraser, N., Blott, M., Leong, P.H.: Syq: Learning symmetric quanti-
zation for efficient deep neural networks. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. pp. 4300–4309 (2018) 11
11. Gu, J., Li, C., Zhang, B., Han, J., Cao, X., Liu, J., Doermann, D.: Projection
convolutional neural networks for 1-bit cnns via discrete back propagation. In:
Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp. 8344–
8351 (2019) 4, 11
12. He, Y., Zhang, X., Sun, J.: Channel pruning for accelerating very deep neural net-
works. In: Proceedings of the IEEE International Conference on Computer Vision.
pp. 1389–1397 (2017) 3
13. Helwegen, K., Widdicombe, J., Geiger, L., Liu, Z., Cheng, K.T., Nusselder, R.:
Latent weights do not exist: Rethinking binarized neural network optimization. In:
Advances in neural information processing systems. pp. 7531–7542 (2019) 4
14. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network.
arXiv preprint arXiv:1503.02531 (2015) 3
15. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., An-
dreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for
mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 2, 3, 5, 6, 10
16. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images.
Tech. rep., Citeseer (2009) 3, 10
17. Li, B., Shan, Y., Hu, M., Wang, Y., Chen, Y., Yang, H.: Memristor-based approxi-
mated computation. In: Proceedings of the 2013 International Symposium on Low
Power Electronics and Design. pp. 242–247. IEEE Press (2013) 1
18. Li, F., Zhang, B., Liu, B.: Ternary weight networks. arXiv preprint
arXiv:1605.04711 (2016) 11
19. Lin, X., Zhao, C., Pan, W.: Towards accurate binary convolutional neural network.
In: Advances in Neural Information Processing Systems. pp. 345–353 (2017) 4, 11
20. Liu, C., Ding, W., Xia, X., Zhang, B., Gu, J., Liu, J., Ji, R., Doermann, D.: Circu-
lant binary convolutional networks: Enhancing the performance of 1-bit dcnns with
circulant back propagation. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 2691–2699 (2019) 11
21. Liu, Z., Luo, W., Wu, B., Yang, X., Liu, W., Cheng, K.T.: Bi-real net: Binarizing
deep network towards real-network performance. International Journal of Com-
puter Vision pp. 1–18 (2018) 4, 11
22. Liu, Z., Mu, H., Zhang, X., Guo, Z., Yang, X., Cheng, K.T., Sun, J.: Metapruning:
Meta learning for automatic neural network channel pruning. In: Proceedings of
the IEEE International Conference on Computer Vision. pp. 3296–3305 (2019) 3
23. Liu, Z., Wu, B., Luo, W., Yang, X., Liu, W., Cheng, K.T.: Bi-real net: Enhanc-
ing the performance of 1-bit cnns with improved representational capability and
advanced training algorithm. In: Proceedings of the European conference on com-
puter vision (ECCV). pp. 722–737 (2018) 2, 4, 6, 10, 11, 12
24. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolu-
tional networks through network slimming. In: Proceedings of the IEEE Interna-
tional Conference on Computer Vision. pp. 2736–2744 (2017) 3
25. Ma, N., Zhang, X., Zheng, H.T., Sun, J.: Shufflenet v2: Practical guidelines for
efficient cnn architecture design. In: Proceedings of the European Conference on
Computer Vision (ECCV). pp. 116–131 (2018) 3
26. Mishra, A., Nurvitadhi, E., Cook, J.J., Marr, D.: Wrpn: wide reduced-precision
networks. arXiv preprint arXiv:1709.01134 (2017) 4
27. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits
in natural images with unsupervised feature learning. In: NIPS workshop on deep
learning and unsupervised feature learning. vol. 2011, p. 5 (2011) 3, 10
28. Phan, H., Liu, Z., Huynh, D., Savvides, M., Cheng, K.T., Shen, Z.: Binarizing
mobilenet via evolution-based searching. In: Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition. pp. 13420–13429 (2020) 4,
11
29. Qin, H., Gong, R., Liu, X., Shen, M., Wei, Z., Yu, F., Song, J.: Forward and back-
ward information retention for accurate binary neural networks. In: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp.
2250–2259 (2020) 4
30. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: Xnor-net: Imagenet classi-
fication using binary convolutional neural networks. In: European conference on
computer vision. pp. 525–542. Springer (2016) 1, 2, 4, 5, 6, 10, 11
31. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z.,
Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recog-
nition challenge. International Journal of Computer Vision 115(3), 211–252 (2015)
4, 10
32. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv2: In-
verted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. pp. 4510–4520 (2018) 3
33. Shen, Z., He, Z., Xue, X.: Meal: Multi-model ensemble via adversarial learning.
In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp.
4886–4893 (2019) 3
34. Soudry, D., Hubara, I., Meir, R.: Expectation backpropagation: Parameter-free
training of multilayer neural networks with continuous or discrete weights. In:
Advances in Neural Information Processing Systems. pp. 963–971 (2014) 3
35. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural
networks: A tutorial and survey. Proceedings of the IEEE 105(12), 2295–2329
(2017) 3
36. Tang, W., Hua, G., Wang, L.: How to train a compact binary neural network with
high accuracy? In: Thirty-First AAAI conference on artificial intelligence (2017) 4
37. Wang, Z., Lu, J., Tao, C., Zhou, J., Tian, Q.: Learning channel-wise interactions
for binary convolutional neural networks. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (June 2019) 2, 4, 6, 11
38. Xu, Z., Cheung, R.C.: Accurate and compact convolutional neural networks with
trained binarization. British Machine Vision Conference (2019) 6, 11
39. Zhang, D., Yang, J., Ye, D., Hua, G.: Lq-nets: Learned quantization for highly
accurate and compact deep neural networks. In: Proceedings of the European con-
ference on computer vision (ECCV). pp. 365–382 (2018) 3, 11
40. Zhang, J., Pan, Y., Yao, T., Zhao, H., Mei, T.: dabnn: A super fast inference
framework for binary neural networks on arm devices. In: Proceedings of the 27th
ACM International Conference on Multimedia. pp. 2272–2275 (2019) 1
41. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolu-
tional neural network for mobile devices. In: Proceedings of the IEEE conference
on computer vision and pattern recognition. pp. 6848–6856 (2018) 3
42. Zhou, A., Yao, A., Guo, Y., Xu, L., Chen, Y.: Incremental network quantization:
Towards lossless cnns with low-precision weights. arXiv preprint arXiv:1702.03044
(2017) 11
43. Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: Dorefa-net: Training low
bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint
arXiv:1606.06160 (2016) 3, 11
44. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv
preprint arXiv:1612.01064 (2016) 11
45. Zhu, S., Dong, X., Su, H.: Binary ensemble neural network: More bits per network
or more networks per bit? In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 4923–4932 (2019) 11
46. Zhuang, B., Shen, C., Tan, M., Liu, L., Reid, I.: Towards effective low-bitwidth
convolutional neural networks. In: Proceedings of the IEEE conference on computer
vision and pattern recognition. pp. 7920–7928 (2018) 3, 4, 9
47. Zhuang, B., Shen, C., Tan, M., Liu, L., Reid, I.: Structured binary neural networks
for accurate image classification and semantic segmentation. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition. pp. 413–422 (2019)
11
