
Bi-Real-net

This is the implementation of our papers "Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm", published in ECCV 2018, and "Bi-real net: Binarizing deep network towards real-network performance", published in IJCV.

We propose to use an identity mapping to propagate the real-valued information before binarization. The proposed 1-layer-per-block structure, with a shortcut bypassing every binary convolutional layer, significantly outperforms the original 2-layer-per-block ResNet structure when weights and activations are binarized. The detailed motivation and discussion can be found in our IJCV paper; three other proposed training techniques can be found in the ECCV paper.
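
As a hypothetical minimal sketch of that block structure (names are illustrative; the real code also binarizes the weights and uses an approximate gradient for the sign function, discussed below):

```python
import torch
import torch.nn as nn

class BiRealBlock(nn.Module):
    """Sketch of the 1-conv-per-block structure: the identity shortcut
    bypasses each binary convolution, so real-valued activations keep
    propagating alongside the binarized path."""

    def __init__(self, channels):
        super().__init__()
        # Stand-in for the 1-bit convolution (weight binarization omitted).
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.sign(x)            # binarize the activations
        out = self.bn(self.conv(out))  # 1-bit convolution + BatchNorm
        return out + x                 # identity shortcut: real-valued info
```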

News

[November 23rd, 2019] We finished the PyTorch implementation for training Bi-Real Net from scratch, which is very easy to run, and we obtain the same accuracy as reported in the paper. Clone the repo and try our new PyTorch implementation!

[March 28th, 2021] Our newest paper, "ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions", achieves 65.8% accuracy on the Bi-Real Net-18 structure with simple ReAct functions. Check out that paper if you are interested.

Reference

If you find this repo useful for your research, please consider citing our papers:

@inproceedings{liu2018bi,
  title={Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm},
  author={Liu, Zechun and Wu, Baoyuan and Luo, Wenhan and Yang, Xin and Liu, Wei and Cheng, Kwang-Ting},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={722--737},
  year={2018}
}

and

@article{liu2020bi,
  title={Bi-real net: Binarizing deep network towards real-network performance},
  author={Liu, Zechun and Luo, Wenhan and Wu, Baoyuan and Yang, Xin and Liu, Wei and Cheng, Kwang-Ting},
  journal={International Journal of Computer Vision (IJCV)},
  volume={128},
  number={1},
  pages={202--219},
  year={2020},
  publisher={Springer}
}

PyTorch Implementation

To make Bi-Real Net easier to implement, we recently discovered that we can train it from scratch with the Adam solver. The initial learning rate is 0.001 and decays linearly to 0 over 256 epochs; the batch size is set to 512. If you decrease or increase the batch size, remember to scale the learning rate by the same ratio. This implementation differs from the one reported in the paper in three main ways:

|  | Caffe implementation in our original paper | PyTorch implementation |
| --- | --- | --- |
| Training technique | Step-by-step finetuning | Training from scratch |
| Solver | SGD with momentum | Adam |
| Data augmentation | Random crop of 224 from 256 | Random rescale with ratio [0.08, 1], then random crop of 224 |

Requirements: Python 3, PyTorch 1.3.0, torchvision 0.4.1
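
A minimal sketch of the schedule described above (the model and data-loading pieces are hypothetical stand-ins, not the repository's actual training script):

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms

# Data augmentation as in the table above: random rescale with ratio
# [0.08, 1], then a random 224x224 crop (torchvision's RandomResizedCrop).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),
    transforms.ToTensor(),
])

model = nn.Linear(10, 2)  # hypothetical stand-in for Bi-Real Net

epochs, base_lr = 256, 0.001  # base_lr assumes batch size 512; scale with batch size
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

# Linear decay of the learning rate from base_lr to 0 over 256 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / epochs)

for epoch in range(epochs):
    # ... one training pass over ImageNet (batch size 512) goes here ...
    scheduler.step()
```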

Caffe Implementation

This model was trained on the ImageNet dataset (1000 classes, 1.2 million training images, 50k validation images). For each image, the smaller dimension is rescaled to 256 while keeping the aspect ratio intact. For training, a random 224 × 224 crop is selected. Note that, in contrast to XNOR-Net and the full-precision ResNet, we do not use random resizing, which might further improve performance. For inference, we use a 224 × 224 center crop.

Pre-training: We prepare the real-valued network for initializing the binary network in three steps: 1) Train the network with the ReLU nonlinearity from scratch, following the hyper-parameter settings in ResNet. 2) Replace ReLU with leaky-clip, with range (-1, 1) and negative slope 0.1, and finetune the network. 3) Finetune the network with the clip(-1, x, 1) nonlinearity instead of leaky-clip.
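
The papers don't spell out these nonlinearities in code; one plausible reading of "leaky-clip" (by analogy with leaky ReLU versus ReLU) is:

```python
import torch

def leaky_clip(x, slope=0.1):
    # Identity inside (-1, 1), slope 0.1 outside: a plausible reading of
    # "leaky-clip with the range (-1, 1) and the negative slope of 0.1".
    clipped = torch.clamp(x, -1.0, 1.0)
    return clipped + slope * (x - clipped)

def clip(x):
    # clip(-1, x, 1): the hard clip used in the final pre-training step.
    return torch.clamp(x, -1.0, 1.0)
```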

Training: We train two instances of Bi-Real Net: an 18-layer and a 34-layer Bi-Real net. Training consists of two steps: training the 1-bit convolution layers and retraining the BatchNorm layers. In the first step, the weights in each 1-bit convolution layer are binarized to the sign of the real-valued weights multiplied by the absolute mean of each kernel. We use the SGD solver with a momentum of 0.9 and set the weight decay to 0, which means we no longer encourage the weights to be close to 0. For the 18-layer Bi-Real net, we run the training algorithm for 20 epochs with a batch size of 128; the learning rate starts at 0.01 and is decayed twice by a factor of 0.1, at the 10th and the 15th epoch. For the 34-layer Bi-Real net, training runs for 40 epochs with a batch size of 1024; the learning rate starts at 0.08 and is multiplied by 0.1 at the 20th and the 30th epoch, respectively. In the second step, we constrain the weights to -1 and 1, set the learning rate of all convolution layers to 0, and retrain the BatchNorm layers for 1 epoch to absorb the scaling factor.
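
A minimal PyTorch sketch of the per-kernel binarization rule described above (the repository's actual implementation of this step is in Caffe):

```python
import torch

def binarize_weights(w):
    # w: real-valued conv weights of shape (out_channels, in_channels, kH, kW).
    # Each kernel becomes sign(w) scaled by the absolute mean of that kernel.
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # per-kernel scaling factor
    return torch.sign(w) * alpha
```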

Inference: We use the trained model with binary weights and binary activations in the 1-bit convolution layers for inference.

Using the code: This is a Caffe implementation. We added the binary convolution layer and the leaky-clip layer. The binary convolution layer is modified from https://github.com/loswensiana/BWN-XNOR-caffe, in which we modified the gradient computation method. To use the code, put the ex_layers folder under the src and include folders, respectively. You also need to replace the original threshold layer with our threshold layer, because we modified its backward computation.

Accuracy

|  |  | Bi-Real net | XNOR-Net |
| --- | --- | --- | --- |
| 18-layer | Top-1 | 56.4% | 51.2% |
|  | Top-5 | 79.5% | 73.4% |
| 34-layer | Top-1 | 62.2% |  |
|  | Top-5 | 83.9% |  |
| 50-layer | Top-1 | 62.6% |  |
|  | Top-5 | 83.9% |  |

Issues

Conv Layer backward computation

The README says: "The binary convolution layer is modified from https://github.com/loswensiana/BWN-XNOR-caffe, in which we modified the gradient computation method."

However, comparing
https://github.com/loswensiana/BWN-XNOR-caffe/blob/master/src/caffe/ex_layers/binary_conv_layer.cpp
with
https://github.com/liuzechun/Bi-Real-net/blob/master/Bi-Real-net-caffe/caffe-train/src/caffe/ex_layers/binary_conv_train_layer.cpp
it looks like the backward pass and gradients are exactly the same. Am I looking at the wrong files, or have they not been updated?

Thanks for any help

make caffe failed

When I run `sudo make all` to build bi-real-net, an error occurs:

```
CXX .build_release/src/caffe/proto/caffe.pb.cc
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0-rc3
.build_release/cuda/src/caffe/ex_layers/binary_conv_layer.o: in function '__device_stub__ZN5caffe23binary_sync_conv_groupsEv()':
tmpxft_00005c61_00000000-4_binary_conv_layer.compute_50.cudafe1.cpp:(.text+0x410): multiple definition of '__device_stub__ZN5caffe23binary_sync_conv_groupsEv()'
.build_release/cuda/src/caffe/ex_layers/binary_conv_train_layer.o:tmpxft_00005c4d_00000000-4_binary_conv_train_layer.compute_50.cudafe1.cpp:(.text+0x2b0): first defined here
.build_release/cuda/src/caffe/ex_layers/binary_conv_layer.o: in function 'caffe::binary_sync_conv_groups()':
tmpxft_00005c61_00000000-4_binary_conv_layer.compute_50.cudafe1.cpp:(.text+0x420): multiple definition of 'caffe::binary_sync_conv_groups()'
.build_release/cuda/src/caffe/ex_layers/binary_conv_train_layer.o:tmpxft_00005c4d_00000000-4_binary_conv_train_layer.compute_50.cudafe1.cpp:(.text+0x2c0): first defined here
collect2: error: ld returned 1 exit status
Makefile:564: recipe for target '.build_release/lib/libcaffe.so.1.0.0-rc3' failed
make: *** [.build_release/lib/libcaffe.so.1.0.0-rc3] Error 1
```

How can I use your code in Caffe?

Accuracy of ResNet-34 and ResNet-50

Thanks for your update of pytorch implementation.

I noticed that the accuracies of binary ResNet-34 and binary ResNet-50 are similar. What do you think is the cause? Just for academic discussion, no offence.


Weight binarization question

Hi, while reproducing the PyTorch version I printed binary_weights, but the values are not binarized. Have I missed something, or am I doing something wrong?

About the file

I read your paper recently; it's nice work! But I can't find any code files except the README in the master branch.

a question about BinaryActivation

Hello. Sorry to bother you; I am very confused about the code in class BinaryActivation. Could you give me some explanation? I don't know the meaning of out1, out2, and out3.
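
For context, the ECCV paper replaces the gradient of sign with a piecewise polynomial ("ApproxSign"); a plausible reading is that out1, out2, and out3 build this function piece by piece. A sketch of the function itself:

```python
import torch

def approx_sign(x):
    # Piecewise polynomial approximation of sign(x) from the ECCV paper:
    #   -1          for x < -1
    #   2x + x^2    for -1 <= x < 0
    #   2x - x^2    for 0 <= x < 1
    #   1           otherwise
    out1 = torch.where(x < -1, torch.full_like(x, -1.0), x * x + 2 * x)
    out2 = torch.where(x < 0, out1, -x * x + 2 * x)
    out3 = torch.where(x < 1, out2, torch.full_like(x, 1.0))
    return out3
```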

Implementation of pre-training

In the implementation of pre-training, you first train a real-valued ResNet to initialize the BNN, with the same hyperparameter settings as the original ResNet. The architectures of Bi-Real-net and the standard ResNet are different; which one do you use for pre-training?

If you use the standard ResNet architecture, is it effective to load the pre-trained weights into Bi-Real-net, which has a different inference graph?

If you use the Bi-Real-net architecture, does it work with the same hyperparameter settings as the original ResNet, given that the pooling, batch norm, and activation layers are stacked differently in a binary CNN and a CNN?

Implementation of PopCount(XNOR(a, w))

Hello there,

While reading the ex_layers, specifically the binary convolution layer, I cannot see where or how the aforementioned operation is implemented (substituting the convolutions with XNOR and popcount). Can you clarify this for me?

Thank you and greetings.
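
For context (a general identity, not this repo's code): for vectors with entries in {-1, +1} packed as bits (+1 → 1, -1 → 0), the dot product satisfies a · w = 2 · popcount(XNOR(a, w)) − N, which is what lets XNOR and popcount replace a binary convolution. A tiny check:

```python
import random

N = 64
a = [random.choice([-1, 1]) for _ in range(N)]
w = [random.choice([-1, 1]) for _ in range(N)]

dot = sum(x * y for x, y in zip(a, w))

pack = lambda v: sum(1 << i for i, x in enumerate(v) if x == 1)  # +1 -> bit 1
xnor = ~(pack(a) ^ pack(w)) & ((1 << N) - 1)                     # N-bit XNOR
assert dot == 2 * bin(xnor).count("1") - N                       # popcount identity
```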

Training log of the PyTorch version

Great work!
Recently, we have been replicating your work (Bi-Real-net 34).
Could you please provide the training log of the PyTorch version for our reference?
Thank you very much!

Pre-trained Models

Wonderful work!
Recently, we have been researching binary object detection and need pre-trained binary backbones (Bi-Real ResNet-18/34/50...).
Could you provide such pre-trained models? Thanks very much!

PyTorch Implementation: Forward Pass

First of all, thank you for sharing the PyTorch implementation, it's wonderful.
I've been going over the code and found this line in birealnet.py:

```python
binary_weights = binary_weights_no_grad.detach() - cliped_weights.detach() + cliped_weights
```

and was wondering what its purpose is. My best guess is that it merely allows gradients to exist without actually changing the values of the binary weights, but some helpful clarification would be wonderful!
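
That guess matches the standard straight-through trick: `a.detach() - b.detach() + b` has the forward value of `a` but routes gradients through `b`. A tiny demonstration (generic, not the repo's code):

```python
import torch

b = torch.tensor([0.3, -0.7], requires_grad=True)  # latent real-valued weights
a = torch.sign(b)                                   # binarized values (no useful gradient)

out = a.detach() - b.detach() + b  # forward value == a; gradient flows through b
out.sum().backward()

print(out)     # tensor([ 1., -1.], grad_fn=<AddBackward0>)
print(b.grad)  # tensor([1., 1.]) -- the gradient passes straight through
```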

Error when compiling Caffe files

After adding the ex_layers folder to the src and include folders and replacing the original threshold layer file, compiling Caffe reports an error.

[screenshot of the build error: 2018-10-22 09-59-15]

Do any other files need to be changed to run this network?
