
Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021)

DBB is a powerful ConvNet building block to replace regular conv. It improves the performance without any extra inference-time costs. This repo contains the code for building DBB and converting it into a single conv. You can also get the equivalent kernel and bias in a differentiable way at any time (get_equivalent_kernel_bias in diversebranchblock.py). This may help training-based pruning or quantization.
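For example, a minimal sketch of fetching the equivalent parameters during training (the constructor arguments below are illustrative; check diversebranchblock.py for the exact signature):

import torch
from diversebranchblock import DiverseBranchBlock

# Illustrative construction; the argument names follow diversebranchblock.py but may differ.
dbb = DiverseBranchBlock(in_channels=64, out_channels=64, kernel_size=3,
                         stride=1, padding=1, deploy=False)
dbb.train()

x = torch.randn(2, 64, 32, 32)
y = dbb(x)

# The equivalent single-conv parameters are available at any time, and the computation
# is differentiable, so they can enter a pruning or quantization objective.
k_eq, b_eq = dbb.get_equivalent_kernel_bias()
print(k_eq.shape, b_eq.shape)   # expected: torch.Size([64, 64, 3, 3]) torch.Size([64])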

This is the PyTorch implementation. The MegEngine version is at https://github.com/megvii-model/DiverseBranchBlock

Another nice implementation by @zjykzj. Please check here.

Paper: https://arxiv.org/abs/2103.13425

Update: released the code for building the block, transformations and verification.

Update: a more efficient implementation of BNAndPadLayer

Update: MobileNet, ResNet-18 and ResNet-50 models released. You can download them from Google Drive or Baidu Cloud. For the 1x1-KxK branch of MobileNet, we used internal_channels = 2x input_channels for every depthwise conv. internal_channels = input_channels (1x) also worked, but the accuracy was slightly lower (72.71% vs. 72.88%). On dense convs such as those in ResNet, we used internal_channels = input_channels, and larger internal_channels seemed useless.
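As a hedged illustration of that setting (the keyword name below is an assumption; check the DiverseBranchBlock constructor for the exact argument):

from diversebranchblock import DiverseBranchBlock

# Hypothetical depthwise 3x3 DBB with a 2x-wide internal 1x1-KxK branch, mirroring the
# MobileNet setting above. internal_channels_1x1_3x3 is the name we assume here.
dbb_dw = DiverseBranchBlock(in_channels=128, out_channels=128, kernel_size=3,
                            stride=1, padding=1, groups=128,
                            internal_channels_1x1_3x3=256,   # 2x the input channels
                            deploy=False)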

Sometimes I call it ACNet v2 because 'DBB' is two bits larger than 'ACB' in ASCII. (lol)

@inproceedings{ding2021diverse,
title={Diverse Branch Block: Building a Convolution as an Inception-like Unit},
author={Ding, Xiaohan and Zhang, Xiangyu and Han, Jungong and Ding, Guiguang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10886--10895},
year={2021}
}

Abstract

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multi-scale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements of novel ConvNet architectures, DBB complicates the training-time microstructure while maintaining the macro architecture, so that it can be used as a drop-in replacement for regular conv layers of any architecture. In this way, the model can be trained to reach a higher level of performance and then transformed into the original inference-time structure for inference. DBB improves ConvNets on image classification (up to 1.9% higher top-1 accuracy on ImageNet), object detection and semantic segmentation.


Use our pretrained models

You may download the models reported in the paper (ResNet-18, ResNet-50, MobileNet) from Google Drive (https://drive.google.com/drive/folders/1BPuqY_ktKz8LvHjFK5abD0qy3ESp8v6H?usp=sharing) or Baidu Cloud (https://pan.baidu.com/s/1wPaQnLKyNjF_bEMNRo4z6Q, access code "dbbk"). For ease of transfer learning on other tasks, we provide both training-time and inference-time models. Taking ResNet-18 as an example, assuming IMGNET_PATH is the path to your directory that contains the "train" and "val" directories of ImageNet, you may test the accuracy by running

python test.py IMGNET_PATH train ResNet-18_DBB_7101.pth -a ResNet-18 -t DBB

Here "train" indicates the training-time structure

Convert the training-time models into inference-time

You may convert a trained model into the inference-time structure with

python convert.py [weights file of the training-time model to load] [path to save] -a [architecture name]

For example,

python convert.py ResNet-18_DBB_7101.pth ResNet-18_DBB_7101_deploy.pth -a ResNet-18

Then you may test the inference-time model by

python test.py IMGNET_PATH deploy ResNet-18_DBB_7101_deploy.pth -a ResNet-18 -t DBB

Note that the argument "deploy" builds an inference-time model.

ImageNet training

The multi-processing training script in this repo is based on the official PyTorch example, for simplicity and readability. The modifications include the model-building part and a cosine learning rate scheduler. You may train and test like this:

python train.py -a ResNet-18 -t DBB --dist-url tcp://127.0.0.1:23333 --dist-backend nccl --multiprocessing-distributed --world-size 1 --rank 0 --workers 64 IMGNET_PATH
python test.py IMGNET_PATH train model_best.pth.tar -a ResNet-18

Use like this in your own code

Assume your model is like

class SomeModel(nn.Module):
    def __init__(self, ...):
        ...
        self.some_conv = nn.Conv2d(...)
        self.some_bn = nn.BatchNorm2d(...)
        ...
        
    def forward(self, inputs):
        out = ...
        out = self.some_bn(self.some_conv(out))
        ...

For training, just use DiverseBranchBlock to replace the conv-BN. Then SomeModel will be like

class SomeModel(nn.Module):
    def __init__(self, ...):
        ...
        self.some_dbb = DiverseBranchBlock(..., deploy=False)
        ...
        
    def forward(self, inputs):
        out = ...
        out = self.some_dbb(out)
        ...

Train the model just like you train the other regular models. Then call switch_to_deploy of every DiverseBranchBlock, test, and save.

model = SomeModel(...)
train(model)
for m in model.modules():
    if hasattr(m, 'switch_to_deploy'):
        m.switch_to_deploy()
test(model)
save(model)

FAQs

Q: Is the inference-time model's output the same as the training-time model?

A: Yes. You can verify that by

python dbb_verify.py

Q: What is the relationship between DBB and RepVGG?

A: RepVGG is a plain architecture, and the RepVGG-style structural re-param is designed for the plain architecture. On a non-plain architecture, a RepVGG block shows no superiority compared to a single 3x3 conv (it improves Res-50 by only 0.03%, as reported in the RepVGG paper). DBB is a universal building block that can be used on numerous architectures.

Q: How to quantize a model with DBB?

A1: Post-training quantization. After training and conversion, you may quantize the converted model with any post-training quantization method. Then you may insert a BN after the conv converted from a DBB and finetune to recover the accuracy just like you quantize and finetune the other models. This is the recommended solution. Please see the quantization example of RepVGG.

A2: Quantization-aware training. During quantization-aware training, instead of constraining the params of a single kernel (e.g., making every param take a value in {-127, -126, ..., 126, 127} for int8) as for an ordinary conv, you should constrain the equivalent kernel of the DBB (get_equivalent_kernel_bias()), as sketched below.
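A rough sketch of that idea, assuming symmetric int8 fake quantization and a DiverseBranchBlock instance dbb; the helper names here are ours, not the repo's:

import torch
import torch.nn.functional as F

def fake_quant_ste(w, num_bits=8):
    # Symmetric per-tensor fake quantization with a straight-through estimator.
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for int8
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()                        # forward uses w_q, gradients flow to w

def dbb_qat_forward(dbb, x, stride=1, padding=1, groups=1):
    # Fetch the equivalent KxK kernel/bias of the DBB (differentiable), fake-quantize the
    # kernel, and run a single conv with it. Pass the stride/padding/groups the DBB was built with.
    k_eq, b_eq = dbb.get_equivalent_kernel_bias()
    return F.conv2d(x, fake_quant_ste(k_eq), b_eq, stride=stride, padding=padding, groups=groups)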

Q: I tried to finetune your model with multiple GPUs but got an error. Why are the names of params like "xxxx.weight" in the downloaded weight file but sometimes like "module.xxxx.weight" (shown by nn.Module.named_parameters()) in my model?

A: DistributedDataParallel may prefix "module." to the name of params and cause a mismatch when loading weights by name. The simplest solution is to load the weights (model.load_state_dict(...)) before DistributedDataParallel(model). Otherwise, you may insert "module." before the names like this

checkpoint = torch.load(...)    # This is just a name-value dict
ckpt = {('module.' + k) : v for k, v in checkpoint.items()}
model.load_state_dict(ckpt)

Likewise, if the param names in the checkpoint file start with "module." but those in your model do not, you may strip the names like

ckpt = {k.replace('module.', ''):v for k,v in checkpoint.items()}   # strip the names
model.load_state_dict(ckpt)

Q: So a DBB derives the equivalent KxK kernels before each forwarding to save computations?

A: No! More precisely, we do the conversion only once right after training. Then the training-time model can be discarded, and every resultant block is just a KxK conv. We only save and use the resultant model.

Contact

[email protected] (The original Tsinghua mailbox [email protected] will expire in several months)

Google Scholar Profile: https://scholar.google.com/citations?user=CIjw0KoAAAAJ&hl=en

Homepage: https://dingxiaohan.xyz/

My open-sourced papers and repos:

The Structural Re-parameterization Universe:

  1. RepLKNet (CVPR 2022) Powerful efficient architecture with very large kernels (31x31) and guidelines for using large kernels in modern CNNs
    Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
    code.

  2. RepOptimizer (ICLR 2023) uses Gradient Re-parameterization to train powerful models efficiently. The training-time RepOpt-VGG is as simple as the inference-time model. It also addresses the problem of quantization.
    Re-parameterizing Your Optimizers rather than Architectures
    code.

  3. RepVGG (CVPR 2021) A super simple and powerful VGG-style ConvNet architecture. Up to 84.16% ImageNet top-1 accuracy!
    RepVGG: Making VGG-style ConvNets Great Again
    code.

  4. RepMLP (CVPR 2022) MLP-style building block and architecture
    RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
    code.

  5. ResRep (ICCV 2021) State-of-the-art channel pruning (Res50, 55% FLOPs reduction, 76.15% acc)
    ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
    code.

  6. ACB (ICCV 2019) is a CNN component without any inference-time costs. The first work of our Structural Re-parameterization Universe.
    ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks.
    code.

  7. DBB (CVPR 2021) is a CNN component with higher performance than ACB and still no inference-time costs. Sometimes I call it ACNet v2 because "DBB" is 2 bits larger than "ACB" in ASCII (lol).
    Diverse Branch Block: Building a Convolution as an Inception-like Unit
    code.

Model compression and acceleration:

  1. (CVPR 2019) Channel pruning: Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
    code

  2. (ICML 2019) Channel pruning: Approximated Oracle Filter Pruning for Destructive CNN Width Optimization
    code

  3. (NeurIPS 2019) Unstructured pruning: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
    code


diversebranchblock's Issues

A note about TRANS III

Hello! Sorry to bother you! I recently read your paper and came across TRANS III; the transform is indeed novel. But regarding the caveat you mention, i.e., that if the second KxK layer zero-pads its input then Eq. 8 does not hold, and that the solution is to pad with REP(b1), the bias of the conv obtained from the first transform, I do not quite understand it. Could you explain in detail why the equation fails and why the solution works? Thank you!

Multi-class performance of ResNet-18 with DBB

Hello, after reading your paper I tried a ResNet-18 with DBB modules on my own multi-class classification task, used as follows:

import torch
import torch.nn as nn
from DiverseBranchBlock.convnet_utils import switch_deploy_flag, switch_conv_bn_impl, build_model

def Dbb_Res(num_classes, pretrained=True):
    switch_deploy_flag(False)
    switch_conv_bn_impl('DBB')
    model = build_model('ResNet-18')

    if pretrained:
        model.load_state_dict(torch.load('DiverseBranchBlock/ResNet-18_DBB_7099.pth'))

    # Replace the classifier head for the new number of classes
    in_features = model.linear.in_features
    model.linear = nn.Linear(in_features, num_classes)
    return model

But in practice the results were terrible: a pretrained ResNet-18 reaches 80% accuracy on my task, while the network built as above only reaches 6%. Is my usage incorrect? How should I adjust it? Thanks!

MobileNet

So far I only see ResNet models. When will the MobileNet model be available?

A question

Why doesn't DBB include an identity branch? RepVGG tried an identity branch. Is there a particular reason?

Another question about TRANS III

Hello, thanks for the nice work!
I have a question: in TRANS III the 1x1 conv comes first and the 3x3 conv second, which requires padding the 1x1 output with the current bias. Could the order be reversed (3x3 first, then 1x1), so that no bias padding would be needed? Did you try this at the time?

Two questions

  1. When verifying the output difference before and after re-parameterization, why do the BN weights need to be initialized first?
    (screenshot of the verification code omitted)

  2. If the number of 1x1 convs in dbb_1x1_kxk is increased (i.e., many 1x1 convs stacked vertically with a 3x3 at the end), will the error after re-parameterization grow?

Looking forward to your reply.

Training vs. inference conversion

Can a trained model be used directly for prediction and evaluation without converting it?

Model conversion

If I replace conv blocks at certain positions of my own network with DBB modules, how do I obtain the reduced-parameter model described in the paper? After replacing some modules, the parameter count grew from about 240M to about 330M, which is quite large.

Applying DBB to my own model

Why does the README say that, for your own model, you should replace an ordinary conv layer plus its BN with a DBB, while the paper says a DBB can replace a single ordinary conv layer?

A question about model training

Hello! I tried to use DBB modules to replace the shortcut structure of ResNet-50, but after the replacement, loading your pretrained model from Google Drive fails. How can I fix this?
I am new to this field, so any advice would be appreciated.

DiverseBranchBlock

Hello, I used DiverseBranchBlock in place of the RepBlock in a model. The performance improved slightly, but the number of parameters increased greatly. Do you have any suggestions?

转换模型

Hello, when downloading the models from Google Drive / Baidu Cloud, I found that resnet18 is a folder with no model file inside, while resnet50 does have a corresponding model. However, when converting with convert.py, line 27 (train_model.load_state_dict(ckpt)) raises an error about mismatched keys. The error message is as follows (partially omitted):
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.bn.weight", "stage1.0.conv2.dbb_avg.bn.bn.bias", "stage1.0.conv2.dbb_avg.bn.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.bn.running_var", "
stage1.0.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.1.conv2.dbb_avg.bn.bn.weight", "stage1.1.conv2.dbb_avg.bn.bn.bias", "stage1.1.conv2.dbb_avg.bn.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.bn.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.2.conv2.dbb_avg.bn.bn.weight", "stage1.2.conv2.dbb_avg.bn.bn.bias", "stage1.2.conv2.dbb_avg.bn.bn.running_mean", "stage1.2.conv2.dbb_avg.bn.bn.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_var"........

Unexpected key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.weight", "stage1.0.conv2.dbb_avg.bn.bias", "stage1.0.conv2.dbb_avg.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.running_var",
"stage1.0conv2.dbb_avg.bn.num_batches_tracked", "stage1.0.conv2.dbb_1x1_kxk.bn1.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.0.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.1.conv2.dbb_avg.bn.weight", "stage1.1.conv2.dbb_avg.bn.bias", "stage1.1.conv2.dbb_avg.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.runing_var", "stage1.1.conv2.dbb_avg.bn.num_batches_tracked", "stage1.1.conv2.dbb_1x1_kxk.bn1.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.2.conv2.dbb_avg.bn.weight", "stage1.2.conv2.dbb_avg.bn.bias", "stage1.2.conv2.dbb_avg.bn.running_mean", "stage1.conv2.dbb_avg.bn.running_var", "stage1.2.conv2.dbb_avg.bn.num_batches_tracked", "stage1.2.conv2.dbb_1x1_kxk.bn1.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage2.0.conv2.dbb_avg.bn.weight", "stage2.0.conv2.dbb_avg.bn.bias", "stage2.0.conv2.dbb_avg.bn.runing_mean", "stage2.0.conv2.dbb_avg.bn.running_var".......

Average pooling

Hello, one branch of the DBB block is average pooling. If the data is downsampled and the spatial size becomes smaller, how can it be added together with the outputs of the 1x1 and KxK conv branches?
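An illustrative note, not an official reply: the average-pooling branch in DBB uses the same kernel size, stride and padding as the KxK conv, so its output has exactly the same shape as the other branches, and a KxK average pooling is itself equivalent to a KxK conv with fixed weights (Transform V in the paper). A minimal sketch:

import torch
import torch.nn.functional as F

# A KxK average pooling over C channels equals a KxK conv whose kernel holds 1/(K*K)
# on its own channel and 0 elsewhere.
def avg_pool_as_conv_kernel(channels, kernel_size):
    k = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        k[c, c, :, :] = 1.0 / (kernel_size * kernel_size)
    return k

x = torch.randn(1, 4, 8, 8)
out_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
out_conv = F.conv2d(x, avg_pool_as_conv_kernel(4, 3), stride=1, padding=1)
print((out_pool - out_conv).abs().max())   # ~1e-7: same shape, same values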

Plug-in version implementation

Hi @DingXiaoH, nice work! Borrowing from your implementation, I have built a plug-in version of DiverseBranchBlock.

This plug-in version has the following advantages:

  1. It does not modify the original model
  2. After training, the DBB can be fused back into the original architecture
  3. It supports mixed insert/fuse operations for ACBlock/RepVGGBlock/DBBlock

How to insert

Use a Config File

see rd50_dbb_cifar100_224_e100_sgd_calr.yaml

...
MODEL:
  CONV:
    TYPE: 'Conv2d'
    ADD_BLOCKS: ('DiverseBranchBlock',)
...

Build resnet50_d with DBB

import torch

from zcls.config import cfg
from zcls.model.recognizers.build import build_recognizer

cfg.merge_from_file(args.config_file)   # path to the YAML config shown above
model = build_recognizer(cfg, device=torch.device('cpu'))

Test

see test_dbblock.py

How to fuse

see model_fuse.py

$ python tools/model_fuse.py --help
usage: model_fuse.py [-h] [--verbose] CONFIG_FILE OUTPUT_DIR

Fuse block for ACBlock/RepVGGBLock/DBBlock

positional arguments:
  CONFIG_FILE  path to config file
  OUTPUT_DIR   path to output

optional arguments:
  -h, --help   show this help message and exit
  --verbose    Print Model Info

Other

Structural re-parameterization is really a nice idea! Using ACBlock, I improved model accuracy on a dataset much bigger than ImageNet; I hope DBB can achieve even better accuracy.

Thanks again!

A problem visualizing DiverseBranchBlock in Netron

Hello, I have read this paper and your 2019 paper, and visualized the base, ACB, and DBB block structures with Netron. The base and ACB blocks display correctly, but the DBB block does not. Could you share a WeChat or QQ contact to help resolve this? Thanks.

A 3x3 conv followed by a 3x3 conv

Hello, the paper mentions merging a 1x1 conv followed by a 3x3 conv. Can a 3x3 conv followed by another 3x3 conv be merged in theory? And if the feature map size must be preserved, so that each conv needs padding, how should that be handled?

transIII_1x1_kxk does not behave as expected

Hi,

I verified like this:

import torch
import torch.nn as nn
from dbb_transforms import transIII_1x1_kxk   # adjust to the module that defines transIII_1x1_kxk in this repo

conv1 = nn.Conv2d(32, 64, 1, 1, 0, bias=True)
conv2 = nn.Conv2d(64, 128, 3, 1, 1, bias=True)
conv = nn.Conv2d(32, 128, 3, 1, 1, bias=True)

k, b = transIII_1x1_kxk(conv1.weight, conv1.bias, conv2.weight, conv2.bias, 1)
with torch.no_grad():            # copy_ on an nn.Parameter needs no_grad
    conv.weight.copy_(k)
    conv.bias.copy_(b)
inten = torch.randn(2, 32, 224, 224)
out1 = conv2(conv1(inten))
out2 = conv(inten)
print((out1 - out2).abs().max())

And the output is 0.11, which is far too large. Have you noticed this?
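An illustrative note, not an official reply: this looks like the TRANS III caveat discussed in an issue above. Eq. 8 assumes the second conv receives no zero padding, because the border pixels seen by the second conv should equal the first conv's bias b1 rather than zero (padding with REP(b1), as BNAndPadLayer does, restores this). A minimal sketch under that assumption, removing the padding from the second conv and from the merged conv, where the transform then becomes exact (the import path is a guess, as above):

import torch
import torch.nn as nn
from dbb_transforms import transIII_1x1_kxk   # adjust to the repo's actual module

conv1 = nn.Conv2d(32, 64, 1, 1, 0, bias=True)
conv2 = nn.Conv2d(64, 128, 3, 1, 0, bias=True)   # no zero padding on the second conv
conv = nn.Conv2d(32, 128, 3, 1, 0, bias=True)

k, b = transIII_1x1_kxk(conv1.weight, conv1.bias, conv2.weight, conv2.bias, 1)
with torch.no_grad():
    conv.weight.copy_(k)
    conv.bias.copy_(b)

x = torch.randn(2, 32, 224, 224)
print((conv2(conv1(x)) - conv(x)).abs().max())   # tiny float error (roughly 1e-5), not 0.11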

mmsegmentation

Hi! Thanks for your excellent work! If I want to use DBB in mmsegmentation, what should I do? ^_^
