dingxiaoh / diversebranchblock
Diverse Branch Block: Building a Convolution as an Inception-like Unit
License: Apache License 2.0
Hello.
Thank you for your interesting work, and code.
I tried using your Diverseblock in ResNet18 (according to your instructions, replacing conv+bn with diverse blocks). My code is based on https://github.com/kuangliu/pytorch-cifar. The accuracy drops from 95.4% to 95.1%. Do you have any ideas for why this is?
Thank you.
Just turn all the BatchNorm layers in DBB to sync mode in PyTorch; then BNAndPadLayer will perform no padding operations.
Hi, when I used DiverseBranchBlock to replace Conv-Bn in my network, I met this error
ValueError: some parameters appear in more than one parameter group
Have you met it before?
As the title says: the convergence speed and accuracy are both a bit lower than RepVGG's.
Hi,
I verified like this:

import torch
import torch.nn as nn
from dbb_transforms import transIII_1x1_kxk

conv1 = nn.Conv2d(32, 64, 1, 1, 0, bias=True)
conv2 = nn.Conv2d(64, 128, 3, 1, 1, bias=True)
conv = nn.Conv2d(32, 128, 3, 1, 1, bias=True)
k, b = transIII_1x1_kxk(conv1.weight, conv1.bias, conv2.weight, conv2.bias, 1)
with torch.no_grad():
    conv.weight.copy_(k)
    conv.bias.copy_(b)
inten = torch.randn(2, 32, 224, 224)
out1 = conv2(conv1(inten))
out2 = conv(inten)
print((out1 - out2).abs().max())

And the output is 0.11, which is far too large. Have you noticed this?
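(Note for anyone hitting the same discrepancy: the 1x1-then-KxK merge is only exact when the KxK conv applies no zero padding to the intermediate feature map, which is why DBB pads with the bias via BNAndPadLayer. A minimal sketch with my own re-implementation of the transform, under the hypothetical name trans_1x1_kxk, showing the merge becomes exact once the second conv's padding is 0:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def trans_1x1_kxk(k1, b1, k2, b2):
    # Merge a 1x1 conv (k1, b1) followed by a KxK conv (k2, b2)
    # into one KxK conv. Exact only when the KxK conv applies no
    # zero padding to the intermediate feature map.
    k = F.conv2d(k2, k1.permute(1, 0, 2, 3))
    b = (k2 * b1.reshape(1, -1, 1, 1)).sum((1, 2, 3)) + b2
    return k, b

torch.manual_seed(0)
conv1 = nn.Conv2d(32, 64, 1, 1, 0, bias=True)
conv2 = nn.Conv2d(64, 128, 3, 1, 0, bias=True)   # padding=0: merge is exact
conv = nn.Conv2d(32, 128, 3, 1, 0, bias=True)
k, b = trans_1x1_kxk(conv1.weight, conv1.bias, conv2.weight, conv2.bias)
with torch.no_grad():
    conv.weight.copy_(k)
    conv.bias.copy_(b)
x = torch.randn(2, 32, 32, 32)
print((conv2(conv1(x)) - conv(x)).abs().max())  # at float-rounding level
```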
If the DBB module is used in place of the RepVGG block, can it achieve stronger performance than the original RepVGG?
Hi! Thanks for your excellent work! I have a question: if I want to use DBB in mmsegmentation, what should I do? ^_^
So far I only see ResNet. When will a MobileNet version be available?
Hello! Sorry to bother you! I have recently been reading your paper and came across Trans III. I must say this transformation is really novel, but I noticed the caveat you mention: if the second KxK layer zero-pads its input, Equation 8 does not hold, and the solution is to pad with REP(b1), the bias of the conv obtained from the first equivalent transformation. I don't quite understand this point. Could you explain in detail why the equation fails and why the solution works? Thank you!
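(A small numerical sketch of this point, using my own helper code rather than the repo's: outside the image the intermediate map is not 0 but b1, because the 1x1 conv adds its bias everywhere; so padding the intermediate map with b1 per channel, instead of zeros, makes the two-stage pipeline match the merged conv exactly.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv1 = nn.Conv2d(4, 8, 1, bias=True)
conv2 = nn.Conv2d(8, 16, 3, padding=1, bias=True)

x = torch.randn(1, 4, 8, 8)
z = conv1(x)

# Pad the intermediate map with the per-channel constant b1
# (instead of the zeros that conv2's own padding would use).
b1 = conv1.bias.reshape(1, -1, 1, 1)
z_pad = F.pad(z - b1, [1, 1, 1, 1]) + b1
out_fixed = F.conv2d(z_pad, conv2.weight, conv2.bias)  # no extra padding

# This matches the merged conv applied with padding=1.
k = F.conv2d(conv2.weight, conv1.weight.permute(1, 0, 2, 3))
b = (conv2.weight * b1).sum((1, 2, 3)) + conv2.bias
out_merged = F.conv2d(x, k, b, padding=1)
print((out_fixed - out_merged).abs().max())  # at float-rounding level
```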
hi @DingXiaoH, nice work!!! Borrowing from your implementation, I have realized a plug-in version of DiverseBranchBlock.
This plug-in version has the following advantages:
see rd50_dbb_cifar100_224_e100_sgd_calr.yaml
...
MODEL:
CONV:
TYPE: 'Conv2d'
ADD_BLOCKS: ('DiverseBranchBlock',)
...
build resnet50_d with DBB
import torch
from zcls.config import cfg
from zcls.model.recognizers.build import build_recognizer

cfg.merge_from_file(args.config_file)
model = build_recognizer(cfg, device=torch.device('cpu'))
see test_dbblock.py
see model_fuse.py
$ python tools/model_fuse.py --help
usage: model_fuse.py [-h] [--verbose] CONFIG_FILE OUTPUT_DIR
Fuse block for ACBlock/RepVGGBLock/DBBlock
positional arguments:
CONFIG_FILE path to config file
OUTPUT_DIR path to output
optional arguments:
-h, --help show this help message and exit
--verbose Print Model Info
Structural re-parameterization is really a nice idea!!! By using ACBlock, I improved model precision on a dataset much bigger than ImageNet; I hope DBB can bring even better precision.
Thanks again!
Hello, one branch of the DBB module is average pooling; if it downsamples, the feature map gets smaller, so how can it be added to the outputs of the 1x1 and KxK conv branches?
Hello, I saw in your paper that a 1x1 conv is cascaded with a 3x3 conv. Can a 3x3 conv cascaded with another 3x3 conv be fused in theory? And if the feature map size must be preserved, so each conv needs padding, how should that be handled?
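(In theory two cascaded 3x3 convs without padding fuse into one 5x5 conv: the composite kernel is the full 2-D convolution of the two kernels, and the bias merges the same way as in the 1x1-KxK case. A sketch of this, my own illustration rather than code from this repo; with zero padding on each conv the same bias-padding caveat as Trans III applies.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv1 = nn.Conv2d(8, 16, 3, bias=True)   # 3x3, no padding
conv2 = nn.Conv2d(16, 32, 3, bias=True)  # 3x3, no padding
merged = nn.Conv2d(8, 32, 5, bias=True)  # equivalent 5x5

# Composite kernel: full 2-D convolution of the two kernels, computed
# as cross-correlation against a flipped, channel-permuted first kernel.
k1, k2 = conv1.weight, conv2.weight
k = F.conv2d(k2, k1.permute(1, 0, 2, 3).flip([2, 3]), padding=2)
b = (k2 * conv1.bias.reshape(1, -1, 1, 1)).sum((1, 2, 3)) + conv2.bias
with torch.no_grad():
    merged.weight.copy_(k)
    merged.bias.copy_(b)

x = torch.randn(1, 8, 16, 16)
print((conv2(conv1(x)) - merged(x)).abs().max())  # at float-rounding level
```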
Why doesn't DBB include an identity branch? RepVGG tried identity. Is there a particular reason?
May I ask: can the trained model be used directly for prediction and evaluation without conversion?
If I replace the convolution blocks at certain positions in my own network with DBB modules, how do I obtain the reduced-parameter model described in the paper? After changing some modules, the parameter size grew from about 240M to about 330M, which is quite large.
Hello, when downloading the models from Google Drive / Baidu Yun, I found that resnet18 is a folder with no model inside, while resnet50 has a corresponding model. But when converting with convert.py, line 27, train_model.load_state_dict(ckpt), raises an error about mismatched keys. The error message is as follows (partially omitted):
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.bn.weight", "stage1.0.conv2.dbb_avg.bn.bn.bias", "stage1.0.conv2.dbb_avg.bn.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.bn.running_var", "
stage1.0.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.1.conv2.dbb_avg.bn.bn.weight", "stage1.1.conv2.dbb_avg.bn.bn.bias", "stage1.1.conv2.dbb_avg.bn.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.bn.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.2.conv2.dbb_avg.bn.bn.weight", "stage1.2.conv2.dbb_avg.bn.bn.bias", "stage1.2.conv2.dbb_avg.bn.bn.running_mean", "stage1.2.conv2.dbb_avg.bn.bn.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_var"........
Unexpected key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.weight", "stage1.0.conv2.dbb_avg.bn.bias", "stage1.0.conv2.dbb_avg.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.running_var",
"stage1.0conv2.dbb_avg.bn.num_batches_tracked", "stage1.0.conv2.dbb_1x1_kxk.bn1.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.0.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.1.conv2.dbb_avg.bn.weight", "stage1.1.conv2.dbb_avg.bn.bias", "stage1.1.conv2.dbb_avg.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.runing_var", "stage1.1.conv2.dbb_avg.bn.num_batches_tracked", "stage1.1.conv2.dbb_1x1_kxk.bn1.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.2.conv2.dbb_avg.bn.weight", "stage1.2.conv2.dbb_avg.bn.bias", "stage1.2.conv2.dbb_avg.bn.running_mean", "stage1.conv2.dbb_avg.bn.running_var", "stage1.2.conv2.dbb_avg.bn.num_batches_tracked", "stage1.2.conv2.dbb_1x1_kxk.bn1.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage2.0.conv2.dbb_avg.bn.weight", "stage2.0.conv2.dbb_avg.bn.bias", "stage2.0.conv2.dbb_avg.bn.runing_mean", "stage2.0.conv2.dbb_avg.bn.running_var".......
Hello! I tried to use the DBB module to replace the shortcut structure of ResNet-50, but after the replacement, loading your pretrained model from Google Drive fails. How should I solve this?
I am new to this field and would appreciate any advice.
Why does the README say that, for your own model, DBB should actually replace a regular conv layer plus its BN layer, while the paper says DBB can replace a single regular conv layer?
Hi, your ACNet series is very exciting work. Where can I find this paper? Looking forward to reading it!
Hello, do you provide pretrained models?
DiverseBranchBlock/diversebranchblock.py
Line 105 in be15be7
Do you think there is a way to fuse a k x k conv followed by another k x k conv2d?
Hello author, I replaced the RepBlock in a model with DiverseBranchBlock. The performance improved slightly, but the number of parameters increased greatly. Could you offer any suggestions?
Hi,
I just wonder whether this should be F.pad(kernel, [W_pixels_to_pad, W_pixels_to_pad, H_pixels_to_pad, H_pixels_to_pad]), since F.pad's padding argument is ordered [padding_left, padding_right, padding_top, padding_bottom].
DiverseBranchBlock/dbb_transforms.py
Line 44 in cd627d5
Best
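(For reference, F.pad does take the pads for the last dimension first, i.e. [left, right, top, bottom] for a 4-D NCHW tensor, which a one-liner confirms:)

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 2, 2)
# F.pad pads the LAST dimension first:
# [left, right, top, bottom] for an NCHW tensor.
y = F.pad(x, [1, 0, 0, 0])   # pad one column on the left (width dim)
print(y.shape)  # torch.Size([1, 1, 2, 3])
```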
Is there a TensorFlow 1.x version?
Can I replace IdentityBasedConv1x1 with a plain conv1x1?
Hi, after reading your paper I tried to use the DBB-based ResNet-18 network for my own multi-class task, as follows:

import torch
import torch.nn as nn
from DiverseBranchBlock.convnet_utils import switch_deploy_flag, switch_conv_bn_impl, build_model

def Dbb_Res(num_classes, pretrained=True):
    switch_deploy_flag(False)
    switch_conv_bn_impl('DBB')
    model = build_model('ResNet-18')
    if pretrained:
        model.load_state_dict(torch.load('DiverseBranchBlock/ResNet-18_DBB_7099.pth'))
    in_features = model.linear.in_features
    model.linear = nn.Linear(in_features, num_classes)
    return model

But in practice the results are terrible: a pretrained ResNet-18 reaches 80% accuracy, while the network built as above only reaches 6%. Am I calling it incorrectly, and if so, how should I adjust it? Thanks a lot!
Hello, I have read this paper and your 2019 paper, and I visualized the base, ACB, and DBB module structures with netron. The base and ACB modules display normally, but the DBB one looks wrong. Could you share a WeChat or QQ contact to help resolve this? Thanks.
Hello, thanks for your nice work!
I have a question: in Trans III you apply the 1x1 conv first and then the 3x3, which requires padding with the current bias after the 1x1. Could we instead do the 3x3 first and then the 1x1, so that no padding trick would be needed? Did you try that at the time?
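(For what it's worth, in the KxK-then-1x1 order the fusion is indeed exact even when the first conv zero-pads, because the 1x1 conv adds no padding of its own. A sketch of that merge, my own illustration rather than code from this repo:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv1 = nn.Conv2d(4, 8, 3, padding=1, bias=True)   # 3x3 first
conv2 = nn.Conv2d(8, 16, 1, bias=True)             # then 1x1: pads nothing
merged = nn.Conv2d(4, 16, 3, padding=1, bias=True)

# Merged kernel: mix conv1's output channels with the 1x1 weights.
k2 = conv2.weight[:, :, 0, 0]                      # (16, 8)
with torch.no_grad():
    merged.weight.copy_(torch.einsum('oc,cihw->oihw', k2, conv1.weight))
    merged.bias.copy_(k2 @ conv1.bias + conv2.bias)

x = torch.randn(1, 4, 8, 8)
# Exact even though conv1 zero-pads, since both sides pad x identically.
print((conv2(conv1(x)) - merged(x)).abs().max())  # at float-rounding level
```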