
switchablenorms / switchable-normalization

864 stars · 25 watchers · 134 forks · 38.51 MB

Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv.org/abs/1806.10779

Languages: Python 45.87%, Shell 0.38%, HTML 53.75%
Topics: normalization, deeplearning, convolutional-neural-networks, imagenet, pytorch

switchable-normalization's People

Contributors

jiaminren, pluo911, switchablenorms


switchable-normalization's Issues

Strange behaviors (using_moving_average and last_gamma)

Hello,

I am trying to use SwitchNorm instead of BatchNorm in my current project with PyTorch v0.4.0.
I encountered two strange behaviors, and I would like to know whether I am the only one before starting more in-depth debugging.

The first behavior occurs with using_moving_average=True and last_gamma=True: my training and testing losses stay high (both around 14).

The second behavior occurs with using_moving_average=False and last_gamma=False: my training loss decreases normally (toward 0), but my testing loss stays really high (around 24).
I found a hack that lets me use the layer: I modified it so that it computes the batch statistics in eval mode, as is done in nn.BatchNorm2d.
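
A minimal sketch of the kind of workaround described above (not the repo's SwitchNorm; it only illustrates a BN-style layer that always uses batch statistics, in train and eval mode alike):

import torch
import torch.nn as nn

class AlwaysBatchStatsNorm2d(nn.Module):
    """Illustration only: normalize with batch statistics in every mode."""
    def __init__(self, num_features, eps=1e-5):
        super(AlwaysBatchStatsNorm2d, self).__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # Per-channel mean/var over (N, H, W), regardless of self.training.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + self.eps)
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)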

Thanks for your help

how to mix sn and bn

Hello,
Thank you for the great work and for sharing it. I want to use an encoder network with BN and a decoder network with SN, but I got the error "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation". How can I deal with this?
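
A hedged sketch (not from the repo) of mixing BN in the encoder with SN in the decoder. The in-place RuntimeError above is often triggered by nn.ReLU(inplace=True) overwriting a tensor that a later backward pass still needs, so switching to inplace=False is one thing worth trying; the import path and class name below are assumptions about the repo layout:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=False),   # avoid in-place ops feeding the SN layers
)
decoder = nn.Sequential(
    nn.Conv2d(64, 3, kernel_size=3, padding=1, bias=False),
    SwitchNorm2d(3),
    nn.ReLU(inplace=False),
)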

when I use SN instead of BN, there is a big difference between val acc and train acc

When I use your resnetv2sn50 code, the difference between val and train is normal (maybe; I only ran 10 epochs).
But when I use my own CNN model, there is a big difference between val acc and train acc, like this:

Epoch [1/120]: 100%|#| 625/625 [1:07:57<00:00, 6.43s/it, loss=6.5377, lr=0.1000, top1_avg=1.02, top1_val=3.47, top5_avg=3.75, top5_val=10.40]
Validation: 100%|######| 24/24 [02:20<00:00, 5.84s/it, loss=7.6148, top1_avg=0.54, top5_avg=2.26]
Epoch [2/120]: 100%|#| 625/625 [1:07:23<00:00, 6.40s/it, loss=5.3173, lr=0.2050, top1_avg=6.57, top1_val=10.21, top5_avg=18.45, top5_val=26.07]
Validation: 100%|######| 24/24 [02:18<00:00, 6.10s/it, loss=7.7053, top1_avg=0.50, top5_avg=1.98]
Epoch [3/120]: 100%|#| 625/625 [1:07:15<00:00, 6.51s/it, loss=4.6169, lr=0.3100, top1_avg=13.02, top1_val=15.82, top5_avg=30.95, top5_val=35.99]
Validation: 100%|######| 24/24 [02:19<00:00, 5.85s/it, loss=9.0973, top1_avg=0.21, top5_avg=1.11]
Epoch [4/120]: 100%|#| 625/625 [1:07:30<00:00, 6.48s/it, loss=4.1465, lr=0.4150, top1_avg=18.47, top1_val=21.44, top5_avg=39.85, top5_val=42.77]
Validation: 100%|######| 24/24 [02:18<00:00, 5.92s/it, loss=7.7036, top1_avg=0.90, top5_avg=3.23]
Epoch [5/120]: 100%|#| 625/625 [1:07:26<00:00, 6.52s/it, loss=3.8405, lr=0.5200, top1_avg=22.54, top1_val=24.46, top5_avg=45.77, top5_val=49.37]
Validation: 100%|######| 24/24 [02:18<00:00, 5.75s/it, loss=7.5990, top1_avg=1.46, top5_avg=5.14]
Epoch [6/120]: 100%|#| 625/625 [1:07:15<00:00, 6.41s/it, loss=3.6326, lr=0.6250, top1_avg=25.56, top1_val=27.59, top5_avg=49.71, top5_val=51.22]
Validation: 100%|######| 24/24 [02:20<00:00, 5.88s/it, loss=8.4800, top1_avg=1.21, top5_avg=5.50]
Epoch [7/120]: 100%|#| 625/625 [1:07:25<00:00, 6.25s/it, loss=3.4629, lr=0.6249, top1_avg=28.14, top1_val=27.83, top5_avg=52.97, top5_val=52.20]
Validation: 100%|#####| 24/24 [02:18<00:00, 5.77s/it, loss=7.4793, top1_avg=3.26, top5_avg=10.19]
Epoch [8/120]: 100%|#| 625/625 [1:07:22<00:00, 6.48s/it, loss=3.3294, lr=0.6245, top1_avg=30.29, top1_val=30.03, top5_avg=55.50, top5_val=55.62]
Validation: 100%|#####| 24/24 [02:20<00:00, 5.91s/it, loss=5.3514, top1_avg=8.52, top5_avg=22.21]
Epoch [9/120]: 100%|#| 625/625 [1:07:23<00:00, 6.56s/it, loss=3.2302, lr=0.6239, top1_avg=31.93, top1_val=33.84, top5_avg=57.33, top5_val=58.84]
Validation: 100%|#####| 24/24 [02:21<00:00, 5.81s/it, loss=5.1456, top1_avg=9.44, top5_avg=24.98]
Epoch [10/120]: 100%|#| 625/625 [1:07:24<00:00, 6.44s/it, loss=3.1546, lr=0.6231, top1_avg=33.16, top1_val=33.15, top5_avg=58.67, top5_val=58.30]
Validation: 100%|####| 24/24 [02:19<00:00, 5.91s/it, loss=4.6928, top1_avg=13.57, top5_avg=31.87]
Epoch [11/120]: 100%|#| 625/625 [1:07:24<00:00, 6.12s/it, loss=3.1030, lr=0.6220, top1_avg=34.02, top1_val=36.08, top5_avg=59.62, top5_val=60.50]
Validation: 100%|####| 24/24 [02:20<00:00, 5.87s/it, loss=5.0949, top1_avg=10.75, top5_avg=26.40]
Epoch [12/120]: 100%|#| 625/625 [1:07:25<00:00, 6.28s/it, loss=3.0508, lr=0.6207, top1_avg=34.98, top1_val=35.55, top5_avg=60.59, top5_val=60.11]
Validation: 100%|####| 24/24 [02:22<00:00, 6.03s/it, loss=5.0900, top1_avg=10.55, top5_avg=25.98]

What should I pay attention to? I noticed that you change the order between BN (SN) and conv; is that important?
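
If the remark about the order of BN (SN) and conv refers to the pre-activation design of ResNet v2 (the resnetv2sn50 model above), the two orderings look roughly like this; layer names, sizes, and the import path are illustrative assumptions only:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

post_activation = nn.Sequential(      # ResNet v1 style: conv -> norm -> relu
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    SwitchNorm2d(64),
    nn.ReLU(),
)
pre_activation = nn.Sequential(       # ResNet v2 style: norm -> relu -> conv
    SwitchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
)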

why not add gn

SN learns importance weights among BN, IN, and LN; why not add GN as well?
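
For context, a minimal sketch (not the repo's implementation) of how SN mixes the BN / IN / LN statistics with softmax importance weights; adding GN would mean adding a fourth statistic and a fourth logit:

import torch

def mixed_mean(x, logits):
    # x: (N, C, H, W); logits: raw importance weights of shape (3,)
    mean_in = x.mean(dim=(2, 3), keepdim=True)        # Instance Norm statistic
    mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)     # Layer Norm statistic
    mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)     # Batch Norm statistic
    w = torch.softmax(logits, dim=0)                  # weights sum to 1
    return w[0] * mean_in + w[1] * mean_ln + w[2] * mean_bn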

BackPropagation?

I am a beginner with PyTorch. Don't we need to write something like a backward function, as in the last section of your paper https://arxiv.org/abs/1806.10779 (the appendix, equations (6)-(11))? I thought you needed to define a backward function in the SwitchNorm(nn.Module) class, or does it automatically backpropagate through the forward function?
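
No hand-written backward is required: as long as forward() is built from differentiable torch ops, autograd derives the gradients automatically. A tiny standalone illustration (not the repo's code):

import torch

x = torch.randn(4, 3, 8, 8, requires_grad=True)
w = torch.zeros(3, requires_grad=True)          # raw importance logits

weights = torch.softmax(w, dim=0)               # importance weights sum to 1
mixed = (weights[0] * x.mean(dim=(0, 2, 3)).mean()
         + weights[1] * x.mean(dim=(2, 3)).mean()
         + weights[2] * x.mean(dim=(1, 2, 3)).mean())
mixed.backward()                                # autograd fills x.grad and w.grad
print(w.grad, x.grad.shape)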

Problems about Usage of SyncSN

Very nice work! I tried to use your training code for face recognition, but I ran into some problems. First, rank = int(os.environ['RANK']) and world_size = int(os.environ['WORLD_SIZE']) have no values, so I added os.environ['RANK'] = str(0) and os.environ['WORLD_SIZE'] = str(4). Is that right? Second, my code gets stuck at dist.broadcast; there is no error message, it just hangs. Could you give me some advice?
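
A hedged sketch (single-machine assumption) of setting up torch.distributed before the synchronized ops are used. dist.broadcast hanging with no error usually means not every rank reached the collective, so RANK must be unique per process (one process per GPU) rather than fixed to 0:

import os
import torch.distributed as dist

def init_distributed(rank, world_size):
    # Called once in each worker process with its own rank in [0, world_size).
    os.environ['RANK'] = str(rank)
    os.environ['WORLD_SIZE'] = str(world_size)
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)  # backend choice is an assumption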

Switchable Norm v.s. IBN-Net?

Is switchable norm able to learn a hybrid normalization layer like the one used in IBN-Net?

IBN-Net can improve domain generalization performance; how could we train the switchable norm to improve the model's generalizability?

Thank you very much!

about SwitchNorm3d

The input to SwitchNorm3d is (N, C, D, H, W). How should the newly added dimension be handled, and how should the subscripts in the derived formulas be written? Thank you.
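
A hedged sketch of how the three statistics would extend to 5-D input (N, C, D, H, W): the extra depth dimension D is pooled together with H and W (this mirrors how BatchNorm3d / InstanceNorm3d treat it, not necessarily the exact notation of the paper):

import torch

x = torch.randn(2, 4, 3, 8, 8)                     # (N, C, D, H, W)

mean_in = x.mean(dim=(2, 3, 4), keepdim=True)      # IN: per sample, per channel
mean_ln = x.mean(dim=(1, 2, 3, 4), keepdim=True)   # LN: per sample, over all channels
mean_bn = x.mean(dim=(0, 2, 3, 4), keepdim=True)   # BN: per channel, over the batch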

caffe

Does anyone have a Caffe version? Please share a link. Thanks!

NaN error caused by "N x C x 1 x 1" input features

Update!
I have found that the error is related to the input shape of the SN layer. When the input shape is N x C x 1 x 1, the output of SN becomes NaN, whereas nn.BatchNorm handles N x C x 1 x 1 correctly. I guess it is caused by computing the variance of a single value when computing the IN variance. Hope that helps you fix this bug.
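
A small standalone reproduction of the guess above: with an N x C x 1 x 1 input, each per-sample, per-channel (IN) slice has a single element, so an unbiased variance estimate divides by zero and yields NaN:

import torch

x = torch.randn(8, 16, 1, 1)
var_unbiased = x.var(dim=(2, 3), unbiased=True)    # NaN: only one element per slice
var_biased = x.var(dim=(2, 3), unbiased=False)     # 0.0: degenerate but finite
print(torch.isnan(var_unbiased).all(), var_biased.max())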

Really cool work. I am trying to use SN for segmentation tasks, with your ImageNet-pretrained ResNet50 (ResNet50v2+SN(8,32)-77.57.pth) initializing the backbone. I have added an ASPP-like decoder module that contains several randomly initialized SN layers. I find that the features before the PSP module are OK but become NaN after passing through the SN module.

This is really strange; I hope you can help me solve this problem.

I am also wondering about the difference between resnet50v1+sn and resnet50v2+sn: is this problem related to the choice of backbone network?

Here are the details of my usage in the ASPP module:

import torch.nn as nn
# SwitchNorm is the 2-D switchable normalization layer from
# devkit/ops/switchable_norm.py (imported by the issue author).

class SN_ASPPModule(nn.Module):
    """
    Reference:
        Deeplabv3, combine the dilated convolution with the global average pooling.
    """
    def __init__(self, features, out_features=512, dilations=(12, 24, 36), using_moving_average=True):
        super(SN_ASPPModule, self).__init__()
        self.using_moving_average = using_moving_average

        # Note: this global-average-pooling branch produces N x C x 1 x 1 features,
        # the input shape reported above to make the SN output NaN.
        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                   nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv2 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv3 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[0], dilation=dilations[0], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv4 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[1], dilation=dilations[1], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv5 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[2], dilation=dilations[2], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))

        self.bottleneck = nn.Sequential(
            nn.Conv2d(out_features * 5, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
            SwitchNorm(out_features, using_moving_average=self.using_moving_average),
            nn.Dropout2d(0.1)
            )

Undefined name 'model_urls' in ./models/resnet_sn.py

Each undefined name has the potential to raise NameError at runtime.

flake8 testing of https://github.com/switchablenorms/Switchable-Normalization on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./models/resnet_sn.py:159:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
                                                 ^
./models/resnet_sn.py:171:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
                                                 ^
./models/resnet_sn.py:183:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
                                                 ^
./models/resnet_sn.py:195:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
                                                 ^
./models/resnet_sn.py:207:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
                                                 ^
5     F821 undefined name 'model_urls'
5
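
One way to resolve the F821 errors would be to define model_urls near the top of models/resnet_sn.py; the entries below are placeholders only, not real download links for the SN checkpoints:

model_urls = {
    'resnet18': 'https://example.com/resnet18_sn.pth',    # placeholder URL
    'resnet34': 'https://example.com/resnet34_sn.pth',    # placeholder URL
    'resnet50': 'https://example.com/resnet50_sn.pth',    # placeholder URL
    'resnet101': 'https://example.com/resnet101_sn.pth',  # placeholder URL
    'resnet152': 'https://example.com/resnet152_sn.pth',  # placeholder URL
}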

Switchable Normalization ne

mean_weight and var_weight both need a softmax, which assigns the importance weights to IN, LN, and BN. I tried a TensorFlow version, and it is more time-consuming and compute-heavy than plain BN.
Am I right?
Thanks!

Switch Norm 1d for 3D tensors

Hello,
Thank you for providing code implementation for your paper.

I am interested in trying your normalization in my current experiment, which works on raw waveforms and audio "style". It is thus of prime interest to adaptively modulate different feature normalizations, and I hope your proposal will work well for my use case.

However, when I read your
Switchable-Normalization/devkit/ops/switchable_norm.py
the 1d normalization only applies to 2D tensors and the 2d normalization only applies to 4D tensors, whereas the PyTorch implementations of BatchNorm1d and InstanceNorm1d apply to both 2D and 3D tensors.

If possible, how should I apply your SwitchNorm1d to 3D tensors, for instance the output of conv1d?

Thank you!
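
One possible workaround (not from the repo) for (N, C, L) inputs such as conv1d outputs: add a dummy spatial dimension so the existing 2-D layer sees an (N, C, L, 1) tensor, then squeeze it back. The import path and class name below are assumptions about the repo layout:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

class SwitchNorm1dFor3dTensors(nn.Module):
    def __init__(self, num_features, **kwargs):
        super(SwitchNorm1dFor3dTensors, self).__init__()
        self.sn = SwitchNorm2d(num_features, **kwargs)

    def forward(self, x):            # x: (N, C, L)
        x = x.unsqueeze(-1)          # -> (N, C, L, 1)
        x = self.sn(x)
        return x.squeeze(-1)         # -> (N, C, L)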

training time?

Could you report your training time with SN and BN?

The value of weight in Figure 7?

When I run the following:

checkpoint = torch.load('ResNet50v2+SN(8,32)-77.57.pth')
for k, v in checkpoint.items():
    print(k, ':', v)
I noticed that, for example,
'module.layer1.0.sn3.mean_weight': [-0.5125, 0.3186, 0.1940]
'module.layer1.0.sn3.var_weight': [-0.1849, -1.7146, 1.8994]

But in Fig. 7 of the paper, the sum of the importance weights is 1.
So what is the relationship between these two kinds of weights? I am really confused by that.
Could you please give an explanation?
Thank you!
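
A small worked example of the relationship: the stored mean_weight / var_weight are raw logits, and the importance weights plotted in Fig. 7 are their softmax, which sums to 1:

import torch

mean_weight = torch.tensor([-0.5125, 0.3186, 0.1940])
print(torch.softmax(mean_weight, dim=0))   # ~[0.188, 0.431, 0.381], sums to 1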
