switchablenorms / Switchable-Normalization
Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv.org/abs/1806.10779
Hello,
I am trying to use SwitchNorm instead of BatchNorm in my current project with PyTorch v0.4.0.
I encountered two strange behaviors and would like to know if I am the only one before starting a more in-depth debugging session.
The first behavior occurs with using_moving_average=True and last_gamma=True: my training and testing losses both stay high (around 14).
The second behavior occurs with using_moving_average=False and last_gamma=False: my training loss decreases normally (toward 0), but my testing loss stays really high (around 24).
I found a hack to be able to use the layer: I modified it so that it computes the batch statistics in eval mode, as nn.BatchNorm2d does.
Thanks for your help
Really great work!
I am wondering when it would be convenient for you to share the pretrained model with resnet-101.
Thanks!
Where should SN be placed in a Keras implementation?
Hello,
Thank you for the great work and for sharing it. I want to use an encoder network with BN and a decoder network with SN, but I got the error "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation". How can I deal with this?
When I use your resnetv2sn50 code, the gap between val and train is normal (maybe; I have only run 10 epochs).
But when I use my own CNN model, there is a big gap between val acc and train acc, like this:
Epoch [1/120]: 100%|#| 625/625 [1:07:57<00:00, 6.43s/it, loss=6.5377, lr=0.1000, top1_avg=1.02, top1_val=3.47, top5_avg=3.75, top5_val=10.40]
Validation: 100%|######| 24/24 [02:20<00:00, 5.84s/it, loss=7.6148, top1_avg=0.54, top5_avg=2.26]
Epoch [2/120]: 100%|#| 625/625 [1:07:23<00:00, 6.40s/it, loss=5.3173, lr=0.2050, top1_avg=6.57, top1_val=10.21, top5_avg=18.45, top5_val=26.07]
Validation: 100%|######| 24/24 [02:18<00:00, 6.10s/it, loss=7.7053, top1_avg=0.50, top5_avg=1.98]
Epoch [3/120]: 100%|#| 625/625 [1:07:15<00:00, 6.51s/it, loss=4.6169, lr=0.3100, top1_avg=13.02, top1_val=15.82, top5_avg=30.95, top5_val=35.99]
Validation: 100%|######| 24/24 [02:19<00:00, 5.85s/it, loss=9.0973, top1_avg=0.21, top5_avg=1.11]
Epoch [4/120]: 100%|#| 625/625 [1:07:30<00:00, 6.48s/it, loss=4.1465, lr=0.4150, top1_avg=18.47, top1_val=21.44, top5_avg=39.85, top5_val=42.77]
Validation: 100%|######| 24/24 [02:18<00:00, 5.92s/it, loss=7.7036, top1_avg=0.90, top5_avg=3.23]
Epoch [5/120]: 100%|#| 625/625 [1:07:26<00:00, 6.52s/it, loss=3.8405, lr=0.5200, top1_avg=22.54, top1_val=24.46, top5_avg=45.77, top5_val=49.37]
Validation: 100%|######| 24/24 [02:18<00:00, 5.75s/it, loss=7.5990, top1_avg=1.46, top5_avg=5.14]
Epoch [6/120]: 100%|#| 625/625 [1:07:15<00:00, 6.41s/it, loss=3.6326, lr=0.6250, top1_avg=25.56, top1_val=27.59, top5_avg=49.71, top5_val=51.22]
Validation: 100%|######| 24/24 [02:20<00:00, 5.88s/it, loss=8.4800, top1_avg=1.21, top5_avg=5.50]
Epoch [7/120]: 100%|#| 625/625 [1:07:25<00:00, 6.25s/it, loss=3.4629, lr=0.6249, top1_avg=28.14, top1_val=27.83, top5_avg=52.97, top5_val=52.20]
Validation: 100%|#####| 24/24 [02:18<00:00, 5.77s/it, loss=7.4793, top1_avg=3.26, top5_avg=10.19]
Epoch [8/120]: 100%|#| 625/625 [1:07:22<00:00, 6.48s/it, loss=3.3294, lr=0.6245, top1_avg=30.29, top1_val=30.03, top5_avg=55.50, top5_val=55.62]
Validation: 100%|#####| 24/24 [02:20<00:00, 5.91s/it, loss=5.3514, top1_avg=8.52, top5_avg=22.21]
Epoch [9/120]: 100%|#| 625/625 [1:07:23<00:00, 6.56s/it, loss=3.2302, lr=0.6239, top1_avg=31.93, top1_val=33.84, top5_avg=57.33, top5_val=58.84]
Validation: 100%|#####| 24/24 [02:21<00:00, 5.81s/it, loss=5.1456, top1_avg=9.44, top5_avg=24.98]
Epoch [10/120]: 100%|#| 625/625 [1:07:24<00:00, 6.44s/it, loss=3.1546, lr=0.6231, top1_avg=33.16, top1_val=33.15, top5_avg=58.67, top5_val=58.30]
Validation: 100%|####| 24/24 [02:19<00:00, 5.91s/it, loss=4.6928, top1_avg=13.57, top5_avg=31.87]
Epoch [11/120]: 100%|#| 625/625 [1:07:24<00:00, 6.12s/it, loss=3.1030, lr=0.6220, top1_avg=34.02, top1_val=36.08, top5_avg=59.62, top5_val=60.50]
Validation: 100%|####| 24/24 [02:20<00:00, 5.87s/it, loss=5.0949, top1_avg=10.75, top5_avg=26.40]
Epoch [12/120]: 100%|#| 625/625 [1:07:25<00:00, 6.28s/it, loss=3.0508, lr=0.6207, top1_avg=34.98, top1_val=35.55, top5_avg=60.59, top5_val=60.11]
Validation: 100%|####| 24/24 [02:22<00:00, 6.03s/it, loss=5.0900, top1_avg=10.55, top5_avg=25.98]
What should I pay attention to? I noticed you changed the order between BN (SN) and conv; is that important?
SN learns importance weights among BN, IN, and LN; why not add GN as well?
I am a beginner with PyTorch. Don't we need to write something like a backward function, as in the last section of your paper https://arxiv.org/abs/1806.10779 (the appendix, equations (6)-(11))? I thought you needed to define a backward function in the class SwitchNorm(nn.Module), or does it automatically backpropagate through the forward function?
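To illustrate the answer, here is a minimal sketch (not the repo's code) showing that autograd derives the backward pass automatically when the forward pass is built from differentiable torch ops, so no hand-written backward for the appendix equations is needed:

```python
import torch

# Toy normalization forward built from differentiable ops only.
x = torch.randn(4, 3, 8, 8, requires_grad=True)
mean = x.mean(dim=(2, 3), keepdim=True)                 # IN-style mean
var = x.var(dim=(2, 3), unbiased=False, keepdim=True)   # IN-style variance
y = (x - mean) / torch.sqrt(var + 1e-5)

y.sum().backward()          # backward pass generated by autograd
print(x.grad.shape)         # gradient has the same shape as the input
```

A custom backward (torch.autograd.Function) is only needed when the forward uses non-differentiable ops or when a hand-derived gradient is faster.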
Very nice work! I tried to use your training code for face recognition, but I ran into some problems. First, rank = int(os.environ['RANK']) and world_size = int(os.environ['WORLD_SIZE']) don't have values, so I added os.environ['RANK'] = str(0) and os.environ['WORLD_SIZE'] = str(4). Is that right? Second, my code gets stuck at dist.broadcast; there is no error message, it just hangs. Could you give me some advice?
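A hedged guess at the hang: with WORLD_SIZE=4 but only one process actually launched, collectives such as dist.broadcast block forever waiting for the other three ranks. A minimal single-process debugging sketch (addresses and port here are illustrative, not from the repo):

```python
import os
import torch.distributed as dist

# For single-process debugging, set the world size to 1 so collectives
# do not wait on ranks that were never launched.
os.environ.setdefault('RANK', '0')
os.environ.setdefault('WORLD_SIZE', '1')
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')

dist.init_process_group(backend='gloo', init_method='env://')
print(dist.get_rank(), dist.get_world_size())  # 0 1
```

For a real 4-rank run, each of the four processes must be launched with its own RANK (0..3) and the same WORLD_SIZE, e.g. via the launcher the repo's scripts assume.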
Is this switchable norm able to learn a hybrid normalization layer like the one used in IBN-Net?
IBN-Net can improve domain generalization performance. How could we train the switchable norm to improve the model's generalizability?
Thank you very much!
The input to SwitchNorm3d is (N, C, D, H, W). How should the newly added dimension be handled, and how should the subscripts in the derivation be written? Thank you.
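A short sketch of how the statistics extend from 4D to 5D input: the spatial reduction axes simply grow from (H, W) to (D, H, W), so the derivation's spatial index set just gains a depth subscript:

```python
import torch

x = torch.randn(2, 3, 4, 5, 6)                      # (N, C, D, H, W)
mean_in = x.mean(dim=(2, 3, 4), keepdim=True)       # IN: one mean per (n, c)
mean_ln = x.mean(dim=(1, 2, 3, 4), keepdim=True)    # LN: one mean per sample n
mean_bn = x.mean(dim=(0, 2, 3, 4), keepdim=True)    # BN: one mean per channel c
print(mean_in.shape, mean_ln.shape, mean_bn.shape)
```

The variances follow the same reduction axes, and the softmax-weighted combination is unchanged.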
Hi, could you please provide some hints on where to find the ImageNet meta files, like train.txt and val.txt, used by this version of the code?
Who has the Caffe version? Please give me a link. THANKS!!!
It seems that when I try to apply sn1d to a 3D tensor (N, C, L), it fails at check_input_dim, saying it only accepts 2D input?
Update!
I have found that the error is related to the input shape of the SN layer. When the input shape is N×C×1×1, the output of SN becomes NaN, while nn.BatchNorm handles N×C×1×1 correctly. I guess it is caused by computing the variance of a single value when computing the IN variance. I hope that helps you fix this bug.
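The suspected mechanism can be reproduced in isolation: torch's default variance is unbiased, so over a single spatial value it divides by (n - 1) = 0 and yields NaN:

```python
import torch

x = torch.randn(4, 8, 1, 1)                                    # N x C x 1 x 1
var_unbiased = x.var(dim=(2, 3), keepdim=True)                 # default: unbiased
var_biased = x.var(dim=(2, 3), unbiased=False, keepdim=True)   # biased estimator

print(torch.isnan(var_unbiased).all().item())  # True  (0/0 per element)
print((var_biased == 0).all().item())          # True  (single value -> var 0)
```

So one plausible fix is computing the IN variance with the biased estimator (unbiased=False), which is what BatchNorm effectively does for its batch statistics.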
Really cool work. I am trying to use SN for segmentation tasks, with your ImageNet-pretrained ResNet50 (ResNet50v2+SN(8,32)-77.57.pth) initializing the backbone. I have added a decoder, an ASPP-like module, which contains several randomly initialized SN layers. I find that the features before the ASPP module are fine but become NaN after passing through its SN layers.
Really strange; I hope you can help me solve this problem.
I am also wondering about the difference between resnet50v1+sn and resnet50v2+sn; is this problem related to the choice of backbone network?
Here are the details of my ASPP module:
import torch.nn as nn
from devkit.ops.switchable_norm import SwitchNorm2d as SwitchNorm  # import path assumed

class SN_ASPPModule(nn.Module):
    """
    Reference:
        DeepLabv3: combine dilated convolutions with global average pooling.
    """
    def __init__(self, features, out_features=512, dilations=(12, 24, 36), using_moving_average=True):
        super(SN_ASPPModule, self).__init__()
        self.using_moving_average = using_moving_average
        # Note: after global average pooling the SN input is N x C x 1 x 1,
        # which is a likely source of the NaN described above (IN variance
        # of a single value).
        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                   nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv2 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv3 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[0], dilation=dilations[0], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv4 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[1], dilation=dilations[1], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv5 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[2], dilation=dilations[2], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.bottleneck = nn.Sequential(
            nn.Conv2d(out_features * 5, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
            SwitchNorm(out_features, using_moving_average=self.using_moving_average),
            nn.Dropout2d(0.1)
        )
Each undefined name has the potential to raise NameError at runtime.
flake8 testing of https://github.com/switchablenorms/Switchable-Normalization on Python 3.6.3
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./models/resnet_sn.py:159:50: F821 undefined name 'model_urls'
model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
^
./models/resnet_sn.py:171:50: F821 undefined name 'model_urls'
model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
^
./models/resnet_sn.py:183:50: F821 undefined name 'model_urls'
model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
^
./models/resnet_sn.py:195:50: F821 undefined name 'model_urls'
model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
^
./models/resnet_sn.py:207:50: F821 undefined name 'model_urls'
model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
^
5 F821 undefined name 'model_urls'
5
Hi, it seems that we cannot access the provided Google Cloud link for ResNet101v1+SN (8,32).
What is the main difference between them?
Hello,
Thank you for providing code implementation for your paper.
I am interested in trying your normalization in my current experiment, which works on raw waveforms and audio "style". It is thus of prime interest to adaptively modulate different feature normalizations, and I hope your proposal will work well for my purposes.
However, when I read your Switchable-Normalization/devkit/ops/switchable_norm.py, the 1d normalization only applies to 2D tensors and the 2d normalization only applies to 4D tensors, whereas the PyTorch implementations of BatchNorm1d and InstanceNorm1d apply to both 2D and 3D tensors.
If possible, how can I apply your SwitchNorm1d to 3D tensors, for instance the output of conv1d?
Thank you!
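One hedged workaround (not an official API of this repo): view the length axis of a (N, C, L) tensor as an (L, 1) spatial map so a 2d norm layer accepts it. Demonstrated here with nn.BatchNorm2d as a stand-in; the same reshape should apply to SwitchNorm2d from devkit/ops/switchable_norm.py:

```python
import torch
import torch.nn as nn

norm2d = nn.BatchNorm2d(16)                  # stand-in for SwitchNorm2d(16)
x = torch.randn(8, 16, 100)                  # (N, C, L), e.g. conv1d output
y = norm2d(x.unsqueeze(-1)).squeeze(-1)      # (N, C, L, 1) -> norm -> (N, C, L)
print(y.shape)  # torch.Size([8, 16, 100])
```

Since BN/IN/LN statistics over (L, 1) equal those over L, this reshape should not change the normalization result, only the accepted input rank.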
Could you report your training time with SN and BN?
When I use the code:
checkpoint = torch.load('ResNet50v2+SN(8,32)-77.57.pth')
for k, v in checkpoint.items(): print(k, ':', v)
I noticed that, for example,
'module.layer1.0.sn3.mean_weight': [-0.5125, 0.3186, 0.1940]
'module.layer1.0.sn3.var_weight': [-0.1849, -1.7146, 1.8994]
But in Fig. 7 of the paper, the importance weights sum to 1.
So what is the relationship between these two kinds of weights? I am really confused about this.
Could you please give an explanation?
Thank you!
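The likely resolution (assuming the checkpoint stores the raw, pre-softmax parameters, as the paper's formulation suggests): the importance weights plotted in Fig. 7 are the softmax of the stored mean_weight/var_weight, which is why they sum to 1 while the raw values do not:

```python
import torch
import torch.nn.functional as F

# Raw values quoted above from 'module.layer1.0.sn3.mean_weight'.
mean_weight = torch.tensor([-0.5125, 0.3186, 0.1940])
importance = F.softmax(mean_weight, dim=0)   # softmax-normalized weights
print(importance)                            # each in (0, 1)
print(importance.sum().item())               # sums to 1
```
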
Hi,
I noticed that your ResNet-50 does not use a Bottleneck; the original implementation does, however:
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L231
https://github.com/switchablenorms/Switchable-Normalization/blob/master/face_recognition/models/backbones/resnet.py#L106
Is this on purpose?