
switchablenorms / switchable-normalization

864 stars · 25 watchers · 134 forks · 38.51 MB

Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv.org/abs/1806.10779

Languages: Python 45.87%, Shell 0.38%, HTML 53.75%
Topics: normalization, deeplearning, convolutional-neural-networks, imagenet, pytorch

switchable-normalization's People

Contributors

jiaminren, pluo911, switchablenorms


switchable-normalization's Issues

Strange behaviors (using_moving_average and last_gamma)

Hello,

I am trying to use SwitchNorm instead of BatchNorm in my current project with PyTorch v0.4.0.
I encountered two strange behaviors, and I would like to know whether I am the only one before starting more in-depth debugging.

The first behavior occurs with using_moving_average=True and last_gamma=True: my training and testing losses stay high (both around 14).

The second behavior occurs with using_moving_average=False and last_gamma=False: my training loss decreases normally (toward 0), but my testing loss stays really high (around 24).
I found a hack that lets me use the layer: I modified it so that it computes the batch statistics in eval mode, as is done in nn.BatchNorm2d.
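
A minimal sketch of the kind of workaround described above (not the repo's SwitchNorm; it only illustrates a BN-style layer that always uses batch statistics, in train and eval mode alike):

import torch
import torch.nn as nn

class AlwaysBatchStatsNorm2d(nn.Module):
    """Illustration only: normalize with batch statistics in every mode."""
    def __init__(self, num_features, eps=1e-5):
        super(AlwaysBatchStatsNorm2d, self).__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # Per-channel mean/var over (N, H, W), regardless of self.training.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + self.eps)
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)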

Thanks for your help

how to mix sn and bn

Hello,
Thank you for the great work and for sharing it. I want to use an encoder network with BN and a decoder network with SN, but I got the error "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation". How can I deal with this?
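
A hedged sketch (not from the repo) of mixing BN in the encoder with SN in the decoder. The in-place RuntimeError above is often triggered by nn.ReLU(inplace=True) overwriting a tensor that a later backward pass still needs, so switching to inplace=False is one thing worth trying; the import path and class name below are assumptions about the repo layout:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=False),   # avoid in-place ops feeding the SN layers
)
decoder = nn.Sequential(
    nn.Conv2d(64, 3, kernel_size=3, padding=1, bias=False),
    SwitchNorm2d(3),
    nn.ReLU(inplace=False),
)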

when I use SN instead of BN, there is a big difference between val acc and train acc

When I use your resnetv2sn50 code, the difference between val and train is normal (maybe; I only ran 10 epochs).
But when I use my own CNN model, there is a big difference between val acc and train acc, like this:

Epoch [1/120]: 100%|#| 625/625 [1:07:57<00:00, 6.43s/it, loss=6.5377, lr=0.1000, top1_avg=1.02, top1_val=3.47, top5_avg=3.75, top5_val=10.40]
Validation: 100%|######| 24/24 [02:20<00:00, 5.84s/it, loss=7.6148, top1_avg=0.54, top5_avg=2.26]
Epoch [2/120]: 100%|#| 625/625 [1:07:23<00:00, 6.40s/it, loss=5.3173, lr=0.2050, top1_avg=6.57, top1_val=10.21, top5_avg=18.45, top5_val=26.07]
Validation: 100%|######| 24/24 [02:18<00:00, 6.10s/it, loss=7.7053, top1_avg=0.50, top5_avg=1.98]
Epoch [3/120]: 100%|#| 625/625 [1:07:15<00:00, 6.51s/it, loss=4.6169, lr=0.3100, top1_avg=13.02, top1_val=15.82, top5_avg=30.95, top5_val=35.99]
Validation: 100%|######| 24/24 [02:19<00:00, 5.85s/it, loss=9.0973, top1_avg=0.21, top5_avg=1.11]
Epoch [4/120]: 100%|#| 625/625 [1:07:30<00:00, 6.48s/it, loss=4.1465, lr=0.4150, top1_avg=18.47, top1_val=21.44, top5_avg=39.85, top5_val=42.77]
Validation: 100%|######| 24/24 [02:18<00:00, 5.92s/it, loss=7.7036, top1_avg=0.90, top5_avg=3.23]
Epoch [5/120]: 100%|#| 625/625 [1:07:26<00:00, 6.52s/it, loss=3.8405, lr=0.5200, top1_avg=22.54, top1_val=24.46, top5_avg=45.77, top5_val=49.37]
Validation: 100%|######| 24/24 [02:18<00:00, 5.75s/it, loss=7.5990, top1_avg=1.46, top5_avg=5.14]
Epoch [6/120]: 100%|#| 625/625 [1:07:15<00:00, 6.41s/it, loss=3.6326, lr=0.6250, top1_avg=25.56, top1_val=27.59, top5_avg=49.71, top5_val=51.22]
Validation: 100%|######| 24/24 [02:20<00:00, 5.88s/it, loss=8.4800, top1_avg=1.21, top5_avg=5.50]
Epoch [7/120]: 100%|#| 625/625 [1:07:25<00:00, 6.25s/it, loss=3.4629, lr=0.6249, top1_avg=28.14, top1_val=27.83, top5_avg=52.97, top5_val=52.20]
Validation: 100%|#####| 24/24 [02:18<00:00, 5.77s/it, loss=7.4793, top1_avg=3.26, top5_avg=10.19]
Epoch [8/120]: 100%|#| 625/625 [1:07:22<00:00, 6.48s/it, loss=3.3294, lr=0.6245, top1_avg=30.29, top1_val=30.03, top5_avg=55.50, top5_val=55.62]
Validation: 100%|#####| 24/24 [02:20<00:00, 5.91s/it, loss=5.3514, top1_avg=8.52, top5_avg=22.21]
Epoch [9/120]: 100%|#| 625/625 [1:07:23<00:00, 6.56s/it, loss=3.2302, lr=0.6239, top1_avg=31.93, top1_val=33.84, top5_avg=57.33, top5_val=58.84]
Validation: 100%|#####| 24/24 [02:21<00:00, 5.81s/it, loss=5.1456, top1_avg=9.44, top5_avg=24.98]
Epoch [10/120]: 100%|#| 625/625 [1:07:24<00:00, 6.44s/it, loss=3.1546, lr=0.6231, top1_avg=33.16, top1_val=33.15, top5_avg=58.67, top5_val=58.30]
Validation: 100%|####| 24/24 [02:19<00:00, 5.91s/it, loss=4.6928, top1_avg=13.57, top5_avg=31.87]
Epoch [11/120]: 100%|#| 625/625 [1:07:24<00:00, 6.12s/it, loss=3.1030, lr=0.6220, top1_avg=34.02, top1_val=36.08, top5_avg=59.62, top5_val=60.50]
Validation: 100%|####| 24/24 [02:20<00:00, 5.87s/it, loss=5.0949, top1_avg=10.75, top5_avg=26.40]
Epoch [12/120]: 100%|#| 625/625 [1:07:25<00:00, 6.28s/it, loss=3.0508, lr=0.6207, top1_avg=34.98, top1_val=35.55, top5_avg=60.59, top5_val=60.11]
Validation: 100%|####| 24/24 [02:22<00:00, 6.03s/it, loss=5.0900, top1_avg=10.55, top5_avg=25.98]

What should I pay attention to? I noticed that you change the order between BN (SN) and conv; is that important?
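
If the remark about the order of BN (SN) and conv refers to the pre-activation design of ResNet v2 (the resnetv2sn50 model above), the two orderings look roughly like this; layer names, sizes, and the import path are illustrative assumptions only:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

post_activation = nn.Sequential(      # ResNet v1 style: conv -> norm -> relu
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    SwitchNorm2d(64),
    nn.ReLU(),
)
pre_activation = nn.Sequential(       # ResNet v2 style: norm -> relu -> conv
    SwitchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
)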

why not add gn

SN learns importance weights among BN, IN, and LN; why not add GN as well?
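
For context, a minimal sketch (not the repo's implementation) of how SN mixes the BN / IN / LN statistics with softmax importance weights; adding GN would mean adding a fourth statistic and a fourth logit:

import torch

def mixed_mean(x, logits):
    # x: (N, C, H, W); logits: raw importance weights of shape (3,)
    mean_in = x.mean(dim=(2, 3), keepdim=True)        # Instance Norm statistic
    mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)     # Layer Norm statistic
    mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)     # Batch Norm statistic
    w = torch.softmax(logits, dim=0)                  # weights sum to 1
    return w[0] * mean_in + w[1] * mean_ln + w[2] * mean_bn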

BackPropagation?

I am a beginner with PyTorch. Don't we need to write something like a backward function, as in the last section of your paper https://arxiv.org/abs/1806.10779 (the appendix, equations (6)-(11))? I thought you needed to define a backward function in the SwitchNorm(nn.Module) class, or does it automatically backpropagate through the forward function?
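
No hand-written backward is required: as long as forward() is built from differentiable torch ops, autograd derives the gradients automatically. A tiny standalone illustration (not the repo's code):

import torch

x = torch.randn(4, 3, 8, 8, requires_grad=True)
w = torch.zeros(3, requires_grad=True)          # raw importance logits

weights = torch.softmax(w, dim=0)               # importance weights sum to 1
mixed = (weights[0] * x.mean(dim=(0, 2, 3)).mean()
         + weights[1] * x.mean(dim=(2, 3)).mean()
         + weights[2] * x.mean(dim=(1, 2, 3)).mean())
mixed.backward()                                # autograd fills x.grad and w.grad
print(w.grad, x.grad.shape)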

Problems about Usage of SyncSN

Very nice work! I tried to use your training code for face recognition, but I ran into some problems. First, rank = int(os.environ['RANK']) and world_size = int(os.environ['WORLD_SIZE']) have no values, so I added os.environ['RANK'] = str(0) and os.environ['WORLD_SIZE'] = str(4). Is that right? Second, my code gets stuck at dist.broadcast; there is no error message, it just hangs. Could you give me some advice?
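
A hedged sketch (single-machine assumption) of setting up torch.distributed before the synchronized ops are used. dist.broadcast hanging with no error usually means not every rank reached the collective, so RANK must be unique per process (one process per GPU) rather than fixed to 0:

import os
import torch.distributed as dist

def init_distributed(rank, world_size):
    # Called once in each worker process with its own rank in [0, world_size).
    os.environ['RANK'] = str(rank)
    os.environ['WORLD_SIZE'] = str(world_size)
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)  # backend choice is an assumption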

Switchable Norm v.s. IBN-Net?

Is switchable norm able to learn a hybrid normalization layer like the one used in IBN-Net?

IBN-Net can improve domain generalization performance; how could we train the switchable norm to improve the model's generalizability?

Thank you very much!

about SwitchNorm3d

The input to SwitchNorm3d is (N, C, D, H, W). How should the newly added dimension be handled, and how should the subscripts in the derived formulas be written? Thank you.
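
A hedged sketch of how the three statistics would extend to 5-D input (N, C, D, H, W): the extra depth dimension D is pooled together with H and W (this mirrors how BatchNorm3d / InstanceNorm3d treat it, not necessarily the exact notation of the paper):

import torch

x = torch.randn(2, 4, 3, 8, 8)                     # (N, C, D, H, W)

mean_in = x.mean(dim=(2, 3, 4), keepdim=True)      # IN: per sample, per channel
mean_ln = x.mean(dim=(1, 2, 3, 4), keepdim=True)   # LN: per sample, over all channels
mean_bn = x.mean(dim=(0, 2, 3, 4), keepdim=True)   # BN: per channel, over the batch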

caffe

Does anyone have a Caffe version? Please share a link. Thanks!

NaN error caused by "N x C x 1 x 1" input features

Update!
I have found that the error is related to the input shape of the SN layer. When the input shape is N x C x 1 x 1, the output of SN becomes NaN, whereas nn.BatchNorm handles N x C x 1 x 1 correctly. I guess it is caused by computing the variance of a single value when computing the IN variance. Hope that helps you fix this bug.
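
A small standalone reproduction of the guess above: with an N x C x 1 x 1 input, each per-sample, per-channel (IN) slice has a single element, so an unbiased variance estimate divides by zero and yields NaN:

import torch

x = torch.randn(8, 16, 1, 1)
var_unbiased = x.var(dim=(2, 3), unbiased=True)    # NaN: only one element per slice
var_biased = x.var(dim=(2, 3), unbiased=False)     # 0.0: degenerate but finite
print(torch.isnan(var_unbiased).all(), var_biased.max())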

Really cool work. I am trying to use SN for segmentation tasks, with your ImageNet-pretrained ResNet50 (ResNet50v2+SN(8,32)-77.57.pth) initializing the backbone. I have added an ASPP-like decoder module that contains several randomly initialized SN layers. I find that the features before the PSP module are OK but become NaN after passing through the SN module.

This is really strange; I hope you can help me solve this problem.

I am also wondering about the difference between resnet50v1+sn and resnet50v2+sn: is this problem related to the choice of backbone network?

Here are the details of my usage in the ASPP module:

import torch.nn as nn
# SwitchNorm is the 2-D switchable normalization layer from
# devkit/ops/switchable_norm.py (imported by the issue author).

class SN_ASPPModule(nn.Module):
    """
    Reference:
        Deeplabv3, combine the dilated convolution with the global average pooling.
    """
    def __init__(self, features, out_features=512, dilations=(12, 24, 36), using_moving_average=True):
        super(SN_ASPPModule, self).__init__()
        self.using_moving_average = using_moving_average

        # Note: this global-average-pooling branch produces N x C x 1 x 1 features,
        # the input shape reported above to make the SN output NaN.
        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                   nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv2 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv3 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[0], dilation=dilations[0], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv4 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[1], dilation=dilations[1], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv5 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[2], dilation=dilations[2], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))

        self.bottleneck = nn.Sequential(
            nn.Conv2d(out_features * 5, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
            SwitchNorm(out_features, using_moving_average=self.using_moving_average),
            nn.Dropout2d(0.1)
            )

Undefined name 'model_urls' in ./models/resnet_sn.py

Each undefined name has the potential to raise NameError at runtime.

flake8 testing of https://github.com/switchablenorms/Switchable-Normalization on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./models/resnet_sn.py:159:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
                                                 ^
./models/resnet_sn.py:171:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
                                                 ^
./models/resnet_sn.py:183:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
                                                 ^
./models/resnet_sn.py:195:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
                                                 ^
./models/resnet_sn.py:207:50: F821 undefined name 'model_urls'
        model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
                                                 ^
5     F821 undefined name 'model_urls'
5
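
One way to resolve the F821 errors would be to define model_urls near the top of models/resnet_sn.py; the entries below are placeholders only, not real download links for the SN checkpoints:

model_urls = {
    'resnet18': 'https://example.com/resnet18_sn.pth',    # placeholder URL
    'resnet34': 'https://example.com/resnet34_sn.pth',    # placeholder URL
    'resnet50': 'https://example.com/resnet50_sn.pth',    # placeholder URL
    'resnet101': 'https://example.com/resnet101_sn.pth',  # placeholder URL
    'resnet152': 'https://example.com/resnet152_sn.pth',  # placeholder URL
}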

Switchable Normalization ne

mean_weight and var_weight both need a softmax, which assigns the importance weights to IN, LN, and BN. I tried a TensorFlow version, and it is more time-consuming and compute-heavy than plain BN.
Am I right?
Thanks!

Switch Norm 1d for 3D tensors

Hello,
Thank you for providing code implementation for your paper.

I am interested in trying your normalization in my current experiment, which works on raw waveforms and audio "style". It is thus of prime interest to adaptively modulate different feature normalizations, and I hope your proposal will work well for my use case.

However, when I read your
Switchable-Normalization/devkit/ops/switchable_norm.py
the 1d normalization only applies to 2D tensors and the 2d normalization only applies to 4D tensors, whereas the PyTorch implementations of BatchNorm1d and InstanceNorm1d apply to both 2D and 3D tensors.

If possible, how should I apply your SwitchNorm1d to 3D tensors, for instance the output of conv1d?

Thank you!
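
One possible workaround (not from the repo) for (N, C, L) inputs such as conv1d outputs: add a dummy spatial dimension so the existing 2-D layer sees an (N, C, L, 1) tensor, then squeeze it back. The import path and class name below are assumptions about the repo layout:

import torch.nn as nn
from devkit.ops import SwitchNorm2d  # assumed import path

class SwitchNorm1dFor3dTensors(nn.Module):
    def __init__(self, num_features, **kwargs):
        super(SwitchNorm1dFor3dTensors, self).__init__()
        self.sn = SwitchNorm2d(num_features, **kwargs)

    def forward(self, x):            # x: (N, C, L)
        x = x.unsqueeze(-1)          # -> (N, C, L, 1)
        x = self.sn(x)
        return x.squeeze(-1)         # -> (N, C, L)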

training time?

Could you report your training time with SN and BN?

The value of weight in Figure 7?

When I run the following:

checkpoint = torch.load('ResNet50v2+SN(8,32)-77.57.pth')
for k, v in checkpoint.items():
    print(k, ':', v)
I noticed that, for example,
'module.layer1.0.sn3.mean_weight': [-0.5125, 0.3186, 0.1940]
'module.layer1.0.sn3.var_weight': [-0.1849, -1.7146, 1.8994]

But in Fig. 7 of the paper, the sum of the importance weights is 1.
So what is the relationship between these two kinds of weights? I am really confused by that.
Could you please give an explanation?
Thank you!
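
A small worked example of the relationship: the stored mean_weight / var_weight are raw logits, and the importance weights plotted in Fig. 7 are their softmax, which sums to 1:

import torch

mean_weight = torch.tensor([-0.5125, 0.3186, 0.1940])
print(torch.softmax(mean_weight, dim=0))   # ~[0.188, 0.431, 0.381], sums to 1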
