Problems about Usage of SyncSN (switchable-normalization, open issue, 12 comments)

switchablenorms commented on May 27, 2024

Problems about Usage of SyncSN

from switchable-normalization.

Comments (12)

JiaminRen commented on May 27, 2024

What is the problem? I will give an example soon.

stillwaterman commented on May 27, 2024

@JiaminRen My code hangs at dist.broadcast with no error message; the backend is nccl. Have you tested the training code, or is there some configuration I missed?

JiaminRen commented on May 27, 2024

Which task did you test: ImageNet or face recognition?

stillwaterman commented on May 27, 2024

@JiaminRen I tried to imitate your face recognition training code to use SyncSN in my own code, but I didn't succeed. I ran into two problems. First, rank = int(os.environ['RANK']) and world_size = int(os.environ['WORLD_SIZE']) fail because those variables are not set, so I added os.environ['RANK']=str(0) and os.environ['WORLD_SIZE']=str(4). Second, the program hangs at dist.broadcast.
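Editor's sketch: hardcoding RANK=0 and WORLD_SIZE=4 inside a single process is a likely reason for a silent hang, since collectives would then wait for three ranks that never start. A minimal alternative, assuming the variables are meant to be supplied from outside the script, is to read them and fail loudly when they are missing. The helper name `read_dist_env` is hypothetical and not part of the repository.

```python
import os


def read_dist_env(environ=None):
    """Read RANK and WORLD_SIZE, raising a clear error instead of guessing.

    `environ` defaults to os.environ; passing a dict makes the helper easy to test.
    This is an illustrative helper, not code from switchable-normalization.
    """
    environ = os.environ if environ is None else environ

    missing = [key for key in ("RANK", "WORLD_SIZE") if key not in environ]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Start the script with a distributed launcher instead of "
            "hardcoding them."
        )

    rank = int(environ["RANK"])
    world_size = int(environ["WORLD_SIZE"])

    # Every process must have a distinct rank in [0, world_size).
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return rank, world_size
```

Calling `read_dist_env()` at the top of a training script turns the missing-variable case into an immediate error message rather than a hang later on.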

JiaminRen commented on May 27, 2024

Have you changed any code? Just running the script face_recognition/train.sh should work.

stillwaterman commented on May 27, 2024

@JiaminRen I quickly tested the face_recognition train.py and unfortunately ran into the same problems. I think I may be missing some system configuration.

stillwaterman commented on May 27, 2024

@JiaminRen My system is Ubuntu 18.04 and I installed PyTorch with Anaconda. The program hangs at dist.broadcast.

JiaminRen commented on May 27, 2024

This is a distributed framework, and it should be run on multiple GPUs using torch.distributed.launch.
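Editor's sketch: the launcher starts one process per GPU and gives each one its own rank, which is why a script started by hand sees no RANK or WORLD_SIZE. The code below only imitates that bookkeeping in plain Python; it is not the launcher's implementation. The function name `launcher_env` is hypothetical, and the address and port are assumed values.

```python
# A typical single-node invocation looks like:
#   python -m torch.distributed.launch --nproc_per_node=4 train.py
# (the arguments train.py itself needs depend on the repository's scripts).


def launcher_env(nproc_per_node, nnodes=1, node_rank=0,
                 master_addr="127.0.0.1", master_port=29500):
    """Return one environment dict per local process.

    Illustrates the variables a distributed launcher provides; the address
    and port here are assumptions for a single-machine run.
    """
    world_size = nproc_per_node * nnodes
    envs = []
    for local_rank in range(nproc_per_node):
        envs.append({
            "MASTER_ADDR": master_addr,        # where rank 0 listens
            "MASTER_PORT": str(master_port),
            "WORLD_SIZE": str(world_size),     # total number of processes
            # Global rank is unique across all nodes.
            "RANK": str(node_rank * nproc_per_node + local_rank),
            # Local rank selects the GPU on this machine.
            "LOCAL_RANK": str(local_rank),
        })
    return envs


if __name__ == "__main__":
    for env in launcher_env(nproc_per_node=4):
        print(env["RANK"], env["WORLD_SIZE"], env["LOCAL_RANK"])
```

With four processes, each one gets a different RANK but the same WORLD_SIZE, so a collective such as dist.broadcast has all of its expected participants.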

stillwaterman commented on May 27, 2024

Thanks, torch.distributed.launch solves those problems. But the sync version consumes a lot of GPU memory and always runs out of memory.

stillwaterman commented on May 27, 2024

Sorry to bother you again. When using SyncSN, I got some different errors. I tried to imitate the way you use it in train.py, but my model outputs NaNs, which does not happen with SN. Another error is subprocess.CalledProcessError: Command returned non-zero exit 1. Do you have any idea? Thanks.
