Comments (2)
Hi!
There is a lot to say about this! :D
(This is still active research, so we haven't put much definite information out there so far, simply because we were still working it out ourselves.)
Beforehand, I want to point out:
- We are working on re-writing the tutorial to include such things and better explain how to build a stable model.
- If you look at the following repository, this network addresses all the points I mention below and is very stable, even with >100 coupling blocks: https://github.com/VLL-HD/IB-INN You should be able to use the architecture directly, or just parts of it.
- Here is a very big INN that even gives good performance on ImageNet, and is still completely stable (just to show it's possible): https://github.com/VLL-HD/trustworthy_GCs
For the actual answers:
- Indeed, BatchNorm does increase the stability in most of our experiments
- The testing error problem also took us a long while to work out. What happens is that the running average kept by the PyTorch BatchNorm layers, which is used when the network is set to `.eval()`, is not accurate enough (especially because of how sensitive NFs are to shifts in mean/std). With the network in `.train()` mode, the mean/std is computed for each batch and the running average is ignored, so the problem doesn't occur there. The way around it: for validation during training, leave the model in `.train()` mode (not perfect, but better than the unreliable numbers).
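The mismatch can be reproduced with a plain `BatchNorm1d` layer (a standalone sketch, not tied to FrEIA):

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(3, momentum=0.1)
data = torch.randn(512, 3) * 4.0 + 2.0  # input far from zero-mean/unit-std

# .train(): the batch is normalized with its own statistics, so outputs are
# exactly standardized; the running average is only updated slowly.
bn.train()
out_train = bn(data)

# .eval(): the still-inaccurate running average is used instead, so the
# outputs are no longer properly standardized and the loss shifts.
bn.eval()
out_eval = bn(data)
```

Here `out_train` has mean ~0, while `out_eval` is visibly off, exactly the kind of shift that NFs are so sensitive to.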
At test time, keeping the network fixed, reset the batchnorm running averages, set the momentum of the batchnorm layers to None (infinite average), and run the train dataset through for one or two epochs. Then, the test loss is correct.
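A minimal sketch of this recalibration step, assuming a generic PyTorch `model` and a `train_loader` yielding input batches (adapt the forward call to your INN's signature):

```python
import torch

def recalibrate_batchnorm(model, train_loader, epochs=2):
    """Reset BatchNorm running statistics and re-estimate them with an
    infinite (cumulative) average, so .eval() losses become reliable."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # None = cumulative moving average
    model.train()  # running stats are only updated in train mode
    with torch.no_grad():
        for _ in range(epochs):
            for x in train_loader:
                model(x)
    model.eval()  # running averages now match the data statistics
```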
You can find that in https://github.com/VLL-HD/IB-INN/blob/master/evaluation/__init__.py#L18
- Initialization also plays a big role. There is an `AllInOneBlock` coupling block in FrEIA (added recently) that combines coupling, scaling, and permutation in one easy-to-use block (the three are almost always used together anyway, so keeping them separate only slows things down). You can find the initialization here: https://github.com/VLL-HD/IB-INN/blob/master/inn_architecture.py#L30 (note that the arguments to the `AllInOneBlock` changed with its inclusion in FrEIA to be more understandable; the docstring should contain everything you need to know). Specifically, try setting `global_affine_init` to something like 0.7, which stops the outputs from exploding.
- Gradient clipping (as for RNNs) can also help. I tend to get good results with
`torch.nn.utils.clip_grad_norm_(parameters, 5.)`
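For context, a one-step training sketch showing where the call goes (the model and loss here are stand-ins, not a FrEIA network):

```python
import torch

model = torch.nn.Linear(8, 8)   # stand-in for the INN
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(32, 8)
loss = model(x).pow(2).mean()   # stand-in for the NLL loss

opt.zero_grad()
loss.backward()
# Clip the global gradient norm to 5 between backward() and step(),
# so a single bad batch cannot blow up the weights.
torch.nn.utils.clip_grad_norm_(model.parameters(), 5.)
opt.step()
```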
Feel free to re-open the issue if you are still having NaN troubles after that!
from freia.
Hi @ardizzone,
Thanks so much for the detailed reply. Interesting find on the testing error problem! It looks like this problem may be general to pytorch (https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/21)
I spent a while exploring different regions of the parameter space (number of coupling layers, coefficient function network depth & width, learning rate) until I understood when training was likely to diverge.
My general approach was to:
- Find the smallest possible model capacity for the distribution being modeled. This is done by increasing the number of neurons in the coefficient network until the testing error no longer improves. Generally, for a fixed number of neurons, I found no significant difference in model performance between increasing the number of coupling layers (with smaller coefficient networks) and increasing the size of the coefficient networks while decreasing the number of coupling layers. I can't speak to training stability as a function of this trade-off.
- Once the architecture is set, coarsely increase the learning rate from 1e-4 incrementally until training diverges (1e-4, 2.5e-4, ...). Then refine upward from the largest stable learning rate. For example, if training converged at 5e-4 but diverged at 7.5e-4, test 5e-4, 5.5e-4, 6e-4, ....
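The coarse-then-fine search above can be sketched as a small helper (`trains_stably` is a hypothetical callback that trains briefly at a given learning rate and reports whether the loss stayed finite):

```python
def find_learning_rate(trains_stably,
                       coarse=(1e-4, 2.5e-4, 5e-4, 7.5e-4, 1e-3)):
    """Coarse-then-fine learning-rate search as described above."""
    # Coarse pass: walk the grid until training diverges.
    largest_stable = None
    for lr in coarse:
        if not trains_stably(lr):
            break
        largest_stable = lr
    if largest_stable is None:
        return None  # even the smallest candidate diverged
    # Fine pass: increase in 10% steps of the largest stable value.
    step = 0.1 * largest_stable
    lr = largest_stable
    while trains_stably(lr + step):
        lr += step
    return lr
```

For example, for a run that only stays stable below 6.2e-4, the coarse pass stops at 5e-4 and the fine pass settles at roughly 6e-4.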
Cheers