Comments (5)
Hi @roderickObrist, good to hear SWA is working well for you :) To make the figures we treat all the parameters of the network as one large (say, 10-million-dimensional) vector. It includes both the biases and the weights. Say we have a total of D parameters. We treat the whole parameter space as just R^D. For each visualization we then pick three vectors v1, v2 and v3 in this R^D parameter space. These typically correspond to the weights of some networks, like SGD iterates from different iterations. Then, we construct the unique 2-d plane (affine subspace) that passes through these three vectors. We then plot the loss restricted to this 2-d subspace.
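As a rough illustration of the procedure described above (not the authors' actual plotting code), the sketch below builds an orthonormal basis for the plane through three flattened weight vectors and evaluates a loss on a grid in that plane. The names `plane_basis`, `loss_on_plane`, `grid_size` and `margin` are hypothetical, the Gram-Schmidt parameterization is one reasonable choice among several, and the quadratic `toy_loss` stands in for a real network's training loss.

```python
import numpy as np


def plane_basis(v1, v2, v3):
    """Return orthonormal vectors (u, v) spanning the plane through v1, v2, v3.

    v1 is used as the origin of the plane's coordinate system.
    """
    d1 = v2 - v1
    d2 = v3 - v1
    u = d1 / np.linalg.norm(d1)
    # Gram-Schmidt step: remove the component of d2 that lies along u.
    d2_orth = d2 - np.dot(d2, u) * u
    norm = np.linalg.norm(d2_orth)
    if norm < 1e-12:
        raise ValueError("v1, v2, v3 are collinear and do not define a plane")
    return u, d2_orth / norm


def loss_on_plane(loss_fn, v1, v2, v3, grid_size=5, margin=0.2):
    """Evaluate loss_fn on a grid of points in the plane through v1, v2, v3."""
    u, v = plane_basis(v1, v2, v3)
    # 2-d coordinates of the three anchor points within the plane.
    coords = np.array([[np.dot(p - v1, u), np.dot(p - v1, v)] for p in (v1, v2, v3)])
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    pad = margin * (hi - lo)
    alphas = np.linspace(lo[0] - pad[0], hi[0] + pad[0], grid_size)
    betas = np.linspace(lo[1] - pad[1], hi[1] + pad[1], grid_size)
    losses = np.empty((grid_size, grid_size))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            # Map the 2-d grid point back to a full D-dimensional weight vector.
            w = v1 + a * u + b * v
            losses[i, j] = loss_fn(w)
    return alphas, betas, losses, coords


def toy_loss(w):
    """Placeholder loss; a real use would load w into the network and evaluate it."""
    return float(np.sum(w ** 2))


rng = np.random.default_rng(0)
v1, v2, v3 = rng.normal(size=(3, 1000))  # stand-ins for three flattened networks
alphas, betas, losses, coords = loss_on_plane(toy_loss, v1, v2, v3)
```

The resulting `losses` array can be passed to a contour plot, with `coords` marking where the three networks sit in the plane. With a real network, each grid point requires a full pass over the dataset, so the grid size directly controls the cost.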
To answer your questions:
- Weights include both weights and biases, and they are not from a single layer. This is the full vector of all the network's parameters.
- We have a public implementation of a very similar visualization for our other paper here: https://github.com/timgaripov/dnn-mode-connectivity/blob/master/plane.py. I believe you would need to change this part here https://github.com/timgaripov/dnn-mode-connectivity/blob/master/plane.py#L96-L101, and load the weights of three networks v1, v2, v3 in the w list.
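To make the "full vector of all the network's parameters" point concrete, here is a minimal sketch of flattening every weight and bias array into one vector and restoring it afterwards. It uses a plain dict of NumPy arrays as a stand-in for a framework's parameter collection; `flatten_params`, `unflatten_params` and the layer names are hypothetical. In PyTorch, `torch.nn.utils.parameters_to_vector` and `vector_to_parameters` serve a similar purpose.

```python
import numpy as np


def flatten_params(params):
    """Concatenate every array in a name -> array mapping into one 1-D vector."""
    return np.concatenate([np.asarray(p).ravel() for p in params.values()])


def unflatten_params(vector, template):
    """Split a flat vector back into arrays shaped like those in `template`."""
    restored = {}
    offset = 0
    for name, p in template.items():
        restored[name] = vector[offset:offset + p.size].reshape(p.shape)
        offset += p.size
    return restored


# Toy two-layer network: weights and biases from all layers go into one vector.
toy_params = {
    "layer1.weight": np.ones((4, 3)),
    "layer1.bias": np.zeros(4),
    "layer2.weight": np.full((2, 4), 0.5),
    "layer2.bias": np.zeros(2),
}
w = flatten_params(toy_params)
print(w.shape)  # (26,)
```

The three networks must be flattened with the same parameter ordering, otherwise the coordinates of v1, v2 and v3 do not line up and the plane is meaningless.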
from swa.
Thank you kindly, I will implement this in my own project over the next few days.
Please see footnote 1 on page 2 of the camera-ready version of the paper:
http://auai.org/uai2018/proceedings/papers/313.pdf
There we tried to clarify the exact procedure for making the loss and test error surface visualizations.
I will close the issue for now, but I will be happy to answer if you have further questions about those figures.
@izmailovpavel Hi and thank you for the great work, I've been implementing SWA in my research project and the results are great. I just have a few questions regarding the illustrations.
- Are the weight vectors literally the weights (not biases) from a single linear layer of a network or are they the concatenation of the entire model?
- Would you be comfortable providing the snippet of code you used to make the figures? (Does not need to be functional/polished or commented). Just so I can double check my own implementation.
Thank you for what you have done for the community.