Comments (6)
For those interested: with LR=100 I was able to reproduce the results:
Epoch: [99][4990/5005] Time 0.081 ( 0.214) Data 0.003 ( 0.047) Loss 1.7134e+00 (1.3751e+00) Acc@1 50.00 ( 67.64) Acc@5 78.12 ( 86.84)
Epoch: [99][5000/5005] Time 0.107 ( 0.214) Data 0.000 ( 0.047) Loss 8.4820e-01 (1.3749e+00) Acc@1 75.00 ( 67.65) Acc@5 90.62 ( 86.84)
Test: [ 0/196] Time 13.979 (13.979) Loss 7.4870e-01 (7.4870e-01) Acc@1 83.59 ( 83.59) Acc@5 94.14 ( 94.14)
Test: [ 10/196] Time 0.280 ( 2.259) Loss 1.2925e+00 (9.7772e-01) Acc@1 68.75 ( 75.39) Acc@5 91.80 ( 93.00)
Test: [ 20/196] Time 2.129 ( 1.771) Loss 1.0400e+00 (9.7490e-01) Acc@1 78.52 ( 75.73) Acc@5 89.06 ( 92.34)
Test: [ 30/196] Time 0.271 ( 1.565) Loss 1.0907e+00 (9.5482e-01) Acc@1 72.27 ( 76.32) Acc@5 92.19 ( 92.46)
Test: [ 40/196] Time 1.167 ( 1.478) Loss 1.0960e+00 (1.0593e+00) Acc@1 70.70 ( 73.10) Acc@5 94.92 ( 91.86)
Test: [ 50/196] Time 0.300 ( 1.447) Loss 7.4471e-01 (1.0644e+00) Acc@1 80.08 ( 72.65) Acc@5 96.09 ( 92.04)
Test: [ 60/196] Time 0.277 ( 1.412) Loss 1.3356e+00 (1.0653e+00) Acc@1 66.02 ( 72.53) Acc@5 89.84 ( 92.23)
Test: [ 70/196] Time 0.302 ( 1.466) Loss 9.6414e-01 (1.0376e+00) Acc@1 73.44 ( 73.15) Acc@5 92.97 ( 92.46)
Test: [ 80/196] Time 0.301 ( 1.427) Loss 1.7982e+00 (1.0572e+00) Acc@1 55.47 ( 72.81) Acc@5 81.25 ( 92.10)
Test: [ 90/196] Time 0.301 ( 1.425) Loss 2.2781e+00 (1.1174e+00) Acc@1 48.05 ( 71.63) Acc@5 75.78 ( 91.29)
Test: [100/196] Time 0.348 ( 1.383) Loss 1.8547e+00 (1.1743e+00) Acc@1 54.30 ( 70.44) Acc@5 81.25 ( 90.54)
Test: [110/196] Time 0.284 ( 1.394) Loss 1.2131e+00 (1.1956e+00) Acc@1 70.31 ( 70.07) Acc@5 87.50 ( 90.17)
Test: [120/196] Time 0.271 ( 1.391) Loss 1.6675e+00 (1.2134e+00) Acc@1 65.23 ( 69.85) Acc@5 82.42 ( 89.84)
Test: [130/196] Time 0.285 ( 1.385) Loss 1.0386e+00 (1.2471e+00) Acc@1 72.27 ( 69.05) Acc@5 92.97 ( 89.42)
Test: [140/196] Time 0.284 ( 1.353) Loss 1.4044e+00 (1.2719e+00) Acc@1 66.41 ( 68.58) Acc@5 88.67 ( 89.13)
Test: [150/196] Time 0.284 ( 1.360) Loss 1.5603e+00 (1.2957e+00) Acc@1 71.09 ( 68.23) Acc@5 81.64 ( 88.70)
Test: [160/196] Time 0.288 ( 1.344) Loss 1.1818e+00 (1.3140e+00) Acc@1 71.48 ( 68.00) Acc@5 89.06 ( 88.41)
Test: [170/196] Time 0.302 ( 1.360) Loss 8.9217e-01 (1.3329e+00) Acc@1 78.12 ( 67.58) Acc@5 94.14 ( 88.16)
Test: [180/196] Time 0.285 ( 1.346) Loss 1.4819e+00 (1.3459e+00) Acc@1 62.50 ( 67.30) Acc@5 88.28 ( 88.01)
Test: [190/196] Time 0.270 ( 1.365) Loss 1.2494e+00 (1.3437e+00) Acc@1 65.23 ( 67.35) Acc@5 92.58 ( 88.03)
* Acc@1 67.548 Acc@5 88.120
from simsiam.
Oh I see -- for the released model, we did not search over different SGD learning rates. To reproduce the batch-size-4096 LARS result, you can also use gradient accumulation, which is actually not that hard given that linear eval is lightweight and BN-free.
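For reference, a minimal PyTorch sketch of what gradient accumulation could look like for the linear classifier. This is not the repo's code; the function and argument names are placeholders, and the batch sizes just mirror the numbers discussed above. Because the frozen-backbone linear eval has no batch-norm statistics, accumulating gradients over 16 micro-batches of 256 is mathematically equivalent to one SGD step at batch size 4096:

```python
def train_one_epoch_accum(model, loader, criterion, optimizer,
                          target_bs=4096, micro_bs=256):
    """Simulate a large-batch SGD step by accumulating gradients over
    several small batches (placeholder sketch, not the repo's code)."""
    accum_steps = target_bs // micro_bs  # 16 micro-batches per update
    optimizer.zero_grad()
    for i, (images, target) in enumerate(loader):
        output = model(images)
        # Divide so the summed gradients equal the mean over the big batch
        # (assumes criterion averages within each micro-batch).
        loss = criterion(output, target) / accum_steps
        loss.backward()  # gradients sum into .grad across micro-batches
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # Leftover micro-batches at the end of the epoch are dropped for simplicity.
```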
from simsiam.
Did you refer to the provided log file and check if it matches?
from simsiam.
Hi, appreciate the quick response!
If the question is whether the log posted above is the one from the repo: no, it is mine. Comparing against the repo's log is difficult because it corresponds to batch size 4096 and LARC optimization. Is there a log for SGD with batch size 256?
from simsiam.
Oh I see -- for SGD, you may want to search over different learning rates to get results that match LARS. Try increasing or lowering the lr.
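For reference, a minimal sketch of such an lr sweep. Here run_linear_eval is a hypothetical helper standing in for a full main_lincls.py run, and the grid values are illustrative, not the authors' searched settings:

```python
# Illustrative log-spaced grid; not the authors' values.
candidate_lrs = [0.3, 1, 3, 10, 30, 100]

results = {}
for lr in candidate_lrs:
    # run_linear_eval is hypothetical: it would train the linear
    # classifier with SGD at this lr and return top-1 accuracy.
    results[lr] = run_linear_eval(lr=lr, batch_size=256, optimizer="sgd")

best_lr = max(results, key=results.get)
print(f"best lr: {best_lr}, acc@1: {results[best_lr]:.2f}")
```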
from simsiam.
Yes, thank you, I'm aware of that :-). I was kindly asking for hints on the hyperparameter settings that in your case produced the reported results ;-).
from simsiam.
Related Issues (20)
- loss question HOT 4
- Single GPU Training HOT 3
- Loss descent slowly HOT 1
- Checkpoints with 50 epochs HOT 1
- [Problem] problem occurred when trained on custom dataset HOT 4
- difference between BYOL in Table 19 and SimSiam? HOT 1
- larger batch size with linear scale does not work
- checkpoint for epoch 800
- Slow convergence with SGD linear evaluation HOT 1
- SyncBatchnorm usage in main_lincls.py HOT 2
- About the projection and prediction head dimension configs
- Loss collapse during training HOT 1
- About the shape of image[0] and image[1] HOT 1
- Pre-trained weights for SimCLR, MoCov2, BYOL, SwaV HOT 1
- The value of loss function nn.CosineSimilarity is negative HOT 1
- self.encoder.fc in builder.py
- which vector is used in experiment (the vector after predictor or after projection)?
- Why are training and testing so slow HOT 2
- Could not find any class folder
- Loss increasing