Comments (6)
For those interested: with LR=100 I was able to reproduce the results:
Epoch: [99][4990/5005] Time 0.081 ( 0.214) Data 0.003 ( 0.047) Loss 1.7134e+00 (1.3751e+00) Acc@1 50.00 ( 67.64) Acc@5 78.12 ( 86.84)
Epoch: [99][5000/5005] Time 0.107 ( 0.214) Data 0.000 ( 0.047) Loss 8.4820e-01 (1.3749e+00) Acc@1 75.00 ( 67.65) Acc@5 90.62 ( 86.84)
Test: [ 0/196] Time 13.979 (13.979) Loss 7.4870e-01 (7.4870e-01) Acc@1 83.59 ( 83.59) Acc@5 94.14 ( 94.14)
Test: [ 10/196] Time 0.280 ( 2.259) Loss 1.2925e+00 (9.7772e-01) Acc@1 68.75 ( 75.39) Acc@5 91.80 ( 93.00)
Test: [ 20/196] Time 2.129 ( 1.771) Loss 1.0400e+00 (9.7490e-01) Acc@1 78.52 ( 75.73) Acc@5 89.06 ( 92.34)
Test: [ 30/196] Time 0.271 ( 1.565) Loss 1.0907e+00 (9.5482e-01) Acc@1 72.27 ( 76.32) Acc@5 92.19 ( 92.46)
Test: [ 40/196] Time 1.167 ( 1.478) Loss 1.0960e+00 (1.0593e+00) Acc@1 70.70 ( 73.10) Acc@5 94.92 ( 91.86)
Test: [ 50/196] Time 0.300 ( 1.447) Loss 7.4471e-01 (1.0644e+00) Acc@1 80.08 ( 72.65) Acc@5 96.09 ( 92.04)
Test: [ 60/196] Time 0.277 ( 1.412) Loss 1.3356e+00 (1.0653e+00) Acc@1 66.02 ( 72.53) Acc@5 89.84 ( 92.23)
Test: [ 70/196] Time 0.302 ( 1.466) Loss 9.6414e-01 (1.0376e+00) Acc@1 73.44 ( 73.15) Acc@5 92.97 ( 92.46)
Test: [ 80/196] Time 0.301 ( 1.427) Loss 1.7982e+00 (1.0572e+00) Acc@1 55.47 ( 72.81) Acc@5 81.25 ( 92.10)
Test: [ 90/196] Time 0.301 ( 1.425) Loss 2.2781e+00 (1.1174e+00) Acc@1 48.05 ( 71.63) Acc@5 75.78 ( 91.29)
Test: [100/196] Time 0.348 ( 1.383) Loss 1.8547e+00 (1.1743e+00) Acc@1 54.30 ( 70.44) Acc@5 81.25 ( 90.54)
Test: [110/196] Time 0.284 ( 1.394) Loss 1.2131e+00 (1.1956e+00) Acc@1 70.31 ( 70.07) Acc@5 87.50 ( 90.17)
Test: [120/196] Time 0.271 ( 1.391) Loss 1.6675e+00 (1.2134e+00) Acc@1 65.23 ( 69.85) Acc@5 82.42 ( 89.84)
Test: [130/196] Time 0.285 ( 1.385) Loss 1.0386e+00 (1.2471e+00) Acc@1 72.27 ( 69.05) Acc@5 92.97 ( 89.42)
Test: [140/196] Time 0.284 ( 1.353) Loss 1.4044e+00 (1.2719e+00) Acc@1 66.41 ( 68.58) Acc@5 88.67 ( 89.13)
Test: [150/196] Time 0.284 ( 1.360) Loss 1.5603e+00 (1.2957e+00) Acc@1 71.09 ( 68.23) Acc@5 81.64 ( 88.70)
Test: [160/196] Time 0.288 ( 1.344) Loss 1.1818e+00 (1.3140e+00) Acc@1 71.48 ( 68.00) Acc@5 89.06 ( 88.41)
Test: [170/196] Time 0.302 ( 1.360) Loss 8.9217e-01 (1.3329e+00) Acc@1 78.12 ( 67.58) Acc@5 94.14 ( 88.16)
Test: [180/196] Time 0.285 ( 1.346) Loss 1.4819e+00 (1.3459e+00) Acc@1 62.50 ( 67.30) Acc@5 88.28 ( 88.01)
Test: [190/196] Time 0.270 ( 1.365) Loss 1.2494e+00 (1.3437e+00) Acc@1 65.23 ( 67.35) Acc@5 92.58 ( 88.03)
* Acc@1 67.548 Acc@5 88.120
from simsiam.
Oh I see -- for the released model, we did not search over different SGD learning rates. To reproduce the batch-size-4096 LARS result, you can also use gradient accumulation, which is actually not that hard given that linear eval is lightweight and BN-free.
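For reference, a minimal PyTorch sketch of what gradient accumulation could look like for the linear classifier. This is not the repo's code; the function and argument names are placeholders, and the batch sizes just mirror the numbers discussed above. Because the frozen-backbone linear eval has no batch-norm statistics, accumulating gradients over 16 micro-batches of 256 is mathematically equivalent to one SGD step at batch size 4096:

```python
def train_one_epoch_accum(model, loader, criterion, optimizer,
                          target_bs=4096, micro_bs=256):
    """Simulate a large-batch SGD step by accumulating gradients over
    several small batches (placeholder sketch, not the repo's code)."""
    accum_steps = target_bs // micro_bs  # 16 micro-batches per update
    optimizer.zero_grad()
    for i, (images, target) in enumerate(loader):
        output = model(images)
        # Divide so the summed gradients equal the mean over the big batch
        # (assumes criterion averages within each micro-batch).
        loss = criterion(output, target) / accum_steps
        loss.backward()  # gradients sum into .grad across micro-batches
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # Leftover micro-batches at the end of the epoch are dropped for simplicity.
```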
from simsiam.
Did you refer to the provided log file and check if it matches?
from simsiam.
Hi, appreciate the quick response!
If the question is whether the log posted above is the one from the repo: no, it is mine. Comparing against the repo's log is difficult because it corresponds to batch size 4096 and LARC optimization. Is there a log for SGD with batch size 256?
from simsiam.
Oh I see -- for SGD, you may want to search over different learning rates to get results that match LARS. Try increasing or lowering the lr.
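For reference, a minimal sketch of such an lr sweep. Here run_linear_eval is a hypothetical helper standing in for a full main_lincls.py run, and the grid values are illustrative, not the authors' searched settings:

```python
# Illustrative log-spaced grid; not the authors' values.
candidate_lrs = [0.3, 1, 3, 10, 30, 100]

results = {}
for lr in candidate_lrs:
    # run_linear_eval is hypothetical: it would train the linear
    # classifier with SGD at this lr and return top-1 accuracy.
    results[lr] = run_linear_eval(lr=lr, batch_size=256, optimizer="sgd")

best_lr = max(results, key=results.get)
print(f"best lr: {best_lr}, acc@1: {results[best_lr]:.2f}")
```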
from simsiam.
Yes, thank you, I'm aware of that :-). I was kindly asking for hints on the hyperparameter settings that in your case produced the reported results ;-).
from simsiam.
Related Issues (20)
- loss question HOT 4
- Single GPU Training HOT 3
- Loss descent slowly HOT 1
- Checkpoints with 50 epochs HOT 1
- [Problem] problem occurred when trained on custom dataset HOT 4
- difference between BYOL in Table 19 and SimSiam? HOT 1
- larger batch size with linear scale does not work
- checkpoint for epoch 800
- Slow convergence with SGD linear evaluation HOT 1
- SyncBatchnorm usage in main_lincls.py HOT 2
- About the projection and prediction head dimension configs
- Loss collapse during training HOT 1
- About the shape of image[0] and image[1] HOT 1
- Pre-trained weights for SimCLR, MoCov2, BYOL, SwaV HOT 1
- The value of loss function nn.CosineSimilarity is negative HOT 1
- self.encoder.fc in builder.py
- which vector is used in experiment (the vector after predictor or after projection)?
- Why are training and testing so slow HOT 2
- Could not find any class folder
- Loss increasing