Training a YOLOv5 model with Adan produces NaN losses (closed)

sail-sg commented on May 22, 2024

Training a YOLOv5 model with Adan produces NaN losses.


Comments (6)

xialuxi commented on May 22, 2024

Adan:
Epoch gpu_mem box obj cls kps labels img_size
4/399 3.19G 0.05632 0.0556 0.01426 0.1152 129 416

Epoch gpu_mem box obj cls kps labels img_size
5/399 3.19G nan nan nan nan 107 416

SGD:
Epoch gpu_mem box obj cls kps labels img_size
4/399 3.02G 0.05808 0.0563 0.01553 0.1285 129 416

Epoch gpu_mem box obj cls kps labels img_size
5/399 3.02G 0.05504 0.0543 0.0144 0.1113 107 416

Epoch gpu_mem box obj cls kps labels img_size
6/399 3.02G 0.05382 0.05239 0.0139 0.1092 380 416


XingyuXie commented on May 22, 2024

Hi @xialuxi,
I suggest training with a smaller LR for Adan.
BTW, it seems that most YOLOvX models are trained with SGD.
Earlier work appears to have run into problems when training YOLO models with adaptive optimizers such as Adam. Several issues in the YOLOv7 repo mention that Adam may not give good results; see WongKinYiu/yolov7#702 (comment) and WongKinYiu/yolov7#730 (comment).

So I suggest tuning the LR and weight decay, starting from the official Adam settings for YOLOvX.
If the final result is still unsatisfactory, the cause may not be Adan or Adam specifically, but the general difference between adaptive optimizers and SGD-type optimizers.

If you still need help with using Adan or tuning its parameters, please don't hesitate to leave a message here or add me on WeChat: xyxie_joy.


xialuxi commented on May 22, 2024

Adam:
Epoch gpu_mem box obj cls kps labels img_size
3/399 3.22G 0.06394 0.05859 0.01641 0.1517 173 416

Epoch gpu_mem box obj cls kps labels img_size
4/399 3.22G 0.06176 0.05778 0.0158 0.1367 129 416

Epoch gpu_mem box obj cls kps labels img_size
5/399 3.22G 0.05997 0.05672 0.01512 0.1282 107 416

Epoch gpu_mem box obj cls kps labels img_size
6/399 3.22G 0.05835 0.05481 0.01452 0.1185 112 416


XingyuXie commented on May 22, 2024

Dear @xialuxi,
What hyperparameters are you using for Adan and Adam? Could you please paste them here?
Then I can suggest a more reasonable LR, weight decay, or gradient clipping value for Adan.

Best
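
For readers unfamiliar with the "clip" mentioned above, here is a minimal pure-Python sketch of gradient clipping by global norm, together with a guard against non-finite gradients. The function name `clip_by_global_norm` and the flat list of floats are illustrative assumptions, not part of Adan or YOLOv5; in a PyTorch training loop the same idea is usually applied with `torch.nn.utils.clip_grad_norm_`.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradient values so that their L2 norm is at most max_norm.

    `grads` is a flat list of floats standing in for real gradient tensors
    (a simplification made for this sketch).
    Returns the clipped values and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total):
        # A NaN or inf norm means the step is already corrupted;
        # skipping the update is safer than applying it.
        raise ValueError("non-finite gradient norm; skip this step")
    # The small constant avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total

clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

Clipping only bounds the size of each update; if the loss still becomes NaN, lowering the learning rate is the first thing to try.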


xialuxi commented on May 22, 2024

SGD:
lr = 0.01, momentum=0.937, weight_decay=0.0005, nesterov=True
Adam:
lr = 0.01, betas=(0.937, 0.999), weight_decay=0.0005
Adan:
lr = 0.01, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.0005


XingyuXie commented on May 22, 2024

@xialuxi, you may try lr = 1e-3, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.02 for Adan.
As a rule of thumb, the LR for an adaptive optimizer should be 10-100x smaller than the LR used with SGD, so we suggest lr = 1e-3 for Adan.
We set wd=0.02 because Adan uses decoupled weight decay.
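
The arithmetic behind these two suggestions can be sketched as follows. This is a rough illustration, not Adan's exact update rule: it ignores momentum and assumes that the decay term removes roughly `lr * weight_decay` of each weight per step, both for SGD with L2 regularization and for decoupled decay. The helper names are hypothetical. Because decoupled decay scales with the learning rate, lowering the LR also weakens the regularization unless the weight decay is raised.

```python
import math

# SGD baseline reported in this thread and the Adan values suggested above.
sgd = {"lr": 0.01, "weight_decay": 0.0005}
adan = {"lr": 1e-3, "betas": (0.98, 0.92, 0.99), "eps": 1e-8, "weight_decay": 0.02}

def adaptive_lr_range(sgd_lr):
    """Return (low, high) for the 10-100x-smaller rule of thumb."""
    return sgd_lr / 100, sgd_lr / 10

def decay_per_step(lr, weight_decay):
    """Approximate fraction of each weight removed per step by decay alone."""
    return lr * weight_decay

low, high = adaptive_lr_range(sgd["lr"])
print(f"adaptive LR range: {low:g} to {high:g}")
print(f"SGD decay per step:  {decay_per_step(**sgd):g}")
print(f"Adan decay per step: {decay_per_step(adan['lr'], adan['weight_decay']):g}")

# Assumption: with the Adan package from this repo installed, these settings
# would be passed to the optimizer as Adan(model.parameters(), **adan).
```

The suggested lr = 1e-3 sits at the upper end of the range derived from the SGD learning rate of 0.01, so going lower is a reasonable next step if NaNs persist.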

