Comments (6)
Adan:
Epoch gpu_mem box obj cls kps labels img_size
4/399 3.19G 0.05632 0.0556 0.01426 0.1152 129 416
Epoch gpu_mem box obj cls kps labels img_size
5/399 3.19G nan nan nan nan 107 416
SGD:
Epoch gpu_mem box obj cls kps labels img_size
4/399 3.02G 0.05808 0.0563 0.01553 0.1285 129 416
Epoch gpu_mem box obj cls kps labels img_size
5/399 3.02G 0.05504 0.0543 0.0144 0.1113 107 416
Epoch gpu_mem box obj cls kps labels img_size
6/399 3.02G 0.05382 0.05239 0.0139 0.1092 380 416
from adan.
Hi, @xialuxi,
I suggest training with a smaller LR for Adan.
BTW, it seems that most yoloVx models are trained with SGD.
Earlier work has run into problems when using adaptive optimizers such as Adam to train yolo models. Many issues in the yoloV7 repo mention that Adam may not give good results; see WongKinYiu/yolov7#702 (comment) and WongKinYiu/yolov7#730 (comment).
So I suggest tuning the LR and weight decay, starting from the official Adam settings for yoloVx.
If the final result is still unsatisfactory, the cause may not be Adan or Adam specifically, but the general difference between adaptive optimizers and SGD-type optimizers.
If you still need help with using Adan or tuning its parameters, please don't hesitate to leave a message here or add my WeChat: xyxie_joy.
from adan.
Adam:
Epoch gpu_mem box obj cls kps labels img_size
3/399 3.22G 0.06394 0.05859 0.01641 0.1517 173 416
Epoch gpu_mem box obj cls kps labels img_size
4/399 3.22G 0.06176 0.05778 0.0158 0.1367 129 416
Epoch gpu_mem box obj cls kps labels img_size
5/399 3.22G 0.05997 0.05672 0.01512 0.1282 107 416
Epoch gpu_mem box obj cls kps labels img_size
6/399 3.22G 0.05835 0.05481 0.01452 0.1185 112 416
from adan.
Dear @xialuxi,
What hyper-parameters are you using for Adan and Adam? Could you please paste them here?
Then I can suggest a more reasonable LR, weight decay, or clip value for Adan.
Best
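
For readers wondering what a clip value would do here, the following is a minimal sketch, assuming "clip" means clipping gradients by their global L2 norm. The helper name `clip_by_global_norm` is hypothetical and the code is plain Python for illustration, not Adan's own implementation.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient values so their combined L2 norm is at most max_norm.

    Hypothetical helper for illustration only; returns the (possibly rescaled)
    gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        # Already within the threshold: leave the gradients unchanged.
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total

# A gradient with norm 5 is scaled down to norm 1; its direction is preserved.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
```

Clipping only limits the size of a step. If the loss has already become nan, as in the Adan log above, the gradients are nan as well and rescaling them does not help, so the threshold has to be in place before the run diverges.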
from adan.
SGD:
lr = 0.01, momentum=0.937, weight_decay=0.0005, nesterov=True
Adam:
lr = 0.01, betas=(0.937, 0.999), weight_decay=0.0005
Adan:
lr = 0.01, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.0005
from adan.
@xialuxi you may try lr = 1e-3, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.02 for Adan.
As a rule of thumb, the LR for an adaptive optimizer should be 10-100x smaller than the LR used with SGD, so we suggest lr = 1e-3 for Adan.
We set wd=0.02 because Adan uses decoupled weight decay: the decay is applied directly to the weights rather than being added to the gradient, so it usually needs a larger value than the L2-style setting used with SGD.
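
The arithmetic behind these suggestions can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Adan's actual update rule: `coupled_l2_step` and `decoupled_step` are hypothetical scalar helpers, and the decoupled form shown is the AdamW-style one (Adan's implementation may differ in detail).

```python
def coupled_l2_step(w, grad, lr, wd):
    # L2-style decay: the decay term is folded into the gradient before the update.
    return w - lr * (grad + wd * w)

def decoupled_step(w, update, lr, wd):
    # Decoupled decay (AdamW-style, assumed here): shrink w directly,
    # then apply the optimizer's update.
    return w * (1.0 - lr * wd) - lr * update

def per_step_shrink(lr, wd):
    # Fraction of a weight removed by decay alone in a single step.
    return lr * wd

# Rule of thumb from the thread: 10-100x smaller than the SGD learning rate.
sgd_lr = 0.01
lr_candidates = [sgd_lr / 10, sgd_lr / 100]

# Decay strength per step for the posted SGD setting vs. the suggested Adan setting.
sgd_shrink = per_step_shrink(lr=0.01, wd=0.0005)
adan_shrink = per_step_shrink(lr=1e-3, wd=0.02)
ratio = adan_shrink / sgd_shrink

# With a zero gradient, only the decay acts on a weight of 1.0.
w_sgd = coupled_l2_step(1.0, 0.0, lr=0.01, wd=0.0005)
w_adan = decoupled_step(1.0, 0.0, lr=1e-3, wd=0.02)
```

In an adaptive optimizer, an L2 term added to the gradient is divided by the adaptive denominator along with everything else, so its effective strength varies per parameter. Decoupled decay avoids that, which is why its value is tuned separately from the SGD setting.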
from adan.