sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
License: Apache License 2.0
Thank you for your impressive work.
I would like to ask whether applying this work to unsupervised 3DGAN gives the same improvement as on other tasks.
According to your paper, you used Adan with β1 = 0.02, β2 = 0.01, and β3 = 0.01 when fine-tuning BERT. But in your config file, they are all 0.9x, like here. Which is right?
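One possible reading of the mismatch, offered as an assumption and not as a statement from the authors: the paper may put the weight β on the new gradient, while the code may put its beta on the running average, in which case the two would be related by beta_code = 1 - β_paper. The helper `paper_to_code_betas` below is hypothetical and only illustrates that arithmetic.

```python
def paper_to_code_betas(paper_betas):
    """Map paper-style betas to code-style betas, ASSUMING code = 1 - paper."""
    # Rounding only hides floating-point noise such as 0.98999999...
    return tuple(round(1.0 - b, 10) for b in paper_betas)


# Values quoted in the question above.
print(paper_to_code_betas((0.02, 0.01, 0.01)))
```

If this assumption holds, config values of the form 0.9x and paper values of the form 0.0x would describe the same setting.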
Hi,
Thank you very much for your brilliant work on Adan!
According to Figure 1 in your paper, Adan should reach a lower loss (both train and test) than AdamW. However, I got a higher training loss with Adan than with AdamW on ViT-H:
Steps | Adamw_train_loss | Adan_train_loss |
---|---|---|
200 | 6.9077 | 6.9077 |
400 | 6.9074 | 6.9075 |
600 | 6.9068 | 6.9073 |
800 | 6.9061 | 6.907 |
1000 | 6.905 | 6.9064 |
1200 | 6.9036 | 6.9056 |
1400 | 6.9014 | 6.9044 |
1600 | 6.899 | 6.9028 |
1800 | 6.8953 | 6.9003 |
2000 | 6.8911 | 6.8971 |
2200 | 6.8848 | 6.8929 |
2400 | 6.8789 | 6.8893 |
2600 | 6.8699 | 6.8843 |
2800 | 6.8626 | 6.8805 |
3000 | 6.8528 | 6.8744 |
3200 | 6.8402 | 6.868 |
3400 | 6.8293 | 6.862 |
3600 | 6.8172 | 6.8547 |
3800 | 6.7989 | 6.8465 |
4000 | 6.7913 | 6.8405 |
I used the same HPs as AdamW and only changed beta from (0.9, 0.999) to (0.9, 0.92, 0.999).
I only trained for a few steps to see the trend, but the loss gap from AdamW seems quite big. Should I change other HPs to make better use of Adan? How can I get a lower loss than AdamW?
I noticed that Adan prefers a large batch size in vision tasks. Should we use a larger batch size?
Or should I train with more steps to see the trend?
Thank you!
Thank you for your brilliant work.
I want to ask some questions about Adan's learning rate.
Does Adan use learning rate decay in the paper?
Is the Adan optimizer sensitive to the initial learning rate?
How should the learning rate be set relative to Adam under the same task conditions?
Thank you!
Could you please release the pre-trained ViT-S based on MAE?
Can I use the Adan optimizer in YOLOv7? If yes, what are the steps to implement this?
Hello! Thank you for your work. Now I have a problem.
I don't know how to solve it
Traceback (most recent call last):
  File "/home/anaconda/envs/main/lib/python3.8/site-packages/torch/optim/optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "/home/anaconda/envs/main/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/main/adan.py", line 121, in step
    beta1, beta2, beta3 = group['betas']
ValueError: not enough values to unpack (expected 3, got 2)
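The traceback shows `group['betas']` being unpacked into three names, so a two-element, Adam-style `betas` tuple would fail at exactly that line. A minimal sketch of an early check follows; `check_adan_betas` is a hypothetical helper, not part of adan.py, and the example values are the ones quoted elsewhere on this page.

```python
def check_adan_betas(betas):
    """Return (beta1, beta2, beta3) or raise a clearer error than tuple unpacking."""
    if len(betas) != 3:
        raise ValueError(
            f"Adan expects three betas (beta1, beta2, beta3), got {len(betas)}: {betas!r}"
        )
    beta1, beta2, beta3 = betas
    return beta1, beta2, beta3


# An Adam-style pair reproduces the situation in the traceback.
try:
    check_adan_betas((0.9, 0.999))
except ValueError as err:
    print(err)

# A three-element tuple unpacks without error.
print(check_adan_betas((0.9, 0.92, 0.999)))
```

If the optimizer was created from a config written for Adam or AdamW, the `betas` entry there is a likely place to look.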
Why is there no SGD-style implementation or experiment (Lemma 1 in the paper)?
Thank you for your impressive work. I have some questions about the step function in your adan.py.
In lines 179-180, that is:
for p, copy_grad in zip(group['params'], copy_grads):
    self.state[p]['pre_grad'] = copy_grad
It seems that you want to save the corresponding pre_grad. But I have the following bug:
I think this is because the former contains all parameters, while the latter only contains parameters with gradients. So I made the following change:
for p, copy_grad in zip(params_with_grad, copy_grads):
    self.state[p]['pre_grad'] = copy_grad
With this modification, it runs normally. What do you think caused the problem, and is this modification correct? @XingyuXie
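The suspected misalignment can be reproduced without PyTorch. The sketch below uses a tiny stand-in `Param` class (hypothetical, only for illustration) to show what `zip` does when one list holds every parameter and the other holds gradients only for the parameters that have one.

```python
class Param:
    """Tiny stand-in for a tensor parameter; `grad` is None when unused."""

    def __init__(self, name, grad=None):
        self.name = name
        self.grad = grad


params = [Param("w0", grad=1.0), Param("frozen", grad=None), Param("w2", grad=3.0)]

# Only parameters that received a gradient contribute to copy_grads.
params_with_grad = [p for p in params if p.grad is not None]
copy_grads = [p.grad for p in params_with_grad]

# Zipping against the full list pairs "frozen" with w2's gradient and drops w2.
misaligned = {p.name: g for p, g in zip(params, copy_grads)}

# Zipping against the filtered list keeps each gradient with its own parameter.
aligned = {p.name: g for p, g in zip(params_with_grad, copy_grads)}

print(misaligned)
print(aligned)
```

Because `zip` stops at the shorter input without raising, this kind of mismatch is silent until a later step touches the wrongly paired state.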
Hi there,
I'm just wondering about the no_prox setting.
First of all, does it stand for "approximation"?
In the paper, Algorithm 1, line 7 corresponds to no_prox=True. Why is the default setting in this repo False? Why do you include this option at all?
Were the experiments in the paper done as the algorithm states, or with no_prox=True?
Again, I really appreciate the work! I am just struggling with this detail.
Hi, very interesting work!
The only problem I see is that your optimizer is slower than SGD/AdamW, which may discourage some people from using it. Do you plan to add an implementation using torch._foreach... functions? Examples can be seen in torch.optim. This would significantly speed up your optimizer while having literally no drawbacks.
If you're interested, I could take a look and implement this myself, but it would be in 1-2 weeks when I'm less busy.
Step 2 of Usage in the documentation says
from adam import Adan
I was wondering if you meant
from adan import Adan
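Hi, Adan is an excellent optimizer. Thank you for your work.
However, I recently tried using Adan for instruction fine-tuning and found that, although the loss curve looks good, the downstream task performance (GSM8K) is below expectations.
With the same data processing and evaluation, AdamW scores about 9.63, while Adan only reaches about 5.08.
AdamW hyperparameters: weight_decay 0.01, lr 2e-5
Adan hyperparameters: weight_decay 0.02; following the repo's recommendation I tried lr 2e-4 and 1e-4, and GSM8K was low for both.
The lr scheduler in both cases rises to the peak over the first 3% and then decays to 0.
Code used:
from adan import Adan
optimizer = Adan(model.parameters(), lr=args.lr, weight_decay=0.02, foreach=True, fused=True)
Do you have any hyperparameter suggestions for instruction fine-tuning?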
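For readers comparing the two runs, the schedule described above can be sketched as a plain function. The post only says the rate rises to its peak over the first 3% of training and then falls to 0; the linear shape of both phases, the function name `lr_at_step`, and the step count of 1000 are assumptions made for illustration.

```python
def lr_at_step(step, total_steps, peak_lr, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then linear decay to 0 (the shape is an assumption)."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Ramp up so that the last warmup step reaches the peak.
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / remaining
    return peak_lr * max(0.0, 1.0 - progress)


# Peak rates quoted in the post: 2e-5 for AdamW, 2e-4 for Adan.
for step in (0, 29, 500, 1000):
    print(step, lr_at_step(step, 1000, 2e-5), lr_at_step(step, 1000, 2e-4))
```

Under this sketch the two runs differ only by a constant factor in the learning rate at every step, so the peak value and weight decay are the knobs that actually changed.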
How can I install without the CUDA_HOME environment variable? For example, https://github.com/mapillary/inplace_abn doesn't ask about CUDA_HOME.
xxx@xxx:~$ python3 -m pip install git+https://github.com/sail-sg/Adan.git
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/sail-sg/Adan.git
Cloning https://github.com/sail-sg/Adan.git to /tmp/pip-req-build-zs78qhzq
Running command git clone --filter=blob:none --quiet https://github.com/sail-sg/Adan.git /tmp/pip-req-build-zs78qhzq
Resolved https://github.com/sail-sg/Adan.git to commit 8f559205f67e565b3bea09554354d69000bd819c
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-req-build-zs78qhzq/setup.py", line 5, in <module>
cuda_extension = CUDAExtension(
File "/home/xxx/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1047, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/home/xxx/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/home/xxx/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2230, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
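The traceback above ends in `_join_cuda_home`, so the build could not locate a CUDA toolkit. A small diagnostic sketch that prints what the environment currently exposes is shown here; it does not install anything, the function name `cuda_build_env` is hypothetical, and the assumption that `CUDA_PATH` and an `nvcc` on `PATH` are also consulted should be checked against your PyTorch version.

```python
import os
import shutil


def cuda_build_env():
    """Collect values a CUDA extension build commonly relies on."""
    return {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "CUDA_PATH": os.environ.get("CUDA_PATH"),
        # shutil.which returns None when nvcc is not on PATH.
        "nvcc_on_path": shutil.which("nvcc"),
    }


for key, value in cuda_build_env().items():
    print(f"{key}: {value if value else '<not set>'}")
```

If all three come back empty, the machine probably has no CUDA toolkit visible to the build, which matches the error message.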
Hello, have you studied using Adan to train diffusion models? How should its learning rate be set, and can it be the same as the learning rate used with AdamW?
Dear authors:
According to the README.md of this amazing project, the weight_decay param should be 0.02, while in the configuration file attached in #32, the WD seems to be 0.05. Also, only beta3 is explicitly specified in the aforementioned configuration file; I can only infer from https://github.com/sail-sg/Adan/blob/main/gpt2/README.md that
beta1 = 0.98
beta2 = 0.92
However, weight_decay=0.02 together with the other hyperparams above yields an inferior val loss curve compared with that of the AdamW baseline (https://github.com/karpathy/nanoGPT/blob/master/config/train_gpt2.py). Thus, do you have any suggestions about the hyperparams I mentioned? Thanks!
HumanEval is an evaluation dataset; you shouldn't train on it and then evaluate on exactly the same dataset.
Instead, you can use the GitHub part of the Pile, or other coding source data, for training. Before training, make sure the training set doesn't contain HumanEval, to avoid possible data leakage.
Hi, I tried Adan on a keypoints task and got an error like this:
./aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [96,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [97,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [2,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [2,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` fathread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [3,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [94,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:129: operator(): block: [0,0,0], thread: [95,0,0] Assertion `input_val >= zero && input_val <= one` failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
my config:
BASE_LR: 0.05 # maybe 0.012?
STEPS: (40000, 65000, 70000, 85000) # step point need to carefully check
WARMUP_FACTOR: 0.001
# WARMUP_ITERS: 1200
WARMUP_ITERS: 3500
MAX_ITER: 900000
# LR_SCHEDULER_NAME: "WarmupCosineLR"
LR_SCHEDULER_NAME: "WarmupMultiStepLR"
WEIGHT_DECAY: 0.02
MOMENTUM: 0.9
BACKBONE_MULTIPLIER: 0.9
OPTIMIZER: "Adan"
This is on detectron2, with the config for 8 GPUs.
Why does it happen?
Hi authors,
Building wheels for collected packages: adan
Building wheel for adan (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
running bdist_wheel
/root/miniconda3/envs/neurips24/lib/python3.8/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
copying adan.py -> build/lib.linux-x86_64-cpython-38
running build_ext
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-req-build-wcs6gasc/setup.py", line 20, in <module>
setup(
File "/root/miniconda3/envs/xxx/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
File "/root/miniconda3/envs/xxx/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
...
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError: The detected CUDA version (12.2) mismatches the version that was used to compile PyTorch (11.8). Please make sure to use the same CUDA versions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for adan
Running setup.py clean for adan
Failed to build adan
ERROR: Could not build wheels for adan, which is required to install pyproject.toml-based projects
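The error above compares the toolkit version found on the machine (12.2) with the version PyTorch was built against (11.8). A rough way to see the same mismatch before building is sketched here. It assumes `nvcc --version` prints a line containing "release X.Y" and that the check is essentially a major-version comparison, which has not been verified for every PyTorch release. The helper names and the sample string are made up for illustration.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from text such as the output of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def same_major(toolkit_version, torch_cuda_version):
    """Compare a parsed toolkit version with a 'major.minor' string."""
    torch_major = int(torch_cuda_version.split(".")[0])
    return toolkit_version[0] == torch_major


# Illustrative text only; paste the real output of `nvcc --version` here.
sample = "Cuda compilation tools, release 12.2, V12.2.140"
toolkit = parse_nvcc_release(sample)

# "11.8" is the PyTorch build version quoted in the error message.
print(toolkit, same_major(toolkit, "11.8"))
```

In a real environment the second argument would come from `torch.version.cuda`, which is the value the error message itself formats.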
Looking at the arXiv version: in Appendix C, in the last two lines of Eq. 10 and the first line of the following update rule, \theta in the last term should have index k-1 instead of k.
(Not sure if this is the appropriate place to report paper typos; please tell me if there is a more suitable one.)
Hello!
I think I found a bug in the Adan optimizer, which affects embedding tables.
I implemented the Adan optimizer in TensorFlow 2. You can find the implementation here.
I wanted to keep the implementation as close to the original code as possible. However, TensorFlow and PyTorch take different approaches to updating "sparse" tensors. An example of a "sparse" tensor is an embedding matrix. PyTorch treats "sparse" data as if it were dense. TensorFlow has two functions for making updates: _resource_apply_dense for dense and _resource_apply_sparse for "sparse".
I decided to test the correctness of my implementation using the following logic (tf_adan/test_adan_*.py):
I noticed that the loss history and the weights after optimization are the same for dense parameters. However, for embedding parameters my implementation shows a better loss, and the weights after optimization aren't the same. It's especially noticeable when the batch contains only a few of the possible categories. For example, categorical features have 2k unique values, while the batch size equals 100:
I think the source of the bug is the following:
Line 130 in d864647
As I understand it, prev_grad for all "new" gradients at step > 1 won't be replaced with the current gradient.
I'm unsure whether it's a bug in your implementation or in mine. I also tested the Adam optimizer in tf and torch, see:
Losses for Adam optimizers in tf/torch are almost the same.
What do you think? Looking forward to your thoughts.
Is there a TensorFlow/Keras implementation of Adan? If no official version, do you know of any third-party implementation? Or alternatively, how many lines would you expect an implementation to have? (If not much I may do it myself and ask for your review if you have time.)
Hey guys, I had some problems when installing FusedAdan.
The output is below. It says that nvcc cannot be found, but I do have it installed. Please help me.
(MDT) root@ubuntu20:~/Adan# pip install .
Processing /root/Adan
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from adan==0.0.2) (2.2.1+cu118)
Requirement already satisfied: filelock in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (3.9.0)
Requirement already satisfied: typing-extensions>=4.8.0 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (4.8.0)
Requirement already satisfied: sympy in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (1.12)
Requirement already satisfied: networkx in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (3.2.1)
Requirement already satisfied: jinja2 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (3.1.2)
Requirement already satisfied: fsspec in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (2024.2.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.8.89 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.8.89)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.8.89 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.8.89)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.8.87 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.8.87)
Requirement already satisfied: nvidia-cudnn-cu11==8.7.0.84 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (8.7.0.84)
Requirement already satisfied: nvidia-cublas-cu11==11.11.3.6 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.11.3.6)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (10.9.0.58)
Requirement already satisfied: nvidia-curand-cu11==10.3.0.86 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (10.3.0.86)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.1.48 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.4.1.48)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.5.86 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.7.5.86)
Requirement already satisfied: nvidia-nccl-cu11==2.19.3 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (2.19.3)
Requirement already satisfied: nvidia-nvtx-cu11==11.8.86 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (11.8.86)
Requirement already satisfied: triton==2.2.0 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from torch->adan==0.0.2) (2.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from jinja2->torch->adan==0.0.2) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/vipuser/anaconda3/envs/MDT/lib/python3.10/site-packages (from sympy->torch->adan==0.0.2) (1.3.0)
Building wheels for collected packages: adan
Building wheel for adan (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [7 lines of output]
running bdist_wheel
running build
running build_py
creating build/lib.linux-x86_64-cpython-310
copying adan.py -> build/lib.linux-x86_64-cpython-310
running build_ext
error: [Errno 2] No such file or directory: ':/usr/local/cuda-11.8/bin/nvcc'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for adan
Running setup.py clean for adan
Failed to build adan
ERROR: Could not build wheels for adan, which is required to install pyproject.toml-based projects
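One detail in the log above stands out: the missing path is ':/usr/local/cuda-11.8/bin/nvcc', with a leading colon. A plausible but unconfirmed cause is a `CUDA_HOME` that was set like a `PATH` list instead of a single directory. The sketch below flags that pattern; `suspicious_cuda_home` is a hypothetical helper and the example values are inferred from the error text, not read from the poster's machine.

```python
import os


def suspicious_cuda_home(value):
    """Return a reason string if value looks like a PATH-style list, else None."""
    if not value:
        return "CUDA_HOME is empty or unset"
    if os.pathsep in value:
        return f"CUDA_HOME contains {os.pathsep!r}; it should be a single directory"
    return None


# Inferred from the error message, not confirmed.
print(suspicious_cuda_home(os.pathsep + "/usr/local/cuda-11.8"))
print(suspicious_cuda_home("/usr/local/cuda-11.8"))

# Check the value in the current environment as well.
print(suspicious_cuda_home(os.environ.get("CUDA_HOME")))
```

If the first pattern matches the real value, setting `CUDA_HOME` to just the toolkit directory would be the first thing to try.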
The following steps are modified from Fairseq-Roberta. For completeness, we list some key steps here.
I would like to ask why you modified the dataset settings? In the original fairseq, it seems we can just download the raw data.
https://github.com/sail-sg/Adan/tree/main/NLP/BERT#ii-generate-raw-data
Can you share the code for generating the raw data?
Hi authors, I found there is an option called "gradient_clipping" in DeepSpeed's configuration options, which also seems to be a clipping-by-norm method. Does this option have any potential interaction with the max_grad_norm param of Adan?
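As a way to reason about the question, here is a toy sketch of what happens when two clip-by-norm steps are applied one after the other. It assumes both options rescale the same set of gradients by their global L2 norm, which has not been checked against either DeepSpeed or adan.py, and it ignores epsilon terms, mixed precision, and gradient partitioning. The thresholds 2.0 and 1.0 are hypothetical.

```python
import math


def l2(grads):
    """Global L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Scale the gradients so that their global L2 norm is at most max_norm."""
    norm = l2(grads)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


grads = [3.0, 4.0]  # global norm is 5.0

# Hypothetical thresholds: an outer clip at 2.0 followed by an inner clip at 1.0.
twice = clip_by_global_norm(clip_by_global_norm(grads, 2.0), 1.0)
once = clip_by_global_norm(grads, 1.0)

print(l2(twice), l2(once))
```

Under these assumptions the smaller threshold decides the final norm, so enabling both would not clip twice as hard. Whether the real implementations behave this way is a question for the authors.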
Hey, the repository does not implement the momentum restarting strategy from what I can tell.
If this is something you still have available, would you be so kind to add it in here? It would be super great to optimize Adan training further. :)
Hi there,
L6
in Algorithm 1), but in the code, it is outside of the square root. Could you expand on the reason for this?
Thanks for the inspiring work! However, access to the TASK2PATH paths in download_glue_data.py under the directory ./NLP/BERT/ seems to be denied, e.g. "https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FCoLA.zip?alt=media&token=46d5e637-3411-4188-bc44-5809b5bfb5f4"
Hi! Thank you for sharing your code.
I would like to know the settings for each of Transformer-XL and GPT-2.
I looked at the logs, but I couldn't figure out the exact numbers.
https://github.com/sail-sg/Adan/tree/main/gpt2#results-and-logs-on-gpt2-345m
https://github.com/sail-sg/Adan/blob/main/gpt2/pretrain.sh
https://github.com/sail-sg/Adan/tree/main/NLP/Transformer-XL/exp_results
Thank you!
Hi~ Thanks for your excellent work. The Adan optimizer has achieved great success in my various experiments.
However, I would really appreciate any suggestions for integrating Adan with DeepSpeed.
I tried using the ds_config with adamw and simply replacing adamw with adan (of course, I adjusted the learning rate and weight decay correspondingly), but it's pretty slow.
Thank you in advance.
Hi~ Thanks for your exciting work. I would like to know the performance of Adan on visual dense prediction tasks. I notice you mention that Adan is suitable for large batch sizes, so I wondered whether it would also work better for visual dense prediction tasks, which usually cannot use a large batch size. I have tried Adan on several tasks, but the results are similar or even inferior to its SGD/AdamW counterparts. I have followed the best practices you mention in the paper and repo, and was wondering if you have done similar experiments or if you have suggestions for tuning the parameters.
Thanks!
Best.