microsoft / simmim

This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

Home Page: https://arxiv.org/abs/2111.09886

License: MIT License

Python 100.00%
self-supervised-learning masked-image-modeling image-classification swin-transformer

simmim's Introduction

SimMIM

By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*.

This repo is the official implementation of "SimMIM: A Simple Framework for Masked Image Modeling".

Updates

09/29/2022

SimMIM has been merged into the Swin Transformer repo on GitHub.

03/02/2022

SimMIM was accepted to CVPR 2022. SimMIM was used in "Swin Transformer V2" to alleviate the data-hungry problem in large-scale vision model training.

12/09/2021

Initial commits:

  1. Pre-trained and fine-tuned models on ImageNet-1K (Swin Base, Swin Large, and ViT Base) are provided.
  2. The supporting code for ImageNet-1K pre-training and fine-tuning is provided.

Introduction

SimMIM, initially described on arXiv, is a simple framework for masked image modeling. Through a systematic study, we find that simple designs for each component yield very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pre-text task; 2) predicting raw RGB pixel values by direct regression performs no worse than patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones.
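For illustration, below is a minimal sketch of this recipe: a learnable mask token substituted for randomly masked patches, a one-layer (1x1 convolution) prediction head, and an l1 loss computed only on masked pixels. The encoder interface, attribute names, and shapes are placeholders chosen for the sketch, not the modules used in this repository.

# Minimal SimMIM-style objective (illustrative sketch; names and shapes are assumptions)
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySimMIM(nn.Module):
    def __init__(self, encoder, encoder_stride, in_chans=3):
        super().__init__()
        self.encoder = encoder                  # any ViT/Swin-like backbone returning a (B, C, H', W') feature map
        self.encoder_stride = encoder_stride    # total downsampling factor of the encoder
        # the prediction head can be as light as a single linear (1x1 conv) layer
        self.decoder = nn.Conv2d(encoder.num_features, encoder_stride ** 2 * in_chans, kernel_size=1)

    def forward(self, x, mask):
        # mask: (B, H/s, W/s) binary map of masked patches (moderately large masked patch size, e.g. 32)
        z = self.encoder(x, mask)                                        # (B, C, H/s, W/s)
        x_rec = F.pixel_shuffle(self.decoder(z), self.encoder_stride)   # back to input resolution
        # expand the patch-level mask to pixel level and regress raw RGB values on masked pixels only
        pixel_mask = (mask.repeat_interleave(self.encoder_stride, 1)
                          .repeat_interleave(self.encoder_stride, 2)
                          .unsqueeze(1).to(x.dtype))
        loss = (F.l1_loss(x_rec, x, reduction='none') * pixel_mask).sum()
        return loss / (pixel_mask.sum() + 1e-5) / x.size(1)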

Main Results on ImageNet

Swin Transformer

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
| Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |

Vision Transformer

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-Base | 800 | 224x224 | 224x224 | 83.8 | google/config | google/config |

Citing SimMIM

@inproceedings{xie2021simmim,
  title={SimMIM: A Simple Framework for Masked Image Modeling},
  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Getting Started

Installation

  • Install CUDA 11.3 with cuDNN 8 following the official installation guide of CUDA and cuDNN.

  • Setup conda environment:

# Create environment
conda create -n SimMIM python=3.8 -y
conda activate SimMIM

# Install requirements
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

# Install apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Clone SimMIM
git clone https://github.com/microsoft/SimMIM
cd SimMIM

# Install other requirements
pip install -r requirements.txt

Evaluating provided models

To evaluate a provided model on ImageNet validation set, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
--eval --cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

For example, to evaluate the Swin Base model on a single GPU, run:

python -m torch.distributed.launch --nproc_per_node 1 main_finetune.py \
--eval --cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --resume simmim_finetune__swin_base__img224_window7__800ep.pth --data-path <imagenet-path>

Pre-training with SimMIM

To pre-train models with SimMIM, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_simmim.py \ 
--cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to pre-train Swin Base for 800 epochs on one DGX-2 server, run:

python -m torch.distributed.launch --nproc_per_node 16 main_simmim.py \ 
--cfg configs/swin_base__800ep/simmim_pretrain__swin_base__img192_window6__800ep.yaml --batch-size 128 --data-path <imagenet-path>/train [--output <output-directory> --tag <job-tag>]

Fine-tuning pre-trained models

To fine-tune models pre-trained by SimMIM, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \ 
--cfg <config-file> --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to fine-tune Swin Base pre-trained by SimMIM on one DGX-2 server, run:

python -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \ 
--cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --batch-size 128 --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--output <output-directory> --tag <job-tag>]

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

simmim's People

Contributors

ancientmooner, caoyue10, microsoft-github-operations[bot], microsoftopensource, zdaxie

simmim's Issues

About data augmentation

Thanks for sharing your excellent work!
I have a question about data augmentation in pre-training:
have you tried other augmentation combinations, such as RandomResizedCrop() + RandomHorizontalFlip() + RandomVerticalFlip()?
And does RandomResizedCrop() + RandomHorizontalFlip() work best?

Why mask the embedded feature maps instead of the input images?

Hi! This is wonderful work.
After reading the paper, my intuition was that the input image would be masked for reconstruction.
Then, while reading the code, I was surprised to find that you do not mask the input image directly; rather, you mask the feature maps output by Patch_Embed. See here

x = x * (1. - w) + mask_tokens * w

I was wondering what the reasoning behind this approach is, and how it differs from the more intuitive one of masking the original image directly.
Thank you.
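For readers puzzling over the quoted line, here is a minimal standalone sketch (toy shapes, not the repository's code) of how a single learnable mask token can replace the masked positions after patch embedding instead of zeroing out pixels in the input image:

# Toy illustration of masking after patch embedding (assumed shapes, not the repo's code)
import torch
import torch.nn as nn

B, L, C = 2, 196, 768                              # batch, number of patch tokens, embedding dim
x = torch.randn(B, L, C)                           # output of the patch embedding layer
mask = torch.randint(0, 2, (B, L)).bool()          # 1 = this patch is masked

mask_token = nn.Parameter(torch.zeros(1, 1, C))    # one learnable token shared by all masked positions
w = mask.unsqueeze(-1).type_as(x)                  # (B, L, 1)
x = x * (1. - w) + mask_token * w                  # masked positions now carry the mask token embedding

One practical consequence of this design is that the mask token is learned in the embedding space rather than being a fixed pixel value, which is one plausible motivation for masking after the embedding layer.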

Cannot reproduce your results when fine-tuning from your released pre-trained ViT-Base model

Hello, I tried to reproduce your ViT-B results on ImageNet-1K. I ran the script following your readme.md, except that I used a single node with 8 NVIDIA A100 GPUs instead of two nodes.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 main_finetune.py \
--cfg /data/users/zhangjunlei/code/simmim/SimMIM-main/configs/vit_base__800ep/simmim_finetune__vit_base__img224__800ep.yaml \
--batch-size 128 \
--data-path /data/users/zhangjunlei/dataset/IMT \
--pretrained /data/users/models/simmim_pretrain__vit_base__img224__800ep.pth \
--output /data/users/zhangjunlei/output/mim \
--tag finetuneIMN_downloadedPretrainedVitbbaseline \
--accumulation-steps 2

I ran the code without modifying it, but the max accuracy I get is 83.6, while your paper reports 83.8. The log is listed below:

[2022-04-01 01:12:41 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.066 (3.066) Loss 0.3681 (0.3681) Acc@1 93.359 (93.359) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.636
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.2106559070000715e-06, 1.2106559070000715e-06, 1.3305729095853683e-06, 1.3305729095853683e-06, 1.5150606058704401e-06, 1.5150606058704401e-06, 1.798887830924397e-06, 1.798887830924397e-06, 2.235545100238177e-06, 2.235545100238177e-06, 2.9073255145670688e-06, 2.9073255145670688e-06, 3.940833844303826e-06, 3.940833844303826e-06, 5.53084665928345e-06, 5.53084665928345e-06, 7.977020220790567e-06, 7.977020220790567e-06, 1.1740364161570747e-05, 1.1740364161570747e-05, 1.7530124070463328e-05, 1.7530124070463328e-05, 2.6437447007221146e-05, 2.6437447007221146e-05, 4.0141020756079324e-05, 4.0141020756079324e-05, 6.122344190816884e-05, 6.122344190816884e-05]
[2022-04-01 01:12:57 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][0/1251] eta 1:10:34 lr 0.000061 time 3.3847 (3.3847) loss 1.5133 (1.5133) grad_norm 2.3348 (2.3348) mem 39691MB
[2022-04-01 01:14:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][100/1251] eta 0:13:56 lr 0.000060 time 0.6853 (0.7264) loss 1.4034 (1.3413) grad_norm 2.6896 (2.7750) mem 39691MB
[2022-04-01 01:15:17 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][200/1251] eta 0:12:29 lr 0.000059 time 0.6943 (0.7133) loss 1.4613 (1.3375) grad_norm 2.3938 (2.7978) mem 39691MB
[2022-04-01 01:16:27 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][300/1251] eta 0:11:14 lr 0.000057 time 0.6894 (0.7089) loss 0.7287 (1.3413) grad_norm 2.0634 (2.7904) mem 39691MB
[2022-04-01 01:17:37 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][400/1251] eta 0:10:01 lr 0.000056 time 0.6978 (0.7069) loss 1.0116 (1.3347) grad_norm 2.0660 (nan) mem 39691MB
[2022-04-01 01:18:47 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][500/1251] eta 0:08:49 lr 0.000055 time 0.6884 (0.7053) loss 1.4866 (1.3390) grad_norm 2.5877 (nan) mem 39691MB
[2022-04-01 01:19:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][600/1251] eta 0:07:38 lr 0.000053 time 0.6879 (0.7044) loss 1.3395 (1.3345) grad_norm 2.0532 (nan) mem 39691MB
[2022-04-01 01:21:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][700/1251] eta 0:06:27 lr 0.000052 time 0.6993 (0.7038) loss 1.5304 (1.3324) grad_norm 2.6916 (nan) mem 39691MB
[2022-04-01 01:22:16 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][800/1251] eta 0:05:17 lr 0.000051 time 0.6891 (0.7033) loss 1.0127 (1.3341) grad_norm 1.9580 (nan) mem 39691MB
[2022-04-01 01:23:26 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][900/1251] eta 0:04:06 lr 0.000050 time 0.6901 (0.7028) loss 0.8164 (1.3327) grad_norm 1.9068 (nan) mem 39691MB
[2022-04-01 01:24:36 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1000/1251] eta 0:02:56 lr 0.000048 time 0.6892 (0.7024) loss 1.5583 (1.3314) grad_norm 2.2595 (nan) mem 39691MB
[2022-04-01 01:25:46 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1100/1251] eta 0:01:46 lr 0.000047 time 0.6845 (0.7022) loss 1.3442 (1.3315) grad_norm 2.4191 (nan) mem 39691MB
[2022-04-01 01:26:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1200/1251] eta 0:00:35 lr 0.000046 time 0.7005 (0.7019) loss 1.2828 (1.3303) grad_norm 2.3000 (nan) mem 39691MB
[2022-04-01 01:27:31 simmim_finetune] (main_finetune.py 230): INFO EPOCH 93 training takes 0:14:38
[2022-04-01 01:27:34 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.048 (3.048) Loss 0.3621 (0.3621) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.550 Acc@5 96.636
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.5%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.1549451215637642e-06, 1.1549451215637642e-06, 1.2431484613501714e-06, 1.2431484613501714e-06, 1.378845907175413e-06, 1.378845907175413e-06, 1.5876112084450153e-06, 1.5876112084450153e-06, 1.9087885950136343e-06, 1.9087885950136343e-06, 2.402907651273048e-06, 2.402907651273048e-06, 3.1630908147490695e-06, 3.1630908147490695e-06, 4.332603373942948e-06, 4.332603373942948e-06, 6.131853465010453e-06, 6.131853465010453e-06, 8.899930528191233e-06, 8.899930528191233e-06, 1.3158510625392428e-05, 1.3158510625392428e-05, 1.971017231339427e-05, 1.971017231339427e-05, 2.9789651833397102e-05, 2.9789651833397102e-05, 4.5296543402632226e-05, 4.5296543402632226e-05]
[2022-04-01 01:27:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][0/1251] eta 1:12:29 lr 0.000045 time 3.4766 (3.4766) loss 1.2928 (1.2928) grad_norm 2.0603 (2.0603) mem 39691MB
[2022-04-01 01:29:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][100/1251] eta 0:13:58 lr 0.000044 time 0.6995 (0.7282) loss 1.5299 (1.3223) grad_norm 2.9111 (2.7888) mem 39691MB
[2022-04-01 01:30:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][200/1251] eta 0:12:31 lr 0.000043 time 0.6885 (0.7147) loss 0.9353 (1.3269) grad_norm 2.2225 (2.7882) mem 39691MB
[2022-04-01 01:31:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][300/1251] eta 0:11:14 lr 0.000042 time 0.6891 (0.7095) loss 1.2098 (1.3210) grad_norm 2.1065 (2.7789) mem 39691MB
[2022-04-01 01:32:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][400/1251] eta 0:10:01 lr 0.000041 time 0.6902 (0.7068) loss 1.3892 (1.3320) grad_norm 2.4860 (2.7760) mem 39691MB
[2022-04-01 01:33:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][500/1251] eta 0:08:50 lr 0.000040 time 0.6902 (0.7058) loss 1.4630 (1.3225) grad_norm 2.4686 (2.7736) mem 39691MB
[2022-04-01 01:34:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][600/1251] eta 0:07:38 lr 0.000039 time 0.6897 (0.7047) loss 1.0282 (1.3204) grad_norm 2.3859 (2.7731) mem 39691MB
[2022-04-01 01:36:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][700/1251] eta 0:06:27 lr 0.000037 time 0.6890 (0.7039) loss 1.2109 (1.3232) grad_norm 2.2731 (2.7694) mem 39691MB
[2022-04-01 01:37:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][800/1251] eta 0:05:17 lr 0.000036 time 0.6896 (0.7033) loss 1.3349 (1.3265) grad_norm 2.1328 (2.7723) mem 39691MB
[2022-04-01 01:38:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][900/1251] eta 0:04:06 lr 0.000035 time 0.7249 (0.7029) loss 1.5155 (1.3238) grad_norm 2.3807 (2.7779) mem 39691MB
[2022-04-01 01:39:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1000/1251] eta 0:02:56 lr 0.000034 time 0.6991 (0.7025) loss 1.5035 (1.3234) grad_norm 3.5433 (2.7713) mem 39691MB
[2022-04-01 01:40:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1100/1251] eta 0:01:46 lr 0.000033 time 0.6895 (0.7022) loss 1.3362 (1.3223) grad_norm 2.2755 (2.7748) mem 39691MB
[2022-04-01 01:41:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1200/1251] eta 0:00:35 lr 0.000032 time 0.6905 (0.7020) loss 1.1312 (1.3200) grad_norm 2.2855 (2.7730) mem 39691MB
[2022-04-01 01:42:25 simmim_finetune] (main_finetune.py 230): INFO EPOCH 94 training takes 0:14:38
[2022-04-01 01:42:28 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.078 (3.078) Loss 0.3602 (0.3602) Acc@1 93.457 (93.457) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.656 Acc@5 96.634
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.7%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.107709723978354e-06, 1.107709723978354e-06, 1.1690240608640958e-06, 1.1690240608640958e-06, 1.2633538099190834e-06, 1.2633538099190834e-06, 1.4084765007729102e-06, 1.4084765007729102e-06, 1.6317421790095668e-06, 1.6317421790095668e-06, 1.9752278378351925e-06, 1.9752278378351925e-06, 2.5036673129515393e-06, 2.5036673129515393e-06, 3.316651120822842e-06, 3.316651120822842e-06, 4.567395440624847e-06, 4.567395440624847e-06, 6.491617471089471e-06, 6.491617471089471e-06, 9.45195905641966e-06, 9.45195905641966e-06, 1.4006330726158412e-05, 1.4006330726158412e-05, 2.1013056371910337e-05, 2.1013056371910337e-05, 3.1792634288451755e-05, 3.1792634288451755e-05]
[2022-04-01 01:42:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][0/1251] eta 1:14:20 lr 0.000032 time 3.5654 (3.5654) loss 1.4688 (1.4688) grad_norm 2.3501 (2.3501) mem 39691MB
[2022-04-01 01:43:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][100/1251] eta 0:13:59 lr 0.000031 time 0.6891 (0.7293) loss 1.0123 (1.3321) grad_norm 2.7074 (2.8018) mem 39691MB
[2022-04-01 01:45:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][200/1251] eta 0:12:30 lr 0.000030 time 0.6897 (0.7144) loss 1.0944 (1.3172) grad_norm 2.3161 (nan) mem 39691MB
[2022-04-01 01:46:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][300/1251] eta 0:11:14 lr 0.000029 time 0.6853 (0.7096) loss 1.5585 (1.3295) grad_norm 2.4806 (nan) mem 39691MB
[2022-04-01 01:47:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][400/1251] eta 0:10:01 lr 0.000028 time 0.6891 (0.7070) loss 1.5054 (1.3297) grad_norm 2.2357 (nan) mem 39691MB
[2022-04-01 01:48:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][500/1251] eta 0:08:49 lr 0.000027 time 0.6900 (0.7056) loss 0.9077 (1.3258) grad_norm 2.1560 (nan) mem 39691MB
[2022-04-01 01:49:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][600/1251] eta 0:07:38 lr 0.000026 time 0.6884 (0.7048) loss 0.8389 (1.3274) grad_norm 2.0718 (nan) mem 39691MB
[2022-04-01 01:50:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][700/1251] eta 0:06:27 lr 0.000025 time 0.6897 (0.7041) loss 1.1630 (1.3244) grad_norm 2.7615 (nan) mem 39691MB
[2022-04-01 01:52:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][800/1251] eta 0:05:17 lr 0.000024 time 0.6899 (0.7035) loss 1.3777 (1.3235) grad_norm 2.2742 (nan) mem 39691MB
[2022-04-01 01:53:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][900/1251] eta 0:04:06 lr 0.000024 time 0.6901 (0.7030) loss 1.3908 (1.3225) grad_norm 2.2728 (nan) mem 39691MB
[2022-04-01 01:54:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1000/1251] eta 0:02:56 lr 0.000023 time 0.6911 (0.7026) loss 1.1246 (1.3207) grad_norm 2.3794 (nan) mem 39691MB
[2022-04-01 01:55:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1100/1251] eta 0:01:46 lr 0.000022 time 0.6891 (0.7024) loss 1.4379 (1.3210) grad_norm 2.5842 (nan) mem 39691MB
[2022-04-01 01:56:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1200/1251] eta 0:00:35 lr 0.000021 time 0.6986 (0.7021) loss 1.4860 (1.3207) grad_norm 2.2715 (nan) mem 39691MB
[2022-04-01 01:57:19 simmim_finetune] (main_finetune.py 230): INFO EPOCH 95 training takes 0:14:38
[2022-04-01 01:57:19 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saving......
[2022-04-01 01:57:20 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saved !!!
[2022-04-01 01:57:23 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.916 (2.916) Loss 0.3640 (0.3640) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.602 Acc@5 96.632
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.068996329878458e-06, 1.068996329878458e-06, 1.1082728599612733e-06, 1.1082728599612733e-06, 1.168698290857912e-06, 1.168698290857912e-06, 1.2616604922373566e-06, 1.2616604922373566e-06, 1.4046792635903478e-06, 1.4046792635903478e-06, 1.62470814259495e-06, 1.62470814259495e-06, 1.963214110294338e-06, 1.963214110294338e-06, 2.4839925221395496e-06, 2.4839925221395496e-06, 3.285190078824491e-06, 3.285190078824491e-06, 4.517801704493633e-06, 4.517801704493633e-06, 6.414127282446157e-06, 6.414127282446157e-06, 9.331551248526965e-06, 9.331551248526965e-06, 1.3819895811728205e-05, 1.3819895811728205e-05, 2.0725041293576267e-05, 2.0725041293576267e-05]
[2022-04-01 01:57:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][0/1251] eta 1:12:07 lr 0.000021 time 3.4596 (3.4596) loss 1.4750 (1.4750) grad_norm 2.6895 (2.6895) mem 39691MB
[2022-04-01 01:58:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][100/1251] eta 0:13:55 lr 0.000020 time 0.6900 (0.7259) loss 1.5745 (1.2920) grad_norm 2.6246 (2.7933) mem 39691MB
[2022-04-01 01:59:58 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][200/1251] eta 0:12:28 lr 0.000019 time 0.6896 (0.7125) loss 0.8455 (1.3166) grad_norm 2.4287 (2.7917) mem 39691MB
[2022-04-01 02:01:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][300/1251] eta 0:11:13 lr 0.000018 time 0.6900 (0.7086) loss 0.9218 (1.3165) grad_norm 2.2249 (2.7938) mem 39691MB
[2022-04-01 02:02:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][400/1251] eta 0:10:01 lr 0.000018 time 0.6907 (0.7068) loss 0.9292 (1.3220) grad_norm 2.1898 (2.7949) mem 39691MB
[2022-04-01 02:03:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][500/1251] eta 0:08:49 lr 0.000017 time 0.6890 (0.7053) loss 1.1393 (1.3168) grad_norm 2.5243 (2.7906) mem 39691MB
[2022-04-01 02:04:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][600/1251] eta 0:07:38 lr 0.000016 time 0.6888 (0.7045) loss 1.3772 (1.3255) grad_norm 2.5142 (2.7838) mem 39691MB
[2022-04-01 02:05:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][700/1251] eta 0:06:27 lr 0.000016 time 0.6895 (0.7038) loss 1.5106 (1.3251) grad_norm 2.3905 (2.7886) mem 39691MB
[2022-04-01 02:06:59 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][800/1251] eta 0:05:17 lr 0.000015 time 0.6896 (0.7033) loss 1.4453 (1.3282) grad_norm 2.3186 (2.7831) mem 39691MB
[2022-04-01 02:08:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][900/1251] eta 0:04:06 lr 0.000014 time 0.6890 (0.7029) loss 1.6301 (1.3289) grad_norm 2.6045 (2.7818) mem 39691MB
[2022-04-01 02:09:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1000/1251] eta 0:02:56 lr 0.000014 time 0.6890 (0.7027) loss 1.3683 (1.3259) grad_norm 2.2704 (2.7809) mem 39691MB
[2022-04-01 02:10:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1100/1251] eta 0:01:46 lr 0.000013 time 0.6898 (0.7024) loss 1.1971 (1.3261) grad_norm 2.1725 (inf) mem 39691MB
[2022-04-01 02:11:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1200/1251] eta 0:00:35 lr 0.000012 time 0.6898 (0.7022) loss 1.4186 (1.3273) grad_norm 2.1364 (inf) mem 39691MB
[2022-04-01 02:12:14 simmim_finetune] (main_finetune.py 230): INFO EPOCH 96 training takes 0:14:38
[2022-04-01 02:12:17 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.006 (3.006) Loss 0.3643 (0.3643) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.592 Acc@5 96.658
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0388431447101286e-06, 1.0388431447101286e-06, 1.060954812742414e-06, 1.060954812742414e-06, 1.0949727635613148e-06, 1.0949727635613148e-06, 1.1473080725134695e-06, 1.1473080725134695e-06, 1.2278239324398616e-06, 1.2278239324398616e-06, 1.3516944861727725e-06, 1.3516944861727725e-06, 1.542264568838789e-06, 1.542264568838789e-06, 1.8354493114018914e-06, 1.8354493114018914e-06, 2.286502761498972e-06, 2.286502761498972e-06, 2.9804311462637114e-06, 2.9804311462637114e-06, 4.048013276671003e-06, 4.048013276671003e-06, 5.69044732345145e-06, 5.69044732345145e-06, 8.21726893388291e-06, 8.21726893388291e-06, 1.2104686796085155e-05, 1.2104686796085155e-05]
[2022-04-01 02:12:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][0/1251] eta 1:10:45 lr 0.000012 time 3.3938 (3.3938) loss 1.1607 (1.1607) grad_norm 2.6309 (2.6309) mem 39691MB
[2022-04-01 02:13:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][100/1251] eta 0:13:56 lr 0.000012 time 0.6900 (0.7268) loss 1.4923 (1.3099) grad_norm 2.3810 (2.7979) mem 39691MB
[2022-04-01 02:14:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][200/1251] eta 0:12:30 lr 0.000011 time 0.6901 (0.7136) loss 1.3939 (1.3374) grad_norm 2.4213 (2.8035) mem 39691MB
[2022-04-01 02:16:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][300/1251] eta 0:11:14 lr 0.000010 time 0.6900 (0.7089) loss 1.4464 (1.3354) grad_norm 2.3146 (2.7829) mem 39691MB
[2022-04-01 02:17:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][400/1251] eta 0:10:01 lr 0.000010 time 0.7050 (0.7067) loss 1.4488 (1.3360) grad_norm 2.3009 (2.7833) mem 39691MB
[2022-04-01 02:18:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][500/1251] eta 0:08:49 lr 0.000009 time 0.6896 (0.7056) loss 1.5225 (1.3309) grad_norm 2.6082 (2.7907) mem 39691MB
[2022-04-01 02:19:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][600/1251] eta 0:07:38 lr 0.000009 time 0.6896 (0.7047) loss 1.5717 (1.3272) grad_norm 2.2648 (2.7870) mem 39691MB
[2022-04-01 02:20:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][700/1251] eta 0:06:28 lr 0.000008 time 0.6906 (0.7042) loss 1.2725 (1.3265) grad_norm 2.9951 (2.7761) mem 39691MB
[2022-04-01 02:21:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][800/1251] eta 0:05:17 lr 0.000008 time 0.6897 (0.7038) loss 1.5141 (1.3220) grad_norm 2.0174 (2.7768) mem 39691MB
[2022-04-01 02:23:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][900/1251] eta 0:04:06 lr 0.000007 time 0.6903 (0.7034) loss 1.5645 (1.3245) grad_norm 2.3992 (2.7796) mem 39691MB
[2022-04-01 02:24:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1000/1251] eta 0:02:56 lr 0.000007 time 0.6900 (0.7030) loss 0.8204 (1.3204) grad_norm 2.7277 (2.7826) mem 39691MB
[2022-04-01 02:25:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1100/1251] eta 0:01:46 lr 0.000007 time 0.6897 (0.7027) loss 1.3750 (1.3185) grad_norm 1.9739 (2.7843) mem 39691MB
[2022-04-01 02:26:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1200/1251] eta 0:00:35 lr 0.000006 time 0.6888 (0.7025) loss 1.2978 (1.3201) grad_norm 2.1809 (2.7888) mem 39691MB
[2022-04-01 02:27:08 simmim_finetune] (main_finetune.py 230): INFO EPOCH 97 training takes 0:14:38
[2022-04-01 02:27:11 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.004 (3.004) Loss 0.3647 (0.3647) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.674
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0172799260266887e-06, 1.0172799260266887e-06, 1.0271166164073458e-06, 1.0271166164073458e-06, 1.0422499862237412e-06, 1.0422499862237412e-06, 1.0655320936335803e-06, 1.0655320936335803e-06, 1.101350720417948e-06, 1.101350720417948e-06, 1.156456300086206e-06, 1.156456300086206e-06, 1.2412341149604493e-06, 1.2412341149604493e-06, 1.371661522459285e-06, 1.371661522459285e-06, 1.5723190724574938e-06, 1.5723190724574938e-06, 1.8810229955316616e-06, 1.8810229955316616e-06, 2.3559521079534575e-06, 2.3559521079534575e-06, 3.0866122809100673e-06, 3.0866122809100673e-06, 4.210704854689466e-06, 4.210704854689466e-06, 5.9400780451193105e-06, 5.9400780451193105e-06]
[2022-04-01 02:27:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][0/1251] eta 1:07:01 lr 0.000006 time 3.2148 (3.2148) loss 1.6244 (1.6244) grad_norm 2.8371 (2.8371) mem 39691MB
[2022-04-01 02:28:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][100/1251] eta 0:13:52 lr 0.000006 time 0.6894 (0.7234) loss 1.4783 (1.3294) grad_norm 2.2639 (2.7801) mem 39691MB
[2022-04-01 02:29:47 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][200/1251] eta 0:12:27 lr 0.000005 time 0.6890 (0.7116) loss 1.5537 (1.3179) grad_norm 2.2326 (2.7782) mem 39691MB
[2022-04-01 02:30:57 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][300/1251] eta 0:11:12 lr 0.000005 time 0.6901 (0.7073) loss 1.4142 (1.3260) grad_norm 3.1747 (2.7874) mem 39691MB
[2022-04-01 02:32:07 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][400/1251] eta 0:10:00 lr 0.000004 time 0.6900 (0.7054) loss 1.1928 (1.3229) grad_norm 2.6400 (2.7897) mem 39691MB
[2022-04-01 02:33:17 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][500/1251] eta 0:08:48 lr 0.000004 time 0.6899 (0.7043) loss 1.3475 (1.3193) grad_norm 2.2912 (2.7904) mem 39691MB
[2022-04-01 02:34:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][600/1251] eta 0:07:38 lr 0.000004 time 0.6880 (0.7036) loss 1.4417 (1.3187) grad_norm 2.1597 (2.7875) mem 39691MB
[2022-04-01 02:35:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][700/1251] eta 0:06:27 lr 0.000004 time 0.6896 (0.7030) loss 0.9872 (1.3167) grad_norm 2.2810 (2.7852) mem 39691MB
[2022-04-01 02:36:46 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][800/1251] eta 0:05:16 lr 0.000003 time 0.6900 (0.7026) loss 1.2168 (1.3158) grad_norm 2.3407 (2.7873) mem 39691MB
[2022-04-01 02:37:56 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][900/1251] eta 0:04:06 lr 0.000003 time 0.6897 (0.7023) loss 1.1657 (1.3130) grad_norm 2.2128 (2.7891) mem 39691MB
[2022-04-01 02:39:06 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1000/1251] eta 0:02:56 lr 0.000003 time 0.6889 (0.7020) loss 1.4734 (1.3160) grad_norm 2.2695 (2.7867) mem 39691MB
[2022-04-01 02:40:16 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1100/1251] eta 0:01:45 lr 0.000003 time 0.6888 (0.7018) loss 1.4531 (1.3167) grad_norm 2.3891 (2.7864) mem 39691MB
[2022-04-01 02:41:26 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1200/1251] eta 0:00:35 lr 0.000002 time 0.6893 (0.7015) loss 0.9294 (1.3169) grad_norm 2.5364 (inf) mem 39691MB
[2022-04-01 02:42:01 simmim_finetune] (main_finetune.py 230): INFO EPOCH 98 training takes 0:14:37
[2022-04-01 02:42:04 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.071 (3.071) Loss 0.3626 (0.3626) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.570 Acc@5 96.640
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0043279541216191e-06, 1.0043279541216191e-06, 1.0067916651705151e-06, 1.0067916651705151e-06, 1.010581989861124e-06, 1.010581989861124e-06, 1.0164132586159072e-06, 1.0164132586159072e-06, 1.0253844413155735e-06, 1.0253844413155735e-06, 1.0391862608535217e-06, 1.0391862608535217e-06, 1.060419829373442e-06, 1.060419829373442e-06, 1.0930868578656273e-06, 1.0930868578656273e-06, 1.1433438247766814e-06, 1.1433438247766814e-06, 1.2206622354090724e-06, 1.2206622354090724e-06, 1.3396136363819813e-06, 1.3396136363819813e-06, 1.5226157917249184e-06, 1.5226157917249184e-06, 1.8041575691755905e-06, 1.8041575691755905e-06, 2.237298765253548e-06, 2.237298765253548e-06]
[2022-04-01 02:42:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][0/1251] eta 1:09:49 lr 0.000002 time 3.3489 (3.3489) loss 1.0033 (1.0033) grad_norm 2.1860 (2.1860) mem 39691MB
[2022-04-01 02:43:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][100/1251] eta 0:13:54 lr 0.000002 time 0.6967 (0.7249) loss 0.8255 (1.2927) grad_norm 2.2087 (2.7138) mem 39691MB
[2022-04-01 02:44:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][200/1251] eta 0:12:28 lr 0.000002 time 0.6892 (0.7121) loss 1.0470 (1.2953) grad_norm 1.9035 (2.7317) mem 39691MB
[2022-04-01 02:45:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][300/1251] eta 0:11:13 lr 0.000002 time 0.6904 (0.7080) loss 0.9423 (1.2993) grad_norm 2.3368 (2.7545) mem 39691MB
[2022-04-01 02:47:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][400/1251] eta 0:10:00 lr 0.000002 time 0.6895 (0.7058) loss 1.1605 (1.3136) grad_norm 2.5136 (2.7636) mem 39691MB
[2022-04-01 02:48:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][500/1251] eta 0:08:49 lr 0.000001 time 0.6903 (0.7046) loss 0.9751 (1.3128) grad_norm 2.5944 (2.7818) mem 39691MB
[2022-04-01 02:49:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][600/1251] eta 0:07:38 lr 0.000001 time 0.7461 (0.7037) loss 1.6204 (1.3107) grad_norm 2.2264 (2.7801) mem 39691MB
[2022-04-01 02:50:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][700/1251] eta 0:06:27 lr 0.000001 time 0.6892 (0.7030) loss 1.2943 (1.3101) grad_norm 2.3324 (2.7893) mem 39691MB
[2022-04-01 02:51:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][800/1251] eta 0:05:16 lr 0.000001 time 0.6891 (0.7026) loss 1.3340 (1.3111) grad_norm 2.0425 (2.7911) mem 39691MB
[2022-04-01 02:52:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][900/1251] eta 0:04:06 lr 0.000001 time 0.7028 (0.7021) loss 1.1269 (1.3138) grad_norm 2.3656 (2.7906) mem 39691MB
[2022-04-01 02:54:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1000/1251] eta 0:02:56 lr 0.000001 time 0.6908 (0.7020) loss 1.0681 (1.3161) grad_norm 2.0883 (2.7906) mem 39691MB
[2022-04-01 02:55:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1100/1251] eta 0:01:45 lr 0.000001 time 0.6897 (0.7017) loss 0.9208 (1.3166) grad_norm 2.1637 (2.7937) mem 39691MB
[2022-04-01 02:56:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1200/1251] eta 0:00:35 lr 0.000001 time 0.6889 (0.7016) loss 1.5211 (1.3184) grad_norm 2.0254 (2.7999) mem 39691MB
[2022-04-01 02:56:55 simmim_finetune] (main_finetune.py 230): INFO EPOCH 99 training takes 0:14:37
[2022-04-01 02:56:55 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saving......
[2022-04-01 02:56:56 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saved !!!
[2022-04-01 02:56:59 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.885 (2.885) Loss 0.3653 (0.3653) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.606 Acc@5 96.688
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 148): INFO Training time 1 day, 0:49:35
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 344): INFO Full config saved to /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/config.json
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 347): INFO AMP_OPT_LEVEL: O1

Geometric interpolation ViT

Hi,

In the interpolation for ViT, you have:

SimMIM/utils.py

Line 226 in bec329f

dst_num_pos, _ = model.state_dict()[key].size()

However, the keys are from:

SimMIM/utils.py

Line 218 in bec329f

all_keys = list(checkpoint_model.keys())

which includes:

SimMIM/utils.py

Lines 213 to 215 in bec329f

for i in range(num_layers):
    checkpoint_model["blocks.%d.attn.relative_position_bias_table" % i] = rel_pos_bias.clone()
checkpoint_model.pop("rel_pos_bias.relative_position_bias_table")

However, these keys are not present in the model, and give an error.

I want to use the ViT base models for downstream tasks.

Could you please tell me if I am missing something in this?

Thanks.

A question about masking strategy

For the Swin Transformer, the default masked patch size is 32x32, but the first patch embedding layer uses a 4x4 patch size. So I want to know: is the learned mask token designed for the 4x4 patch size, while the masking strategy still operates on 32x32 units?
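For reference, one way to reconcile the two granularities is to sample the mask on the coarse 32x32 grid and then upsample it to the 4x4 patch-embedding grid, so the learnable mask token still lives at the 4x4 token level. A rough sketch follows (a hypothetical helper, not necessarily identical to the repository's mask generator):

# Sketch: sample a mask at 32x32 granularity, then expand it to the 4x4 patch grid
import numpy as np

def random_block_mask(img_size=192, model_patch_size=4, mask_patch_size=32, mask_ratio=0.6):
    coarse = img_size // mask_patch_size            # 6x6 coarse grid of 32x32 masking units
    scale = mask_patch_size // model_patch_size     # each unit covers an 8x8 block of 4x4 patches
    num_mask = int(mask_ratio * coarse * coarse)
    mask = np.zeros(coarse * coarse, dtype=int)
    mask[np.random.permutation(coarse * coarse)[:num_mask]] = 1
    mask = mask.reshape(coarse, coarse)
    return mask.repeat(scale, axis=0).repeat(scale, axis=1)   # shape (48, 48), aligned with 4x4 tokens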

Allow arbitrary-sized images by dynamic masking: upstream changes from Swin-Transformer-Object-Detection / SOLQ

Hi!

To combine the Swin Transformer backbone with the Deformable DETR detector, SOLQ made some changes to swin_transformer.py that allow the padding mask to be computed dynamically and allow arbitrary-sized input images (I think this is supported for relative positional encoding only).

Similar edits were done by your colleagues in https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/master/mmdet/models/backbones/swin_transformer.py

If this interests you, maybe you could import those edits from SOLQ / Swin-Transformer-Object-Detection or implement similar ones. This would make it simpler to experiment with SimMIM checkpoints / backbone code in an object detection context and ensure that checkpoints load correctly.

Information about relative positional encoding

Hi, thanks for sharing your work.
I was wondering if someone could provide me with some additional information about the following line (and the two lines after it):

relative_position_index[0, 0:] = self.num_relative_distance - 3

Why exactly do we subtract 3, 2 and 1 respectively? And isn't the statement on line 80 overwriting the statement two lines prior on line 78?

What exactly does the relative_position_index matrix hold?
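For context, this indexing follows the BEiT-style relative position bias, where the bias table has (2*Wh-1)*(2*Ww-1) + 3 entries and the last three are reserved for interactions involving the [CLS] token. Below is a minimal reconstruction under that assumption (not necessarily byte-identical to this repository's utils):

# BEiT-style relative_position_index with three extra entries for the [CLS] token (assumed layout)
import torch

Wh = Ww = 2                                                   # toy window size
num_relative_distance = (2 * Wh - 1) * (2 * Ww - 1) + 3

coords = torch.stack(torch.meshgrid(torch.arange(Wh), torch.arange(Ww), indexing="ij"))
coords_flat = torch.flatten(coords, 1)                        # (2, Wh*Ww)
rel = (coords_flat[:, :, None] - coords_flat[:, None, :]).permute(1, 2, 0).contiguous()
rel[:, :, 0] += Wh - 1                                        # shift to start from 0
rel[:, :, 1] += Ww - 1
rel[:, :, 0] *= 2 * Ww - 1

relative_position_index = torch.zeros((Wh * Ww + 1,) * 2, dtype=torch.long)
relative_position_index[1:, 1:] = rel.sum(-1)                 # patch-to-patch relative distances
relative_position_index[0, 0:] = num_relative_distance - 3    # [CLS] attending to every token
relative_position_index[0:, 0] = num_relative_distance - 2    # every token attending to [CLS]
relative_position_index[0, 0] = num_relative_distance - 1     # [CLS] attending to itself

Under this reading, the three subtractions simply select three dedicated rows of the bias table (cls-to-token, token-to-cls, cls-to-cls), and the last assignment overwrites only the single [0, 0] entry of the previous two statements, so they are not fully overwritten.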

Effect of self-supervised pre-trained weights on downstream task accuracy

If, for a downstream task, I do not use the fine-tuned pre-trained weights but instead directly use the self-supervised pre-trained weights, will this affect downstream accuracy, e.g., for segmentation?

Inconsistency of ViT-base config as described in the paper

Thanks for sharing your excellent work!

I notice that there are some differences between the ViT-Base pre-training config provided here and the one described in the paper, e.g., base_lr, warmup_epochs, lr_scheduler. May I know why these settings differ?

Thank you!

Plan to implement downstream tasks

Hi,

I'm very impressed by your work.

I wonder whether you plan to run experiments that transfer your pre-trained model to downstream tasks such as object detection, as MAE does.

Many thanks in advance.

Any plan to support 3D version?

Hi, thanks for the great code! Just wondering if you have any plan to support a 3D version of SimMIM (Video Swin masked encoding)? I think there are some papers on video MAE, but they are not in the Swin style...

How can I solve this problem

/root/miniconda3/bin/python: can't open file 'main_simmim.py--cfg': [Errno 2] No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1986) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Train from scratch

Hi,
I want to train the model on a custom dataset containing non-realistic images (paintings, comics). I think your code requires a pre-trained model (ImageNet-1K) to build SimMIM, but I am concerned about a drop in accuracy due to the domain difference. Could you tell me how to train the SimMIM model from scratch on custom datasets?

Thanks in advance.

How do you resolve the mismatch of patch_size in the <patch_embed> module between the pre-trained and fine-tuned models?

As far as I know, for the ViT experiment you use random masking with patch size 32, but when running fine-tuning you use a default patch size of 16. I am confused about how you handle the mismatch in <patch_size> between the two. As I dug into your utils.py, the load_pretrained function only resolves the patch-size mismatch for the relative position embeddings. Please correct me if I am missing something.

USE_RPB and USE_SHARED_RPB are inconsistent in pretrain and finetune

Hi, thanks for the amazing work and for sharing the OSS code!
I was wondering if you could share the reason for this:

MODEL.VIT.USE_RPB: True in simmim_finetune__vit_base__img224__800ep.yaml
MODEL.VIT.USE_SHARED_RPB: True in simmim_pretrain__vit_base__img224__800ep.yaml

Do we expect to use the same RPB between pre-training and fine-tuning?

Thanks,

192x192 pretraining resolution

Did you try higher resolutions for pre-training? Why did you choose such a relatively small one?

  • And a 2nd question:

How much performance difference is there for ViT-G between 100 and 800 epochs?

inverse swin

Dear SimMIM team,

Thank you for sharing this great work. I really like it.

In Section 3.3 (prediction head), you mention an 'inverse Swin'. Could you specify what it is? Could you provide a reference for its exact structure?

Thank you for your help.

Best Wishes,

Alex

SimMIM with Absolute Position Embedding

Hello,

I would like to use a SimMIM model pre-trained with absolute position embedding = True and RPE (incl. shared) = False.

However, I do not have the resources to train with an effective batch size of 2048.

Would it please be possible to provide such pre-trained models in the repo?

Thanks and regards

Performance using the cosine distance

Hi @caoyue10 , thanks for your insightful work.

I found that the experiments and discussion in your paper state that different types of distance (e.g., l2, l1) for computing the loss perform equally well. However, I would like to know whether this still holds for the cosine distance.

Since cosine distance has been prevalent in previous contrastive learning works, and it involves an l2-normalization, I think experimenting with it could be helpful. Could you shed some light on this?

Best.
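For what it is worth, a cosine-distance (l2-normalized) reconstruction loss on the masked pixels could be sketched as below; this is a hypothetical drop-in for the l1 term, not an experiment reported in the paper:

# Hypothetical cosine-distance reconstruction loss over masked pixels (not from the paper)
import torch
import torch.nn.functional as F

def masked_cosine_loss(pred, target, pixel_mask, eps=1e-8):
    # pred, target: (B, C, H, W); pixel_mask: (B, 1, H, W) with 1 = masked pixel
    pred = (pred * pixel_mask).flatten(1)
    target = (target * pixel_mask).flatten(1)
    cos = F.cosine_similarity(pred, target, dim=1, eps=eps)   # per-sample cosine similarity
    return (1.0 - cos).mean()                                 # 0 when reconstructions align perfectly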

Confusion about fine-tune

Thank you very much for your work. I used your method to pre-train on my dataset and then compared it with a randomly initialized model and an ImageNet-supervised model. The results show that the SimMIM pre-trained model converges at a speed similar to the randomly initialized model; its accuracy gradually surpasses the randomly initialized model after several iterations, but it still does not converge as well as the supervised model. Is your method, like MAE's description of fine-tuning, still improving accuracy after many iterations? Looking forward to your reply.

Setting for Linear eval

Hi, thanks for the great work.
Would you mind sharing the settings or code for the linear evaluation in Table 6 (ViT-B, 56.7%)?

linear probe

In Table 6 and Section 4.2 of your paper, you show results for a linear probe (even though I recognize that it is not the major contribution of this work). It says that you use an "intermediate layer of ViT-B" for the linear probe. Can you specify which layer is used? Also, is the probe applied just to the [CLS] token output (which would be surprising, as this token receives no loss during pre-training), or does it do something like averaging over the patch token outputs?
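For concreteness, a linear probe over average-pooled patch-token outputs of a frozen intermediate block might look like the sketch below (an assumption about the protocol, not the authors' released code):

# Sketch of a linear probe on averaged patch tokens (assumed protocol, not the released code)
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, embed_dim=768, num_classes=1000):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)      # optional normalization before the linear classifier
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, C) outputs of a chosen, frozen intermediate ViT block
        feats = patch_tokens.mean(dim=1)         # average over patch tokens instead of using [CLS]
        return self.head(self.norm(feats))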

About mask_token in SimMIM codes

Hi, thanks for this great work. After reading the code, I have a question about self.mask_token in the SimMIM project. What is the role of self.mask_token in the __init__ function of SwinTransformerForSimMIM? It seems to add another learnable parameter to the model. What would happen if this parameter were deleted?

Hoping for your answer.

Loss goes nan after 14 epochs

Hi, thank you for releasing such wonderful work. I tried to replicate the results using the following command:

python -m torch.distributed.launch --nproc_per_node 8 main_simmim.py --cfg configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml --data-path /mnt/fsx/datasets/imagenet/train --accumulation-steps 2

which gave me a nan loss after 14 epochs:

[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 22): INFO >>>>>>>>>> Build Optimizer for Pre-training Stage
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 27): INFO No weight decay: {'encoder.mask_token', 'encoder.absolute_pos_embed'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 30): INFO No weight decay keywords: {'encoder.relative_position_bias_table'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 63): INFO No decay params: ['encoder.mask_token', 'encoder.patch_embed.proj.bias', 'encoder.patch_embed.norm.weight', 'encoder.patch_embed.norm.bias', 'encoder.layers.0.blocks.0.norm1.weight', 'encoder.layers.0.blocks.0.norm1.bias', 'encoder.layers.0.blocks.0.attn.qkv.bias', 'encoder.layers.0.blocks.0.attn.proj.bias', 'encoder.layers.0.blocks.0.norm2.weight', 'encoder.layers.0.blocks.0.norm2.bias', 'encoder.layers.0.blocks.0.mlp.fc1.bias', 'encoder.layers.0.blocks.0.mlp.fc2.bias', 'encoder.layers.0.blocks.1.norm1.weight', 'encoder.layers.0.blocks.1.norm1.bias', 'encoder.layers.0.blocks.1.attn.qkv.bias', 'encoder.layers.0.blocks.1.attn.proj.bias', 'encoder.layers.0.blocks.1.norm2.weight', 'encoder.layers.0.blocks.1.norm2.bias', 'encoder.layers.0.blocks.1.mlp.fc1.bias', 'encoder.layers.0.blocks.1.mlp.fc2.bias', 'encoder.layers.0.downsample.norm.weight', 'encoder.layers.0.downsample.norm.bias', 'encoder.layers.1.blocks.0.norm1.weight', 'encoder.layers.1.blocks.0.norm1.bias', 'encoder.layers.1.blocks.0.attn.qkv.bias', 'encoder.layers.1.blocks.0.attn.proj.bias', 'encoder.layers.1.blocks.0.norm2.weight', 'encoder.layers.1.blocks.0.norm2.bias', 'encoder.layers.1.blocks.0.mlp.fc1.bias', 'encoder.layers.1.blocks.0.mlp.fc2.bias', 'encoder.layers.1.blocks.1.norm1.weight', 'encoder.layers.1.blocks.1.norm1.bias', 'encoder.layers.1.blocks.1.attn.qkv.bias', 'encoder.layers.1.blocks.1.attn.proj.bias', 'encoder.layers.1.blocks.1.norm2.weight', 'encoder.layers.1.blocks.1.norm2.bias', 'encoder.layers.1.blocks.1.mlp.fc1.bias', 'encoder.layers.1.blocks.1.mlp.fc2.bias', 'encoder.layers.1.downsample.norm.weight', 'encoder.layers.1.downsample.norm.bias', 'encoder.layers.2.blocks.0.norm1.weight', 'encoder.layers.2.blocks.0.norm1.bias', 'encoder.layers.2.blocks.0.attn.qkv.bias', 'encoder.layers.2.blocks.0.attn.proj.bias', 'encoder.layers.2.blocks.0.norm2.weight', 'encoder.layers.2.blocks.0.norm2.bias', 'encoder.layers.2.blocks.0.mlp.fc1.bias', 'encoder.layers.2.blocks.0.mlp.fc2.bias', 'encoder.layers.2.blocks.1.norm1.weight', 'encoder.layers.2.blocks.1.norm1.bias', 'encoder.layers.2.blocks.1.attn.qkv.bias', 'encoder.layers.2.blocks.1.attn.proj.bias', 'encoder.layers.2.blocks.1.norm2.weight', 'encoder.layers.2.blocks.1.norm2.bias', 'encoder.layers.2.blocks.1.mlp.fc1.bias', 'encoder.layers.2.blocks.1.mlp.fc2.bias', 'encoder.layers.2.blocks.2.norm1.weight', 'encoder.layers.2.blocks.2.norm1.bias', 'encoder.layers.2.blocks.2.attn.qkv.bias', 'encoder.layers.2.blocks.2.attn.proj.bias', 'encoder.layers.2.blocks.2.norm2.weight', 'encoder.layers.2.blocks.2.norm2.bias', 'encoder.layers.2.blocks.2.mlp.fc1.bias', 'encoder.layers.2.blocks.2.mlp.fc2.bias', 'encoder.layers.2.blocks.3.norm1.weight', 'encoder.layers.2.blocks.3.norm1.bias', 'encoder.layers.2.blocks.3.attn.qkv.bias', 'encoder.layers.2.blocks.3.attn.proj.bias', 'encoder.layers.2.blocks.3.norm2.weight', 'encoder.layers.2.blocks.3.norm2.bias', 'encoder.layers.2.blocks.3.mlp.fc1.bias', 'encoder.layers.2.blocks.3.mlp.fc2.bias', 'encoder.layers.2.blocks.4.norm1.weight', 'encoder.layers.2.blocks.4.norm1.bias', 'encoder.layers.2.blocks.4.attn.qkv.bias', 'encoder.layers.2.blocks.4.attn.proj.bias', 'encoder.layers.2.blocks.4.norm2.weight', 'encoder.layers.2.blocks.4.norm2.bias', 'encoder.layers.2.blocks.4.mlp.fc1.bias', 'encoder.layers.2.blocks.4.mlp.fc2.bias', 'encoder.layers.2.blocks.5.norm1.weight', 'encoder.layers.2.blocks.5.norm1.bias', 'encoder.layers.2.blocks.5.attn.qkv.bias', 'encoder.layers.2.blocks.5.attn.proj.bias', 
'encoder.layers.2.blocks.5.norm2.weight', 'encoder.layers.2.blocks.5.norm2.bias', 'encoder.layers.2.blocks.5.mlp.fc1.bias', 'encoder.layers.2.blocks.5.mlp.fc2.bias', 'encoder.layers.2.blocks.6.norm1.weight', 'encoder.layers.2.blocks.6.norm1.bias', 'encoder.layers.2.blocks.6.attn.qkv.bias', 'encoder.layers.2.blocks.6.attn.proj.bias', 'encoder.layers.2.blocks.6.norm2.weight', 'encoder.layers.2.blocks.6.norm2.bias', 'encoder.layers.2.blocks.6.mlp.fc1.bias', 'encoder.layers.2.blocks.6.mlp.fc2.bias', 'encoder.layers.2.blocks.7.norm1.weight', 'encoder.layers.2.blocks.7.norm1.bias', 'encoder.layers.2.blocks.7.attn.qkv.bias', 'encoder.layers.2.blocks.7.attn.proj.bias', 'encoder.layers.2.blocks.7.norm2.weight', 'encoder.layers.2.blocks.7.norm2.bias', 'encoder.layers.2.blocks.7.mlp.fc1.bias', 'encoder.layers.2.blocks.7.mlp.fc2.bias', 'encoder.layers.2.blocks.8.norm1.weight', 'encoder.layers.2.blocks.8.norm1.bias', 'encoder.layers.2.blocks.8.attn.qkv.bias', 'encoder.layers.2.blocks.8.attn.proj.bias', 'encoder.layers.2.blocks.8.norm2.weight', 'encoder.layers.2.blocks.8.norm2.bias', 'encoder.layers.2.blocks.8.mlp.fc1.bias', 'encoder.layers.2.blocks.8.mlp.fc2.bias', 'encoder.layers.2.blocks.9.norm1.weight', 'encoder.layers.2.blocks.9.norm1.bias', 'encoder.layers.2.blocks.9.attn.qkv.bias', 'encoder.layers.2.blocks.9.attn.proj.bias', 'encoder.layers.2.blocks.9.norm2.weight', 'encoder.layers.2.blocks.9.norm2.bias', 'encoder.layers.2.blocks.9.mlp.fc1.bias', 'encoder.layers.2.blocks.9.mlp.fc2.bias', 'encoder.layers.2.blocks.10.norm1.weight', 'encoder.layers.2.blocks.10.norm1.bias', 'encoder.layers.2.blocks.10.attn.qkv.bias', 'encoder.layers.2.blocks.10.attn.proj.bias', 'encoder.layers.2.blocks.10.norm2.weight', 'encoder.layers.2.blocks.10.norm2.bias', 'encoder.layers.2.blocks.10.mlp.fc1.bias', 'encoder.layers.2.blocks.10.mlp.fc2.bias', 'encoder.layers.2.blocks.11.norm1.weight', 'encoder.layers.2.blocks.11.norm1.bias', 'encoder.layers.2.blocks.11.attn.qkv.bias', 'encoder.layers.2.blocks.11.attn.proj.bias', 'encoder.layers.2.blocks.11.norm2.weight', 'encoder.layers.2.blocks.11.norm2.bias', 'encoder.layers.2.blocks.11.mlp.fc1.bias', 'encoder.layers.2.blocks.11.mlp.fc2.bias', 'encoder.layers.2.blocks.12.norm1.weight', 'encoder.layers.2.blocks.12.norm1.bias', 'encoder.layers.2.blocks.12.attn.qkv.bias', 'encoder.layers.2.blocks.12.attn.proj.bias', 'encoder.layers.2.blocks.12.norm2.weight', 'encoder.layers.2.blocks.12.norm2.bias', 'encoder.layers.2.blocks.12.mlp.fc1.bias', 'encoder.layers.2.blocks.12.mlp.fc2.bias', 'encoder.layers.2.blocks.13.norm1.weight', 'encoder.layers.2.blocks.13.norm1.bias', 'encoder.layers.2.blocks.13.attn.qkv.bias', 'encoder.layers.2.blocks.13.attn.proj.bias', 'encoder.layers.2.blocks.13.norm2.weight', 'encoder.layers.2.blocks.13.norm2.bias', 'encoder.layers.2.blocks.13.mlp.fc1.bias', 'encoder.layers.2.blocks.13.mlp.fc2.bias', 'encoder.layers.2.blocks.14.norm1.weight', 'encoder.layers.2.blocks.14.norm1.bias', 'encoder.layers.2.blocks.14.attn.qkv.bias', 'encoder.layers.2.blocks.14.attn.proj.bias', 'encoder.layers.2.blocks.14.norm2.weight', 'encoder.layers.2.blocks.14.norm2.bias', 'encoder.layers.2.blocks.14.mlp.fc1.bias', 'encoder.layers.2.blocks.14.mlp.fc2.bias', 'encoder.layers.2.blocks.15.norm1.weight', 'encoder.layers.2.blocks.15.norm1.bias', 'encoder.layers.2.blocks.15.attn.qkv.bias', 'encoder.layers.2.blocks.15.attn.proj.bias', 'encoder.layers.2.blocks.15.norm2.weight', 'encoder.layers.2.blocks.15.norm2.bias', 'encoder.layers.2.blocks.15.mlp.fc1.bias', 
'encoder.layers.2.blocks.15.mlp.fc2.bias', 'encoder.layers.2.blocks.16.norm1.weight', 'encoder.layers.2.blocks.16.norm1.bias', 'encoder.layers.2.blocks.16.attn.qkv.bias', 'encoder.layers.2.blocks.16.attn.proj.bias', 'encoder.layers.2.blocks.16.norm2.weight', 'encoder.layers.2.blocks.16.norm2.bias', 'encoder.layers.2.blocks.16.mlp.fc1.bias', 'encoder.layers.2.blocks.16.mlp.fc2.bias', 'encoder.layers.2.blocks.17.norm1.weight', 'encoder.layers.2.blocks.17.norm1.bias', 'encoder.layers.2.blocks.17.attn.qkv.bias', 'encoder.layers.2.blocks.17.attn.proj.bias', 'encoder.layers.2.blocks.17.norm2.weight', 'encoder.layers.2.blocks.17.norm2.bias', 'encoder.layers.2.blocks.17.mlp.fc1.bias', 'encoder.layers.2.blocks.17.mlp.fc2.bias', 'encoder.layers.2.downsample.norm.weight', 'encoder.layers.2.downsample.norm.bias', 'encoder.layers.3.blocks.0.norm1.weight', 'encoder.layers.3.blocks.0.norm1.bias', 'encoder.layers.3.blocks.0.attn.qkv.bias', 'encoder.layers.3.blocks.0.attn.proj.bias', 'encoder.layers.3.blocks.0.norm2.weight', 'encoder.layers.3.blocks.0.norm2.bias', 'encoder.layers.3.blocks.0.mlp.fc1.bias', 'encoder.layers.3.blocks.0.mlp.fc2.bias', 'encoder.layers.3.blocks.1.norm1.weight', 'encoder.layers.3.blocks.1.norm1.bias', 'encoder.layers.3.blocks.1.attn.qkv.bias', 'encoder.layers.3.blocks.1.attn.proj.bias', 'encoder.layers.3.blocks.1.norm2.weight', 'encoder.layers.3.blocks.1.norm2.bias', 'encoder.layers.3.blocks.1.mlp.fc1.bias', 'encoder.layers.3.blocks.1.mlp.fc2.bias', 'encoder.norm.weight', 'encoder.norm.bias', 'decoder.0.bias']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 64): INFO Has decay params: ['encoder.patch_embed.proj.weight', 'encoder.layers.0.blocks.0.attn.relative_position_bias_table', 'encoder.layers.0.blocks.0.attn.qkv.weight', 'encoder.layers.0.blocks.0.attn.proj.weight', 'encoder.layers.0.blocks.0.mlp.fc1.weight', 'encoder.layers.0.blocks.0.mlp.fc2.weight', 'encoder.layers.0.blocks.1.attn.relative_position_bias_table', 'encoder.layers.0.blocks.1.attn.qkv.weight', 'encoder.layers.0.blocks.1.attn.proj.weight', 'encoder.layers.0.blocks.1.mlp.fc1.weight', 'encoder.layers.0.blocks.1.mlp.fc2.weight', 'encoder.layers.0.downsample.reduction.weight', 'encoder.layers.1.blocks.0.attn.relative_position_bias_table', 'encoder.layers.1.blocks.0.attn.qkv.weight', 'encoder.layers.1.blocks.0.attn.proj.weight', 'encoder.layers.1.blocks.0.mlp.fc1.weight', 'encoder.layers.1.blocks.0.mlp.fc2.weight', 'encoder.layers.1.blocks.1.attn.relative_position_bias_table', 'encoder.layers.1.blocks.1.attn.qkv.weight', 'encoder.layers.1.blocks.1.attn.proj.weight', 'encoder.layers.1.blocks.1.mlp.fc1.weight', 'encoder.layers.1.blocks.1.mlp.fc2.weight', 'encoder.layers.1.downsample.reduction.weight', 'encoder.layers.2.blocks.0.attn.relative_position_bias_table', 'encoder.layers.2.blocks.0.attn.qkv.weight', 'encoder.layers.2.blocks.0.attn.proj.weight', 'encoder.layers.2.blocks.0.mlp.fc1.weight', 'encoder.layers.2.blocks.0.mlp.fc2.weight', 'encoder.layers.2.blocks.1.attn.relative_position_bias_table', 'encoder.layers.2.blocks.1.attn.qkv.weight', 'encoder.layers.2.blocks.1.attn.proj.weight', 'encoder.layers.2.blocks.1.mlp.fc1.weight', 'encoder.layers.2.blocks.1.mlp.fc2.weight', 'encoder.layers.2.blocks.2.attn.relative_position_bias_table', 'encoder.layers.2.blocks.2.attn.qkv.weight', 'encoder.layers.2.blocks.2.attn.proj.weight', 'encoder.layers.2.blocks.2.mlp.fc1.weight', 'encoder.layers.2.blocks.2.mlp.fc2.weight', 'encoder.layers.2.blocks.3.attn.relative_position_bias_table', 'encoder.layers.2.blocks.3.attn.qkv.weight', 'encoder.layers.2.blocks.3.attn.proj.weight', 'encoder.layers.2.blocks.3.mlp.fc1.weight', 'encoder.layers.2.blocks.3.mlp.fc2.weight', 'encoder.layers.2.blocks.4.attn.relative_position_bias_table', 'encoder.layers.2.blocks.4.attn.qkv.weight', 'encoder.layers.2.blocks.4.attn.proj.weight', 'encoder.layers.2.blocks.4.mlp.fc1.weight', 'encoder.layers.2.blocks.4.mlp.fc2.weight', 'encoder.layers.2.blocks.5.attn.relative_position_bias_table', 'encoder.layers.2.blocks.5.attn.qkv.weight', 'encoder.layers.2.blocks.5.attn.proj.weight', 'encoder.layers.2.blocks.5.mlp.fc1.weight', 'encoder.layers.2.blocks.5.mlp.fc2.weight', 'encoder.layers.2.blocks.6.attn.relative_position_bias_table', 'encoder.layers.2.blocks.6.attn.qkv.weight', 'encoder.layers.2.blocks.6.attn.proj.weight', 'encoder.layers.2.blocks.6.mlp.fc1.weight', 'encoder.layers.2.blocks.6.mlp.fc2.weight', 'encoder.layers.2.blocks.7.attn.relative_position_bias_table', 'encoder.layers.2.blocks.7.attn.qkv.weight', 'encoder.layers.2.blocks.7.attn.proj.weight', 'encoder.layers.2.blocks.7.mlp.fc1.weight', 'encoder.layers.2.blocks.7.mlp.fc2.weight', 'encoder.layers.2.blocks.8.attn.relative_position_bias_table', 'encoder.layers.2.blocks.8.attn.qkv.weight', 'encoder.layers.2.blocks.8.attn.proj.weight', 'encoder.layers.2.blocks.8.mlp.fc1.weight', 'encoder.layers.2.blocks.8.mlp.fc2.weight', 'encoder.layers.2.blocks.9.attn.relative_position_bias_table', 'encoder.layers.2.blocks.9.attn.qkv.weight', 'encoder.layers.2.blocks.9.attn.proj.weight', 
'encoder.layers.2.blocks.9.mlp.fc1.weight', 'encoder.layers.2.blocks.9.mlp.fc2.weight', 'encoder.layers.2.blocks.10.attn.relative_position_bias_table', 'encoder.layers.2.blocks.10.attn.qkv.weight', 'encoder.layers.2.blocks.10.attn.proj.weight', 'encoder.layers.2.blocks.10.mlp.fc1.weight', 'encoder.layers.2.blocks.10.mlp.fc2.weight', 'encoder.layers.2.blocks.11.attn.relative_position_bias_table', 'encoder.layers.2.blocks.11.attn.qkv.weight', 'encoder.layers.2.blocks.11.attn.proj.weight', 'encoder.layers.2.blocks.11.mlp.fc1.weight', 'encoder.layers.2.blocks.11.mlp.fc2.weight', 'encoder.layers.2.blocks.12.attn.relative_position_bias_table', 'encoder.layers.2.blocks.12.attn.qkv.weight', 'encoder.layers.2.blocks.12.attn.proj.weight', 'encoder.layers.2.blocks.12.mlp.fc1.weight', 'encoder.layers.2.blocks.12.mlp.fc2.weight', 'encoder.layers.2.blocks.13.attn.relative_position_bias_table', 'encoder.layers.2.blocks.13.attn.qkv.weight', 'encoder.layers.2.blocks.13.attn.proj.weight', 'encoder.layers.2.blocks.13.mlp.fc1.weight', 'encoder.layers.2.blocks.13.mlp.fc2.weight', 'encoder.layers.2.blocks.14.attn.relative_position_bias_table', 'encoder.layers.2.blocks.14.attn.qkv.weight', 'encoder.layers.2.blocks.14.attn.proj.weight', 'encoder.layers.2.blocks.14.mlp.fc1.weight', 'encoder.layers.2.blocks.14.mlp.fc2.weight', 'encoder.layers.2.blocks.15.attn.relative_position_bias_table', 'encoder.layers.2.blocks.15.attn.qkv.weight', 'encoder.layers.2.blocks.15.attn.proj.weight', 'encoder.layers.2.blocks.15.mlp.fc1.weight', 'encoder.layers.2.blocks.15.mlp.fc2.weight', 'encoder.layers.2.blocks.16.attn.relative_position_bias_table', 'encoder.layers.2.blocks.16.attn.qkv.weight', 'encoder.layers.2.blocks.16.attn.proj.weight', 'encoder.layers.2.blocks.16.mlp.fc1.weight', 'encoder.layers.2.blocks.16.mlp.fc2.weight', 'encoder.layers.2.blocks.17.attn.relative_position_bias_table', 'encoder.layers.2.blocks.17.attn.qkv.weight', 'encoder.layers.2.blocks.17.attn.proj.weight', 'encoder.layers.2.blocks.17.mlp.fc1.weight', 'encoder.layers.2.blocks.17.mlp.fc2.weight', 'encoder.layers.2.downsample.reduction.weight', 'encoder.layers.3.blocks.0.attn.relative_position_bias_table', 'encoder.layers.3.blocks.0.attn.qkv.weight', 'encoder.layers.3.blocks.0.attn.proj.weight', 'encoder.layers.3.blocks.0.mlp.fc1.weight', 'encoder.layers.3.blocks.0.mlp.fc2.weight', 'encoder.layers.3.blocks.1.attn.relative_position_bias_table', 'encoder.layers.3.blocks.1.attn.qkv.weight', 'encoder.layers.3.blocks.1.attn.proj.weight', 'encoder.layers.3.blocks.1.mlp.fc1.weight', 'encoder.layers.3.blocks.1.mlp.fc2.weight', 'decoder.0.weight']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 43): INFO AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.05

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.0
)
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 83): INFO number of params: 89874104
[2022-02-05 09:22:26 simmim_pretrain] (utils.py 81): INFO All checkpoints founded in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep: []
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 100): INFO no checkpoint found in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep, ignoring auto resume
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 105): INFO Start training
[2022-02-05 09:24:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][0/1251]	eta 1 day, 15:53:49 lr 0.000004	time 114.8121 (114.8121)	loss 0.5543 (0.5543)	grad_norm 0.2902 (0.2902)	mem 17192MB
[2022-02-05 09:45:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][100/1251]	eta 4:24:36 lr 0.000010	time 0.3949 (13.7934)	loss 0.4499 (0.4969)	grad_norm 1.0401 (0.2900)	mem 18238MB
[2022-02-05 10:06:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][200/1251]	eta 3:52:30 lr 0.000017	time 75.5072 (13.2732)	loss 0.3752 (0.4565)	grad_norm 2.8639 (1.6425)	mem 18238MB
[2022-02-05 10:28:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][300/1251]	eta 3:27:27 lr 0.000023	time 0.3941 (13.0894)	loss 0.3553 (0.4264)	grad_norm 2.0591 (2.8358)	mem 18238MB
[2022-02-05 10:48:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][400/1251]	eta 3:02:30 lr 0.000029	time 57.4084 (12.8679)	loss 0.3173 (0.4040)	grad_norm 1.1405 (3.6005)	mem 18238MB
[2022-02-05 11:08:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][500/1251]	eta 2:38:59 lr 0.000036	time 0.3942 (12.7019)	loss 0.3129 (0.3879)	grad_norm 4.7302 (4.0156)	mem 18238MB
[2022-02-05 11:29:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][600/1251]	eta 2:17:56 lr 0.000042	time 86.9880 (12.7132)	loss 0.3042 (0.3741)	grad_norm 2.4576 (4.0197)	mem 18238MB
[2022-02-05 11:49:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][700/1251]	eta 1:55:17 lr 0.000048	time 0.3943 (12.5542)	loss 0.2920 (0.3630)	grad_norm 4.6089 (4.0017)	mem 18239MB
[2022-02-05 12:09:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][800/1251]	eta 1:34:10 lr 0.000055	time 73.9639 (12.5290)	loss 0.2979 (0.3536)	grad_norm 3.4510 (3.9055)	mem 18239MB
[2022-02-05 12:29:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][900/1251]	eta 1:13:00 lr 0.000061	time 0.3981 (12.4787)	loss 0.2693 (0.3459)	grad_norm 1.5775 (3.8091)	mem 18239MB
[2022-02-05 12:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1000/1251]	eta 0:52:00 lr 0.000068	time 18.3918 (12.4334)	loss 0.2786 (0.3394)	grad_norm 1.2491 (3.7356)	mem 18239MB
[2022-02-05 13:10:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1100/1251]	eta 0:31:18 lr 0.000074	time 0.4033 (12.4426)	loss 0.2725 (0.3335)	grad_norm 2.2311 (3.6312)	mem 18239MB
[2022-02-05 13:30:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1200/1251]	eta 0:10:32 lr 0.000080	time 31.6500 (12.4020)	loss 0.2715 (0.3286)	grad_norm 1.2720 (3.5534)	mem 18239MB
[2022-02-05 13:39:44 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 0 training takes 4:17:18
[2022-02-05 13:39:44 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saving......
[2022-02-05 13:39:46 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saved !!!
[2022-02-05 13:39:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][0/1251]	eta 1:01:34 lr 0.000083	time 2.9530 (2.9530)	loss 0.2705 (0.2705)	grad_norm 0.8280 (0.8280)	mem 18239MB
[2022-02-05 13:41:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][100/1251]	eta 0:14:17 lr 0.000090	time 0.6114 (0.7453)	loss 0.2802 (0.2693)	grad_norm 3.6450 (2.3059)	mem 18239MB
[2022-02-05 13:42:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][200/1251]	eta 0:14:40 lr 0.000096	time 0.7879 (0.8375)	loss 0.2727 (0.2691)	grad_norm 2.2279 (2.2994)	mem 18239MB
[2022-02-05 13:44:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][300/1251]	eta 0:13:41 lr 0.000103	time 0.4401 (0.8638)	loss 0.2757 (0.2682)	grad_norm 1.1539 (2.2752)	mem 18239MB
[2022-02-05 13:45:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][400/1251]	eta 0:11:34 lr 0.000109	time 0.4306 (0.8162)	loss 0.2588 (0.2672)	grad_norm 1.2593 (2.2458)	mem 18239MB
[2022-02-05 13:46:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][500/1251]	eta 0:10:38 lr 0.000115	time 0.5900 (0.8503)	loss 0.2552 (0.2668)	grad_norm 1.4727 (2.2056)	mem 18240MB
[2022-02-05 13:47:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][600/1251]	eta 0:08:45 lr 0.000122	time 0.4254 (0.8066)	loss 0.2584 (0.2662)	grad_norm 1.1834 (2.1712)	mem 18240MB
[2022-02-05 13:48:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][700/1251]	eta 0:06:56 lr 0.000128	time 0.4058 (0.7558)	loss 0.2641 (0.2653)	grad_norm 1.1315 (2.1186)	mem 18240MB
[2022-02-05 13:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][800/1251]	eta 0:05:41 lr 0.000134	time 0.4352 (0.7570)	loss 0.2742 (0.2649)	grad_norm 0.7488 (2.0964)	mem 18240MB
[2022-02-05 13:51:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][900/1251]	eta 0:04:35 lr 0.000141	time 0.4130 (0.7842)	loss 0.2476 (0.2644)	grad_norm 0.6401 (2.0539)	mem 18240MB
[2022-02-05 13:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1000/1251]	eta 0:03:08 lr 0.000147	time 0.4153 (0.7508)	loss 0.2717 (0.2639)	grad_norm 2.2334 (2.0098)	mem 18240MB
[2022-02-05 13:53:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1100/1251]	eta 0:01:51 lr 0.000154	time 0.4521 (0.7393)	loss 0.2551 (0.2633)	grad_norm 1.4980 (1.9817)	mem 18240MB
[2022-02-05 13:55:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1200/1251]	eta 0:00:39 lr 0.000160	time 0.4667 (0.7788)	loss 0.2664 (0.2627)	grad_norm 0.7340 (1.9572)	mem 18240MB
[2022-02-05 13:56:06 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 1 training takes 0:16:20
[2022-02-05 13:56:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][0/1251]	eta 1:16:36 lr 0.000163	time 3.6739 (3.6739)	loss 0.2620 (0.2620)	grad_norm 0.9611 (0.9611)	mem 18240MB
[2022-02-05 13:56:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][100/1251]	eta 0:09:22 lr 0.000169	time 0.4276 (0.4883)	loss 0.2562 (0.2552)	grad_norm 0.5311 (1.6903)	mem 18240MB
[2022-02-05 13:58:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][200/1251]	eta 0:11:20 lr 0.000176	time 0.4207 (0.6473)	loss 0.2618 (0.2542)	grad_norm 0.6081 (1.6235)	mem 18240MB
[2022-02-05 13:59:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][300/1251]	eta 0:09:36 lr 0.000182	time 0.4451 (0.6061)	loss 0.2528 (0.2531)	grad_norm 0.4520 (1.6033)	mem 18240MB
[2022-02-05 14:00:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][400/1251]	eta 0:09:29 lr 0.000189	time 0.4445 (0.6689)	loss 0.2413 (0.2525)	grad_norm 0.6562 (1.5654)	mem 18240MB
[2022-02-05 14:01:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][500/1251]	eta 0:08:08 lr 0.000195	time 2.1151 (0.6503)	loss 0.2539 (0.2520)	grad_norm 1.8790 (1.5394)	mem 18240MB
[2022-02-05 14:03:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][600/1251]	eta 0:07:55 lr 0.000201	time 0.5791 (0.7308)	loss 0.2295 (0.2516)	grad_norm 1.3565 (1.5373)	mem 18240MB
[2022-02-05 14:04:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][700/1251]	eta 0:06:53 lr 0.000208	time 0.4240 (0.7508)	loss 0.2464 (0.2511)	grad_norm 0.5189 (1.5236)	mem 18240MB
[2022-02-05 14:06:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][800/1251]	eta 0:05:48 lr 0.000214	time 2.2608 (0.7731)	loss 0.2481 (0.2507)	grad_norm 0.4695 (1.4909)	mem 18240MB
[2022-02-05 14:08:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][900/1251]	eta 0:04:40 lr 0.000220	time 0.4068 (0.7993)	loss 0.2637 (0.2503)	grad_norm 1.5514 (1.4829)	mem 18240MB
[2022-02-05 14:09:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1000/1251]	eta 0:03:23 lr 0.000227	time 1.3535 (0.8115)	loss 0.2443 (0.2498)	grad_norm 0.9744 (1.4653)	mem 18240MB
[2022-02-05 14:11:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1100/1251]	eta 0:02:04 lr 0.000233	time 0.4410 (0.8271)	loss 0.2425 (0.2491)	grad_norm 1.9947 (1.4427)	mem 18240MB
[2022-02-05 14:12:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1200/1251]	eta 0:00:42 lr 0.000239	time 0.4275 (0.8318)	loss 0.2465 (0.2486)	grad_norm 0.6265 (1.4366)	mem 18240MB
[2022-02-05 14:13:21 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 2 training takes 0:17:15
[2022-02-05 14:13:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][0/1251]	eta 1:24:40 lr 0.000243	time 4.0614 (4.0614)	loss 0.2433 (0.2433)	grad_norm 0.7067 (0.7067)	mem 18240MB
[2022-02-05 14:14:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][100/1251]	eta 0:09:27 lr 0.000249	time 0.4092 (0.4935)	loss 0.2476 (0.2430)	grad_norm 1.1159 (1.3134)	mem 18240MB
[2022-02-05 14:15:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][200/1251]	eta 0:10:07 lr 0.000255	time 0.3964 (0.5784)	loss 0.2400 (0.2425)	grad_norm 0.3384 (1.2386)	mem 18240MB
[2022-02-05 14:16:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][300/1251]	eta 0:08:53 lr 0.000262	time 0.4966 (0.5605)	loss 0.2404 (0.2416)	grad_norm 0.3401 (1.1964)	mem 18240MB
[2022-02-05 14:17:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][400/1251]	eta 0:09:07 lr 0.000268	time 0.7116 (0.6430)	loss 0.2314 (0.2411)	grad_norm 1.4900 (1.2040)	mem 18240MB
[2022-02-05 14:19:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][500/1251]	eta 0:09:34 lr 0.000275	time 0.4066 (0.7646)	loss 0.2282 (0.2405)	grad_norm 0.5011 (1.2036)	mem 18240MB
[2022-02-05 14:21:30 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][600/1251]	eta 0:08:49 lr 0.000281	time 0.4160 (0.8126)	loss 0.2414 (0.2404)	grad_norm 0.9795 (1.1974)	mem 18240MB
[2022-02-05 14:23:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][700/1251]	eta 0:07:41 lr 0.000287	time 0.4862 (0.8377)	loss 0.2334 (0.2401)	grad_norm 0.4512 (1.1759)	mem 18240MB
[2022-02-05 14:24:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][800/1251]	eta 0:06:27 lr 0.000294	time 0.5067 (0.8583)	loss 0.2418 (0.2398)	grad_norm 1.2394 (1.1746)	mem 18240MB
[2022-02-05 14:26:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][900/1251]	eta 0:05:03 lr 0.000300	time 0.4366 (0.8635)	loss 0.2361 (0.2394)	grad_norm 0.5397 (1.1654)	mem 18240MB
[2022-02-05 14:27:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1000/1251]	eta 0:03:37 lr 0.000306	time 1.1073 (0.8658)	loss 0.2352 (0.2390)	grad_norm 0.6021 (1.1541)	mem 18241MB
[2022-02-05 14:29:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1100/1251]	eta 0:02:08 lr 0.000313	time 0.5436 (0.8526)	loss 0.2295 (0.2387)	grad_norm 1.1236 (1.1382)	mem 18241MB
[2022-02-05 14:30:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1200/1251]	eta 0:00:44 lr 0.000319	time 0.4830 (0.8667)	loss 0.2486 (0.2385)	grad_norm 0.4466 (1.1277)	mem 18241MB
[2022-02-05 14:31:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 3 training takes 0:18:11
[2022-02-05 14:31:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][0/1251]	eta 1:30:19 lr 0.000322	time 4.3320 (4.3320)	loss 0.2309 (0.2309)	grad_norm 0.3473 (0.3473)	mem 18241MB
[2022-02-05 14:32:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][100/1251]	eta 0:09:42 lr 0.000329	time 0.4199 (0.5059)	loss 0.2313 (0.2347)	grad_norm 0.9780 (0.9537)	mem 18241MB
[2022-02-05 14:33:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][200/1251]	eta 0:10:42 lr 0.000335	time 0.4042 (0.6115)	loss 0.2380 (0.2338)	grad_norm 0.4685 (0.9641)	mem 18241MB
[2022-02-05 14:35:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][300/1251]	eta 0:12:57 lr 0.000341	time 0.4448 (0.8171)	loss 0.2274 (0.2339)	grad_norm 0.5854 (0.9808)	mem 18241MB
[2022-02-05 14:37:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][400/1251]	eta 0:12:15 lr 0.000348	time 0.9284 (0.8642)	loss 0.2300 (0.2342)	grad_norm 0.5273 (0.9884)	mem 18241MB
[2022-02-05 14:38:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][500/1251]	eta 0:10:11 lr 0.000354	time 0.4123 (0.8136)	loss 0.2346 (0.2337)	grad_norm 0.7111 (0.9791)	mem 18241MB
[2022-02-05 14:39:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][600/1251]	eta 0:09:01 lr 0.000361	time 0.4479 (0.8323)	loss 0.2305 (0.2336)	grad_norm 0.7723 (0.9726)	mem 18241MB
[2022-02-05 14:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][700/1251]	eta 0:07:37 lr 0.000367	time 1.2561 (0.8298)	loss 0.2416 (0.2333)	grad_norm 0.7113 (0.9652)	mem 18241MB
[2022-02-05 14:43:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][800/1251]	eta 0:06:29 lr 0.000373	time 2.0054 (0.8637)	loss 0.2229 (0.2332)	grad_norm 0.3053 (0.9582)	mem 18241MB
[2022-02-05 14:44:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][900/1251]	eta 0:05:07 lr 0.000380	time 2.2077 (0.8764)	loss 0.2203 (0.2330)	grad_norm 0.9912 (0.9536)	mem 18241MB
[2022-02-05 14:46:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1000/1251]	eta 0:03:41 lr 0.000386	time 0.4317 (0.8842)	loss 0.2330 (0.2327)	grad_norm 0.4332 (0.9454)	mem 18241MB
[2022-02-05 14:47:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1100/1251]	eta 0:02:09 lr 0.000392	time 1.9930 (0.8594)	loss 0.2376 (0.2325)	grad_norm 0.3494 (0.9425)	mem 18241MB
[2022-02-05 14:49:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1200/1251]	eta 0:00:44 lr 0.000399	time 3.0251 (0.8816)	loss 0.2229 (0.2322)	grad_norm 0.3280 (0.9404)	mem 18241MB
[2022-02-05 14:49:56 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 4 training takes 0:18:23
[2022-02-05 14:50:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][0/1251]	eta 1:20:39 lr 0.000402	time 3.8685 (3.8685)	loss 0.2361 (0.2361)	grad_norm 0.2441 (0.2441)	mem 18241MB
[2022-02-05 14:50:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][100/1251]	eta 0:09:10 lr 0.000408	time 0.4087 (0.4786)	loss 0.2268 (0.2297)	grad_norm 0.4077 (0.9384)	mem 18241MB
[2022-02-05 14:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][200/1251]	eta 0:11:06 lr 0.000415	time 0.4741 (0.6344)	loss 0.2332 (0.2293)	grad_norm 0.6293 (0.8776)	mem 18241MB
[2022-02-05 14:53:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][300/1251]	eta 0:09:43 lr 0.000421	time 0.4483 (0.6141)	loss 0.4291 (0.2770)	grad_norm 0.1236 (nan)	mem 18241MB
[2022-02-05 14:54:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][400/1251]	eta 0:09:25 lr 0.000427	time 0.4158 (0.6646)	loss 0.4575 (0.3163)	grad_norm 0.7630 (nan)	mem 18241MB
[2022-02-05 14:55:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][500/1251]	eta 0:07:59 lr 0.000434	time 0.4047 (0.6385)	loss 0.4399 (0.3400)	grad_norm 1.2759 (nan)	mem 18241MB
[2022-02-05 14:57:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][600/1251]	eta 0:07:54 lr 0.000440	time 0.3981 (0.7282)	loss 0.5130 (0.3663)	grad_norm 0.0916 (nan)	mem 18241MB
[2022-02-05 14:58:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][700/1251]	eta 0:06:42 lr 0.000446	time 0.4456 (0.7310)	loss 0.4537 (0.3812)	grad_norm 0.5641 (nan)	mem 18241MB
[2022-02-05 14:59:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][800/1251]	eta 0:05:14 lr 0.000453	time 0.4507 (0.6973)	loss 0.5098 (0.3919)	grad_norm 0.3081 (nan)	mem 18241MB
[2022-02-05 15:01:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][900/1251]	eta 0:04:22 lr 0.000459	time 0.4445 (0.7472)	loss 0.4913 (0.4046)	grad_norm 0.0084 (nan)	mem 18241MB
[2022-02-05 15:03:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1000/1251]	eta 0:03:16 lr 0.000466	time 3.5568 (0.7843)	loss 0.5040 (0.4146)	grad_norm 0.0242 (nan)	mem 18241MB
[2022-02-05 15:04:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1100/1251]	eta 0:01:58 lr 0.000472	time 0.4314 (0.7874)	loss 0.4915 (0.4226)	grad_norm 0.2421 (nan)	mem 18241MB
[2022-02-05 15:05:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1200/1251]	eta 0:00:38 lr 0.000478	time 0.4317 (0.7606)	loss 0.5508 (0.4266)	grad_norm 0.0683 (nan)	mem 18241MB
[2022-02-05 15:05:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 5 training takes 0:15:37
[2022-02-05 15:05:33 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saving......
[2022-02-05 15:05:36 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saved !!!
[2022-02-05 15:05:40 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][0/1251]	eta 1:23:58 lr 0.000481	time 4.0272 (4.0272)	loss 0.4738 (0.4738)	grad_norm 1.3644 (1.3644)	mem 18241MB
[2022-02-05 15:07:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][100/1251]	eta 0:16:57 lr 0.000488	time 0.4414 (0.8841)	loss 0.4598 (0.4497)	grad_norm 0.1361 (nan)	mem 18241MB
[2022-02-05 15:08:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][200/1251]	eta 0:16:22 lr 0.000494	time 0.4016 (0.9352)	loss 0.5048 (0.4722)	grad_norm 0.0122 (nan)	mem 18241MB
[2022-02-05 15:10:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][300/1251]	eta 0:14:59 lr 0.000501	time 1.8795 (0.9461)	loss 0.4727 (0.4830)	grad_norm 0.0052 (nan)	mem 18241MB
[2022-02-05 15:12:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][400/1251]	eta 0:13:44 lr 0.000507	time 1.0088 (0.9684)	loss 0.4677 (0.4865)	grad_norm 0.0896 (nan)	mem 18241MB
[2022-02-05 15:13:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][500/1251]	eta 0:11:58 lr 0.000513	time 0.4878 (0.9563)	loss 0.5154 (0.4816)	grad_norm 24.0200 (nan)	mem 18241MB
[2022-02-05 15:15:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][600/1251]	eta 0:10:28 lr 0.000520	time 0.4229 (0.9660)	loss 0.4594 (0.4808)	grad_norm 0.9102 (nan)	mem 18243MB
[2022-02-05 15:16:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][700/1251]	eta 0:08:53 lr 0.000526	time 6.5413 (0.9676)	loss 0.4411 (0.4790)	grad_norm 0.7869 (nan)	mem 18243MB
[2022-02-05 15:18:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][800/1251]	eta 0:07:16 lr 0.000532	time 0.4438 (0.9674)	loss 0.4367 (0.4746)	grad_norm 1.4051 (nan)	mem 18243MB
[2022-02-05 15:20:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][900/1251]	eta 0:05:41 lr 0.000539	time 3.8164 (0.9736)	loss 0.4383 (0.4707)	grad_norm 0.0261 (nan)	mem 18243MB
[2022-02-05 15:21:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1000/1251]	eta 0:04:03 lr 0.000545	time 1.6960 (0.9705)	loss 0.4484 (0.4665)	grad_norm 21.2195 (nan)	mem 18243MB
[2022-02-05 15:23:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1100/1251]	eta 0:02:26 lr 0.000552	time 0.4055 (0.9699)	loss 0.4562 (0.4642)	grad_norm 1.7039 (nan)	mem 18243MB
[2022-02-05 15:24:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1200/1251]	eta 0:00:49 lr 0.000558	time 0.6191 (0.9622)	loss 0.4597 (0.4641)	grad_norm 1.6285 (nan)	mem 18243MB
[2022-02-05 15:25:37 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 6 training takes 0:20:01
[2022-02-05 15:25:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][0/1251]	eta 1:24:51 lr 0.000561	time 4.0702 (4.0702)	loss 0.4520 (0.4520)	grad_norm 0.2361 (0.2361)	mem 18243MB
[2022-02-05 15:26:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][100/1251]	eta 0:09:22 lr 0.000567	time 0.4117 (0.4889)	loss 0.4651 (0.4644)	grad_norm 0.0263 (1.5361)	mem 18243MB
[2022-02-05 15:27:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][200/1251]	eta 0:08:14 lr 0.000574	time 0.5824 (0.4702)	loss 0.4427 (0.4608)	grad_norm 0.4953 (2.5894)	mem 18243MB
[2022-02-05 15:27:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][300/1251]	eta 0:07:20 lr 0.000580	time 0.4171 (0.4631)	loss 0.4863 (0.4698)	grad_norm 0.0398 (2.0401)	mem 18243MB
[2022-02-05 15:30:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][400/1251]	eta 0:09:53 lr 0.000587	time 0.4244 (0.6975)	loss 0.4536 (0.4673)	grad_norm 0.3069 (1.8147)	mem 18243MB
[2022-02-05 15:32:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][500/1251]	eta 0:09:52 lr 0.000593	time 0.4264 (0.7884)	loss 0.4325 (0.4624)	grad_norm 0.1347 (2.3873)	mem 18243MB
[2022-02-05 15:33:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][600/1251]	eta 0:08:55 lr 0.000599	time 1.4852 (0.8225)	loss 0.4847 (0.4582)	grad_norm 9.4087 (2.4801)	mem 18243MB
[2022-02-05 15:35:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][700/1251]	eta 0:07:38 lr 0.000606	time 0.4803 (0.8325)	loss 0.4958 (0.4663)	grad_norm 0.0721 (2.2051)	mem 18243MB
[2022-02-05 15:37:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][800/1251]	eta 0:06:24 lr 0.000612	time 0.4407 (0.8519)	loss 0.5036 (0.4695)	grad_norm 0.0279 (2.1894)	mem 18243MB
[2022-02-05 15:38:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][900/1251]	eta 0:05:05 lr 0.000618	time 1.0429 (0.8691)	loss 0.4598 (0.4725)	grad_norm 0.3491 (1.9502)	mem 18243MB
[2022-02-05 15:40:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1000/1251]	eta 0:03:39 lr 0.000625	time 0.4146 (0.8731)	loss 0.4447 (0.4727)	grad_norm 0.0666 (1.8049)	mem 18243MB
[2022-02-05 15:41:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1100/1251]	eta 0:02:13 lr 0.000631	time 0.4089 (0.8846)	loss 0.5773 (0.4706)	grad_norm 533.4438 (2.1825)	mem 18243MB
[2022-02-05 15:43:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1200/1251]	eta 0:00:45 lr 0.000637	time 0.4819 (0.8905)	loss 0.4459 (0.4708)	grad_norm 0.2206 (inf)	mem 18243MB
[2022-02-05 15:44:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 7 training takes 0:18:32
[2022-02-05 15:44:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][0/1251]	eta 1:22:31 lr 0.000641	time 3.9582 (3.9582)	loss 0.4379 (0.4379)	grad_norm 0.1034 (0.1034)	mem 18243MB
[2022-02-05 15:44:59 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][100/1251]	eta 0:09:21 lr 0.000647	time 0.4268 (0.4878)	loss 0.4240 (0.4471)	grad_norm 0.1163 (0.5972)	mem 18243MB
[2022-02-05 15:46:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][200/1251]	eta 0:12:24 lr 0.000653	time 0.4756 (0.7080)	loss 0.5335 (0.4479)	grad_norm 0.6204 (5.4569)	mem 18243MB
[2022-02-05 15:47:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][300/1251]	eta 0:09:50 lr 0.000660	time 0.4158 (0.6213)	loss 0.5053 (0.4720)	grad_norm 0.0163 (3.8024)	mem 18243MB
[2022-02-05 15:48:02 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][400/1251]	eta 0:08:11 lr 0.000666	time 0.4401 (0.5773)	loss 0.4971 (0.4803)	grad_norm 0.0055 (2.8562)	mem 18243MB
[2022-02-05 15:49:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][500/1251]	eta 0:07:22 lr 0.000673	time 1.8764 (0.5890)	loss 0.5002 (0.4848)	grad_norm 0.0067 (2.2872)	mem 18243MB
[2022-02-05 15:51:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][600/1251]	eta 0:07:31 lr 0.000679	time 0.4090 (0.6942)	loss 0.4947 (0.4882)	grad_norm 0.0027 (1.9076)	mem 18243MB
[2022-02-05 15:52:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][700/1251]	eta 0:06:50 lr 0.000685	time 2.9712 (0.7456)	loss 0.5094 (0.4906)	grad_norm 0.0018 (1.6364)	mem 18243MB
[2022-02-05 15:53:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][800/1251]	eta 0:05:29 lr 0.000692	time 0.5231 (0.7305)	loss 0.5050 (0.4927)	grad_norm 0.0023 (1.4328)	mem 18243MB
[2022-02-05 15:55:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][900/1251]	eta 0:04:29 lr 0.000698	time 0.4867 (0.7679)	loss 0.5158 (0.4942)	grad_norm 0.0031 (1.2744)	mem 18243MB
[2022-02-05 15:57:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1000/1251]	eta 0:03:18 lr 0.000704	time 2.5024 (0.7920)	loss 0.5137 (0.4952)	grad_norm 0.0069 (1.1477)	mem 18243MB
[2022-02-05 15:58:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1100/1251]	eta 0:01:56 lr 0.000711	time 0.4010 (0.7713)	loss 0.5179 (0.4962)	grad_norm 0.0027 (1.0440)	mem 18243MB
[2022-02-05 16:00:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1200/1251]	eta 0:00:40 lr 0.000717	time 0.3946 (0.7944)	loss 0.5119 (0.4969)	grad_norm 0.0025 (0.9576)	mem 18243MB
[2022-02-05 16:00:52 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 8 training takes 0:16:41
[2022-02-05 16:00:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][0/1251]	eta 1:20:40 lr 0.000720	time 3.8696 (3.8696)	loss 0.4855 (0.4855)	grad_norm 0.0029 (0.0029)	mem 18243MB
[2022-02-05 16:01:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][100/1251]	eta 0:09:21 lr 0.000727	time 0.4418 (0.4877)	loss 0.5036 (0.5047)	grad_norm 0.0077 (0.0063)	mem 18243MB
[2022-02-05 16:02:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][200/1251]	eta 0:08:13 lr 0.000733	time 0.4427 (0.4691)	loss 0.5000 (0.5045)	grad_norm 0.0043 (0.0063)	mem 18243MB
[2022-02-05 16:03:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][300/1251]	eta 0:07:20 lr 0.000739	time 0.4323 (0.4635)	loss 0.5210 (0.5047)	grad_norm 0.0036 (0.0064)	mem 18243MB
[2022-02-05 16:05:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][400/1251]	eta 0:08:49 lr 0.000746	time 0.4176 (0.6220)	loss 0.4839 (0.5052)	grad_norm 0.0049 (0.0073)	mem 18243MB
[2022-02-05 16:06:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][500/1251]	eta 0:09:07 lr 0.000752	time 0.4086 (0.7284)	loss 0.4946 (0.5054)	grad_norm 0.0034 (0.0072)	mem 18243MB
[2022-02-05 16:08:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][600/1251]	eta 0:08:17 lr 0.000759	time 0.4523 (0.7643)	loss 0.5037 (0.5055)	grad_norm 0.0185 (0.0070)	mem 18243MB
[2022-02-05 16:10:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][700/1251]	eta 0:07:18 lr 0.000765	time 0.4846 (0.7965)	loss 0.5141 (0.5057)	grad_norm 0.0029 (0.0071)	mem 18243MB
[2022-02-05 16:11:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][800/1251]	eta 0:06:11 lr 0.000771	time 0.5237 (0.8228)	loss 0.4947 (0.5055)	grad_norm 0.0037 (0.0071)	mem 18243MB
[2022-02-05 16:13:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][900/1251]	eta 0:04:50 lr 0.000778	time 0.4529 (0.8269)	loss 0.5303 (0.5055)	grad_norm 0.0031 (0.0073)	mem 18243MB
[2022-02-05 16:14:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1000/1251]	eta 0:03:28 lr 0.000784	time 5.7999 (0.8313)	loss 0.5151 (0.5056)	grad_norm 0.0050 (0.0074)	mem 18243MB
[2022-02-05 16:16:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1100/1251]	eta 0:02:07 lr 0.000790	time 1.1566 (0.8422)	loss 0.4930 (0.5055)	grad_norm 0.0044 (0.0074)	mem 18243MB
[2022-02-05 16:17:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1200/1251]	eta 0:00:43 lr 0.000797	time 0.5183 (0.8531)	loss 0.4922 (0.5056)	grad_norm 0.0028 (0.0076)	mem 18243MB
[2022-02-05 16:18:39 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 9 training takes 0:17:46
[2022-02-05 16:18:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][0/1251]	eta 1:24:03 lr 0.000800	time 4.0314 (4.0314)	loss 0.5028 (0.5028)	grad_norm 0.0046 (0.0046)	mem 18243MB
[2022-02-05 16:19:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][100/1251]	eta 0:09:18 lr 0.000781	time 0.4616 (0.4852)	loss 0.5053 (0.5051)	grad_norm 0.0029 (0.0082)	mem 18243MB
[2022-02-05 16:20:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][200/1251]	eta 0:10:45 lr 0.000781	time 1.2564 (0.6146)	loss 0.5208 (0.5047)	grad_norm 0.0030 (0.0077)	mem 18243MB
[2022-02-05 16:22:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][300/1251]	eta 0:13:19 lr 0.000781	time 0.4131 (0.8408)	loss 0.5163 (0.5054)	grad_norm 0.0067 (0.0078)	mem 18243MB
[2022-02-05 16:24:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][400/1251]	eta 0:12:00 lr 0.000780	time 0.4386 (0.8464)	loss 0.5159 (0.5057)	grad_norm 0.0075 (0.0083)	mem 18243MB
[2022-02-05 16:25:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][500/1251]	eta 0:09:37 lr 0.000780	time 0.4158 (0.7694)	loss 0.5114 (0.5056)	grad_norm 0.0055 (0.0083)	mem 18243MB
[2022-02-05 16:26:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][600/1251]	eta 0:08:21 lr 0.000780	time 0.4583 (0.7696)	loss 0.5191 (0.5058)	grad_norm 0.0064 (0.0083)	mem 18243MB
[2022-02-05 16:27:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][700/1251]	eta 0:06:53 lr 0.000779	time 0.4195 (0.7505)	loss 0.4864 (0.5056)	grad_norm 0.0081 (0.0085)	mem 18243MB
[2022-02-05 16:29:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][800/1251]	eta 0:06:11 lr 0.000779	time 1.3727 (0.8233)	loss 0.4949 (0.5058)	grad_norm 0.0031 (0.0089)	mem 18243MB
[2022-02-05 16:31:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][900/1251]	eta 0:04:56 lr 0.000779	time 8.4577 (0.8454)	loss 0.5168 (0.5056)	grad_norm 0.0051 (0.0087)	mem 18243MB
[2022-02-05 16:32:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1000/1251]	eta 0:03:30 lr 0.000778	time 0.4084 (0.8387)	loss 0.5202 (0.5056)	grad_norm 0.0031 (0.0088)	mem 18243MB
[2022-02-05 16:34:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1100/1251]	eta 0:02:08 lr 0.000778	time 0.4177 (0.8530)	loss 0.5111 (0.5056)	grad_norm 0.0056 (0.0087)	mem 18243MB
[2022-02-05 16:35:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1200/1251]	eta 0:00:44 lr 0.000778	time 0.4621 (0.8640)	loss 0.4990 (0.5055)	grad_norm 0.0050 (0.0088)	mem 18243MB
[2022-02-05 16:36:41 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 10 training takes 0:18:02
[2022-02-05 16:36:41 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saving......
[2022-02-05 16:36:44 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saved !!!
[2022-02-05 16:36:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][0/1251]	eta 1:13:03 lr 0.000778	time 3.5042 (3.5042)	loss 0.5118 (0.5118)	grad_norm 0.0109 (0.0109)	mem 18243MB
[2022-02-05 16:37:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][100/1251]	eta 0:12:09 lr 0.000777	time 0.5750 (0.6336)	loss 0.5325 (0.5074)	grad_norm 0.0052 (0.0076)	mem 18243MB
[2022-02-05 16:39:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][200/1251]	eta 0:14:14 lr 0.000777	time 0.4904 (0.8130)	loss 0.5064 (0.5059)	grad_norm 0.0036 (0.0104)	mem 18243MB
[2022-02-05 16:41:03 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][300/1251]	eta 0:13:38 lr 0.000777	time 0.5833 (0.8607)	loss 0.4996 (0.5055)	grad_norm 0.0044 (0.0099)	mem 18243MB
[2022-02-05 16:42:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][400/1251]	eta 0:12:40 lr 0.000776	time 1.4644 (0.8932)	loss 0.5054 (0.5057)	grad_norm 0.0049 (0.0093)	mem 18243MB
[2022-02-05 16:43:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][500/1251]	eta 0:10:45 lr 0.000776	time 0.4532 (0.8590)	loss 0.4886 (0.5055)	grad_norm 0.0048 (0.0096)	mem 18243MB
[2022-02-05 16:45:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][600/1251]	eta 0:09:28 lr 0.000776	time 0.4424 (0.8730)	loss 0.5031 (0.5053)	grad_norm 0.0035 (0.0096)	mem 18243MB
[2022-02-05 16:46:58 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][700/1251]	eta 0:08:02 lr 0.000775	time 0.4489 (0.8760)	loss 0.5361 (0.5057)	grad_norm 0.0055 (0.0094)	mem 18243MB
[2022-02-05 16:47:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][800/1251]	eta 0:06:11 lr 0.000775	time 0.4341 (0.8229)	loss 0.4948 (0.5057)	grad_norm 0.0069 (0.0095)	mem 18243MB
[2022-02-05 16:49:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][900/1251]	eta 0:04:55 lr 0.000775	time 0.4356 (0.8426)	loss 0.5100 (0.5056)	grad_norm 0.0119 (0.0096)	mem 18243MB
[2022-02-05 16:50:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1000/1251]	eta 0:03:23 lr 0.000774	time 0.6975 (0.8099)	loss 0.5050 (0.5058)	grad_norm 0.0047 (0.0097)	mem 18243MB
[2022-02-05 16:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1100/1251]	eta 0:02:06 lr 0.000774	time 1.0458 (0.8359)	loss 0.5302 (0.5060)	grad_norm 0.0044 (0.0095)	mem 18243MB
[2022-02-05 16:53:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1200/1251]	eta 0:00:43 lr 0.000773	time 0.4221 (0.8547)	loss 0.5000 (0.5061)	grad_norm 0.0074 (0.0096)	mem 18243MB
[2022-02-05 16:54:34 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 11 training takes 0:17:50
[2022-02-05 16:54:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][0/1251]	eta 1:16:28 lr 0.000773	time 3.6676 (3.6676)	loss 0.5215 (0.5215)	grad_norm 0.0116 (0.0116)	mem 18243MB
[2022-02-05 16:55:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][100/1251]	eta 0:14:50 lr 0.000773	time 0.7810 (0.7737)	loss 0.5164 (0.5068)	grad_norm 0.0069 (0.0073)	mem 18243MB
[2022-02-05 16:57:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][200/1251]	eta 0:14:47 lr 0.000773	time 1.7244 (0.8448)	loss 0.5187 (0.5058)	grad_norm 0.0096 (0.0076)	mem 18243MB
[2022-02-05 16:58:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][300/1251]	eta 0:13:41 lr 0.000772	time 0.4478 (0.8634)	loss 0.4969 (0.5061)	grad_norm 0.0054 (0.0086)	mem 18243MB
[2022-02-05 17:00:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][400/1251]	eta 0:11:49 lr 0.000772	time 0.6186 (0.8340)	loss 0.5156 (0.5064)	grad_norm 0.0082 (0.0085)	mem 18243MB
[2022-02-05 17:01:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][500/1251]	eta 0:11:01 lr 0.000772	time 0.4288 (0.8806)	loss 0.5079 (0.5062)	grad_norm 0.0048 (0.0085)	mem 18243MB
[2022-02-05 17:03:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][600/1251]	eta 0:09:45 lr 0.000771	time 0.4818 (0.8992)	loss 0.4947 (0.5059)	grad_norm 0.0039 (0.0084)	mem 18243MB
[2022-02-05 17:05:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][700/1251]	eta 0:08:20 lr 0.000771	time 0.4087 (0.9092)	loss 0.4817 (0.5060)	grad_norm 0.0042 (0.0083)	mem 18243MB
[2022-02-05 17:06:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][800/1251]	eta 0:06:54 lr 0.000770	time 3.8632 (0.9187)	loss 0.5162 (0.5057)	grad_norm 0.0056 (0.0085)	mem 18243MB
[2022-02-05 17:08:22 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][900/1251]	eta 0:05:22 lr 0.000770	time 0.6304 (0.9192)	loss 0.5097 (0.5058)	grad_norm 0.0882 (0.0122)	mem 18243MB
[2022-02-05 17:09:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1000/1251]	eta 0:03:50 lr 0.000770	time 0.4901 (0.9200)	loss 0.5016 (0.5059)	grad_norm 0.0458 (0.0284)	mem 18243MB
[2022-02-05 17:11:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1100/1251]	eta 0:02:20 lr 0.000769	time 4.5322 (0.9288)	loss 0.4981 (0.5058)	grad_norm 0.0095 (0.0943)	mem 18243MB
[2022-02-05 17:13:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1200/1251]	eta 0:00:47 lr 0.000769	time 0.6195 (0.9275)	loss 0.5148 (0.5059)	grad_norm 0.0085 (0.0929)	mem 18243MB
[2022-02-05 17:13:54 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 12 training takes 0:19:19
[2022-02-05 17:13:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][0/1251]	eta 1:16:38 lr 0.000769	time 3.6756 (3.6756)	loss 0.5141 (0.5141)	grad_norm 0.0134 (0.0134)	mem 18243MB
[2022-02-05 17:14:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][100/1251]	eta 0:09:08 lr 0.000768	time 0.4265 (0.4769)	loss 0.4834 (0.5059)	grad_norm 0.0057 (0.0109)	mem 18243MB
[2022-02-05 17:16:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][200/1251]	eta 0:11:38 lr 0.000768	time 0.4268 (0.6645)	loss 0.5068 (0.5065)	grad_norm 0.0053 (0.0105)	mem 18243MB
[2022-02-05 17:17:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][300/1251]	eta 0:12:06 lr 0.000768	time 0.5157 (0.7636)	loss 0.4987 (0.5058)	grad_norm 0.0068 (0.0111)	mem 18243MB
[2022-02-05 17:19:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][400/1251]	eta 0:11:13 lr 0.000767	time 0.4169 (0.7910)	loss 0.5224 (0.5060)	grad_norm 0.0052 (0.0109)	mem 18243MB
[2022-02-05 17:20:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][500/1251]	eta 0:09:47 lr 0.000767	time 0.5883 (0.7817)	loss 0.4828 (0.5060)	grad_norm 0.0175 (0.0106)	mem 18243MB
[2022-02-05 17:21:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][600/1251]	eta 0:08:12 lr 0.000766	time 3.2032 (0.7566)	loss 0.5117 (0.5061)	grad_norm 0.0134 (0.0104)	mem 18243MB
[2022-02-05 17:23:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][700/1251]	eta 0:07:20 lr 0.000766	time 0.4028 (0.8003)	loss 0.5239 (0.5060)	grad_norm 0.0044 (0.0108)	mem 18243MB
[2022-02-05 17:25:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][800/1251]	eta 0:06:15 lr 0.000766	time 0.3985 (0.8326)	loss 0.5014 (0.5062)	grad_norm 0.0114 (0.0110)	mem 18243MB
[2022-02-05 17:26:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][900/1251]	eta 0:04:58 lr 0.000765	time 0.6148 (0.8492)	loss 0.5003 (0.5062)	grad_norm 0.0027 (0.0109)	mem 18243MB
[2022-02-05 17:28:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1000/1251]	eta 0:03:36 lr 0.000765	time 0.4273 (0.8616)	loss 0.4993 (0.5062)	grad_norm 0.0140 (0.0108)	mem 18243MB
[2022-02-05 17:30:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1100/1251]	eta 0:02:12 lr 0.000764	time 0.4532 (0.8779)	loss 0.5004 (0.5061)	grad_norm 0.0107 (0.0109)	mem 18243MB
[2022-02-05 17:31:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1200/1251]	eta 0:00:44 lr 0.000764	time 0.5378 (0.8783)	loss 0.4941 (0.5061)	grad_norm 0.0027 (0.0106)	mem 18243MB
[2022-02-05 17:32:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 13 training takes 0:18:16
[2022-02-05 17:32:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][0/1251]	eta 1:28:50 lr 0.000764	time 4.2612 (4.2612)	loss 0.4906 (0.4906)	grad_norm 0.0057 (0.0057)	mem 18243MB
[2022-02-05 17:33:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][100/1251]	eta 0:09:30 lr 0.000763	time 0.4356 (0.4958)	loss 0.4973 (0.5057)	grad_norm 0.0031 (0.0087)	mem 18243MB
[2022-02-05 17:34:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][200/1251]	eta 0:11:46 lr 0.000763	time 0.4247 (0.6724)	loss 0.5087 (0.5059)	grad_norm 0.0055 (0.0088)	mem 18243MB
[2022-02-05 17:36:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][300/1251]	eta 0:13:02 lr 0.000763	time 0.5249 (0.8233)	loss 0.5212 (0.5071)	grad_norm 0.0065 (0.0095)	mem 18243MB
[2022-02-05 17:37:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][400/1251]	eta 0:12:15 lr 0.000762	time 0.6521 (0.8638)	loss 0.4970 (0.5065)	grad_norm 0.0145 (0.0094)	mem 18243MB
[2022-02-05 17:39:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][500/1251]	eta 0:11:08 lr 0.000762	time 1.7114 (0.8902)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][600/1251]	eta 0:09:49 lr 0.000761	time 3.6215 (0.9061)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:42:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][700/1251]	eta 0:08:23 lr 0.000761	time 0.5390 (0.9145)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:44:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][800/1251]	eta 0:06:54 lr 0.000761	time 0.8262 (0.9181)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:46:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][900/1251]	eta 0:05:25 lr 0.000760	time 0.7292 (0.9266)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:47:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1000/1251]	eta 0:03:53 lr 0.000760	time 0.5330 (0.9303)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:49:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1100/1251]	eta 0:02:20 lr 0.000759	time 1.6054 (0.9295)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:50:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1200/1251]	eta 0:00:47 lr 0.000759	time 0.4148 (0.9302)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:51:29 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 14 training takes 0:19:19
[2022-02-05 17:51:33 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][0/1251]	eta 1:25:26 lr 0.000759	time 4.0980 (4.0980)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][100/1251]	eta 0:09:03 lr 0.000758	time 0.4545 (0.4721)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:53:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][200/1251]	eta 0:11:56 lr 0.000758	time 0.4286 (0.6820)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:54:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][300/1251]	eta 0:10:23 lr 0.000757	time 0.4350 (0.6558)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB

I did not modify any of the configs except for specifying `--accumulation-steps 2` on the command line to fit into memory on an 8-GPU machine. I'm using CUDA 11.1, cuDNN 8, and PyTorch 1.9.0 (which should be recent enough). Could you take a look at what went wrong and how to fix it?

Thank you!
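
For anyone hitting the same divergence: the running grad_norm in the log turns to nan around epochs 5-7, well before the loss itself becomes nan in epoch 14. One generic way to guard against this (a minimal sketch with placeholder names, not the repo's actual training loop) is to skip optimizer steps whenever the loss or clipped gradient norm is non-finite during gradient accumulation:

# Minimal sketch, not the repo's training loop: guard a gradient-accumulation
# step against non-finite losses/gradients. Names (model, loader, optimizer)
# are placeholders; the model is assumed to return the reconstruction loss.
import torch

ACCUM_STEPS = 2   # matches --accumulation-steps 2
CLIP_NORM = 5.0   # hypothetical clipping value; tune for your setup

def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (images, mask) in enumerate(loader):
        images, mask = images.to(device), mask.to(device)
        loss = model(images, mask) / ACCUM_STEPS  # assumes forward returns the loss

        if not torch.isfinite(loss):   # skip a batch whose loss already blew up
            optimizer.zero_grad()
            continue

        loss.backward()
        if (step + 1) % ACCUM_STEPS == 0:
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
            if torch.isfinite(grad_norm):  # only step on finite gradients
                optimizer.step()
            optimizer.zero_grad()

Lowering the base learning rate or lengthening the warmup when the effective batch size changes is another common mitigation, though the right values depend on the setup.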

Questions about AvgDist

Hi @caoyue10 @ancientmooner, thanks a lot for your wonderful work! I have a question about the AvgDist metric in the paper. In the experiments you mention "AvgDist", which is a newly proposed metric, but I haven't found a reference for it in the paper or an implementation in the code. Could you please give me some pointers about this metric?

Again, thanks a lot for your contributions~
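
AvgDist in the paper appears to measure the averaged Euclidean distance from masked elements to their nearest visible ones, used to compare masking strategies. Below is a minimal sketch of how such a quantity could be computed on a binary mask grid; avg_dist is a hypothetical helper, not part of the SimMIM codebase.

# Minimal sketch of an AvgDist-style metric, assuming it is the average
# Euclidean distance from each masked element to the nearest visible one.
import numpy as np
from scipy.ndimage import distance_transform_edt

def avg_dist(mask: np.ndarray) -> float:
    """mask: 2-D boolean array, True = masked, False = visible."""
    if not mask.any():        # nothing masked
        return 0.0
    if mask.all():            # no visible element to measure against
        return float("inf")
    # For every True element, distance_transform_edt returns the Euclidean
    # distance to the nearest False element.
    dist = distance_transform_edt(mask)
    return float(dist[mask].mean())

# Example: a 6x6 grid of 32x32 mask units on a 192x192 image, ~60% masked.
rng = np.random.default_rng(0)
mask = rng.random((6, 6)) < 0.6
print(avg_dist(mask))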
