sense-x / uniformer
[ICLR2022] official implementation of UniFormer
License: Apache License 2.0
I ran the test script for the UniFormer small plus model: https://github.com/Sense-X/UniFormer/blob/main/image_classification/exp/uniformer_small_plus/test.sh
The acc@1 is 68.394, much lower than the 83.4 reported in the paper.
the weight I used is
In https://github.com/Sense-X/UniFormer/blob/main/video_classification/exp/uniformer_b32x4_k600/config.yaml:
MODEL:
  NUM_CLASSES: 400
  ARCH: uniformer
  MODEL_NAME: Uniformer
  LOSS_FUNC: soft_cross_entropy
  DROPOUT_RATE: 0.5
It seems that NUM_CLASSES should be 600 instead of 400. The other k600 config files have the same problem. Or are the config files themselves wrong?
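A quick way to catch this kind of mismatch is to compare each config's NUM_CLASSES against the dataset tag in its directory name. The sketch below assumes the experiment folders follow the naming seen here (`..._k400`, `..._k600`, `..._k700`) and takes an already-parsed config dict. The helper names are hypothetical and are not part of the repo.

```python
import re
from typing import Optional

# Assumed mapping from the Kinetics tag in an experiment folder name to its class count.
KINETICS_CLASSES = {"k400": 400, "k600": 600, "k700": 700}


def expected_num_classes(config_path: str) -> Optional[int]:
    """Infer the class count from a tag such as '_k600/' in the path, if present."""
    match = re.search(r"_(k\d{3})(?:[/_]|$)", config_path)
    if match is None:
        return None
    return KINETICS_CLASSES.get(match.group(1))


def check_num_classes(config_path: str, config: dict) -> Optional[str]:
    """Return a warning if MODEL.NUM_CLASSES disagrees with the path tag, else None."""
    expected = expected_num_classes(config_path)
    actual = config.get("MODEL", {}).get("NUM_CLASSES")
    if expected is not None and actual != expected:
        return f"{config_path}: NUM_CLASSES={actual}, expected {expected}"
    return None
```

With PyYAML installed, `config` could come from `yaml.safe_load(open(path))`. For the k600 config quoted above, `check_num_classes` would flag `NUM_CLASSES=400`.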
I run into problems when I use the pretrained model uniformer_base_in1k.pth as my backbone:
missing keys: ['patch_embed1.norm.weight', 'patch_embed1.norm.bias', 'patch_embed1.proj.weight', 'patch_embed1.proj.bias', 'patch_embed2.norm.weight', .....
unexpected keys: ['model']
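The single unexpected key `'model'` suggests that the checkpoint is a dict that wraps the actual weights under `'model'`. That would also explain why every backbone key shows up as missing. A minimal sketch of unwrapping it before calling `load_state_dict` follows. The extra `'state_dict'` key is an assumption, since mmcv-style checkpoints often use that name.

```python
from typing import Any, Dict, Mapping


def unwrap_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the raw parameter dict from a checkpoint that may wrap it.

    Training scripts often save {'model': weights, 'optimizer': ..., 'epoch': ...};
    'state_dict' is checked as well as an assumed alternative wrapper key.
    """
    for key in ("model", "state_dict"):
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return dict(inner)
    # Already a plain state dict.
    return dict(checkpoint)
```

Usage would look like `state = unwrap_state_dict(torch.load("uniformer_base_in1k.pth", map_location="cpu"))`, followed by `model.load_state_dict(state, strict=False)`. The returned object lists `missing_keys` and `unexpected_keys`, so you can confirm that only the classifier or other non-backbone keys remain unmatched.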
Hello,
I have three practical questions about video classification in UniFormer:
1. You write "resize the video to the short edge size of 320" in DATASET.md of video_classification, so I wonder whether this resizing has to be done before running the program. If so, could you share the resizing code?
2. You write "freeze BN in Backbone" in README.md. Did you forget to apply this operation in video classification as well? And is it necessary for video classification?
3. The Kinetics-400 dataset I downloaded has timestamps between the file name and '.mp4'. Does this affect running the code?
Thanks in advance for your help.
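For the first question, one common way to do the short-edge resize offline is ffmpeg's `scale` filter. In that filter, `-2` keeps the aspect ratio and rounds the other side to an even number. The sketch below builds such a command and is not the authors' script. It assumes you already know each video's width and height, for example from ffprobe.

```python
import subprocess
from typing import List


def short_edge_scale_filter(width: int, height: int, short_edge: int = 320) -> str:
    """ffmpeg scale filter that sets the shorter side to `short_edge`.

    '-2' lets ffmpeg compute the other side from the aspect ratio, rounded to an even value.
    """
    if height <= width:
        return f"scale=-2:{short_edge}"  # landscape: height is the short edge
    return f"scale={short_edge}:-2"      # portrait: width is the short edge


def build_resize_cmd(src: str, dst: str, width: int, height: int, short_edge: int = 320) -> List[str]:
    """Command line that re-encodes `src` into `dst` with the short edge resized."""
    return ["ffmpeg", "-y", "-i", src, "-vf", short_edge_scale_filter(width, height, short_edge), dst]


def resize_video(src: str, dst: str, width: int, height: int, short_edge: int = 320) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_resize_cmd(src, dst, width, height, short_edge), check=True)
```

You would loop `resize_video` over the dataset and write the outputs to a new directory, keeping the original file names so that any annotation lists still match.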
I ran into this problem after installing mmdet 2.11 (in object_detection) and mmcv 1.3.14. It seems something is wrong with the installation or with multi-GPU training. Do you have any idea what's wrong?
Thank you for your efforts.
Dear author:
I would like to fine-tune the UniFormer model on my own image classification dataset. However, I ran into problems when loading the released checkpoint. I use the ImageNet-1K pretrained (224x224) UniFormer-B model and set args.finetune = 'uniformer_base_in1k.pth'.
When running main.py, I get an error at UniFormer/image_classification/main.py, lines 275 to 293 in e802470:
File "/path-to-project/main.py", line 276, in main
pos_embed_checkpoint = checkpoint_model['pos_embed']
KeyError: 'pos_embed'
What am I doing wrong when loading the model?
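The KeyError shows that the checkpoint has no top-level `'pos_embed'` entry. The DeiT-style position-embedding interpolation at those lines therefore presumably has to be skipped rather than indexed unconditionally. Below is a hedged sketch of a guarded preparation step. The head key names `head.weight` / `head.bias` are assumptions, and `prepare_finetune_state_dict` is a hypothetical helper, not code from main.py.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping, Tuple


def _tensor_shape(value: Any) -> Tuple[int, ...]:
    """Shape of a tensor-like value (torch.Tensor and numpy arrays both have .shape)."""
    return tuple(value.shape)


def prepare_finetune_state_dict(
    checkpoint: Mapping[str, Any],
    model_state: Mapping[str, Any],
    head_keys: Iterable[str] = ("head.weight", "head.bias"),  # assumed classifier key names
    shape_of: Callable[[Any], Tuple[int, ...]] = _tensor_shape,
) -> Tuple[Dict[str, Any], List[str]]:
    # Weights may be wrapped under 'model'.
    state = dict(checkpoint["model"]) if "model" in checkpoint else dict(checkpoint)

    # Only touch 'pos_embed' if both sides actually have one; otherwise skip the step.
    if "pos_embed" in state and "pos_embed" in model_state:
        if shape_of(state["pos_embed"]) != shape_of(model_state["pos_embed"]):
            raise NotImplementedError("pos_embed interpolation is outside this sketch")

    # Drop classifier weights whose shape no longer matches (e.g. 1000 -> 10 classes).
    dropped: List[str] = []
    for key in head_keys:
        if key in state and key in model_state and shape_of(state[key]) != shape_of(model_state[key]):
            del state[key]
            dropped.append(key)
    return state, dropped
```

The returned dict would then be loaded with `model.load_state_dict(state, strict=False)`, so that the freshly initialized classifier is kept.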
I want to convert the PyTorch model to an ONNX model on the CPU, but there is no relevant code in the repository. Could you upload a pytorch2onnx.py for video classification?
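Until an official script exists, a CPU export would generally go through `torch.onnx.export` with a dummy clip. The sketch below is hedged in several ways. It assumes the model's forward takes a single tensor in (B, C, T, H, W) layout with 16 frames at 224x224. If the video model's forward instead expects a list of pathways, as SlowFast-style code often does, wrap it in a small `nn.Module` first. The opset and tensor names are arbitrary choices.

```python
from typing import Tuple


def video_input_shape(batch: int = 1, channels: int = 3, num_frames: int = 16, size: int = 224) -> Tuple[int, int, int, int, int]:
    """Shape of a dummy clip in (B, C, T, H, W) layout (assumed input layout)."""
    return (batch, channels, num_frames, size, size)


def export_video_model_to_onnx(model, onnx_path: str, num_frames: int = 16, size: int = 224, opset: int = 13) -> None:
    import torch  # imported lazily so the shape helper works without PyTorch installed

    model = model.cpu().eval()
    dummy = torch.randn(*video_input_shape(num_frames=num_frames, size=size))
    with torch.no_grad():
        torch.onnx.export(
            model,
            dummy,
            onnx_path,
            input_names=["video"],
            output_names=["logits"],
            dynamic_axes={"video": {0: "batch"}, "logits": {0: "batch"}},  # allow variable batch size
            opset_version=opset,
        )
```

If the export fails on a specific operator, raising `opset` is usually the first thing to try.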
Hi, may I ask what the differences are between lucidrains's implementation and this repo?
https://github.com/lucidrains/uniformer-pytorch
Thanks a lot!
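Hello, in your logs I found that the model used is called Container3D_SA_FP32 (the base 164, 168, etc. logs all use Container3D_SA_FP32), which means all your results were obtained with this model. However, all the PyTorch models you provide are UniFormer. Are Container3D_SA_FP32 and UniFormer the same? I'm very confused, because when I test UniFormer base 16*4 I get a top-1 accuracy of only 0.03.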
Hi, thank you for the contribution to this super-rad work!
I wonder whether, in your experiments, the backbone models with stage-3 window/hybrid SABlocks (S-h14, B-h14) used in the detection task need to be pretrained on ImageNet.
If so, could these backbones with window/hybrid SABlocks be released? If not, are the weights loaded directly from the regular model with a global window in stage 3?
Thanks!
Hi, would you be interested in sharing a web demo on Hugging Face Spaces for UniFormer?
It would make this model more accessible as it would allow people to try out the model directly from the browser. Some other recent machine learning model repos have set up Spaces for easy access:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP
github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore
Spaces is completely free, and I can help setup a Gradio Space. Here are some getting started instructions if you'd prefer to do it yourself: https://huggingface.co/blog/gradio-spaces
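For reference, an image-classification Gradio Space usually boils down to an `app.py` along the lines of the sketch below. `predict_probs` and `labels` are placeholders the Space author would supply. Here `predict_probs` is a hypothetical function returning per-class probabilities, for example a softmax over the model logits. Nothing here is an official UniFormer demo.

```python
from typing import Callable, Dict, List, Sequence


def top_k_labels(probs: Sequence[float], labels: Sequence[str], k: int = 5) -> Dict[str, float]:
    """Map the k most likely class names to their probabilities (the dict form gr.Label accepts)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {labels[i]: float(probs[i]) for i in order}


def build_demo(predict_probs: Callable, labels: List[str]):
    import gradio as gr  # imported lazily so the helper above works without gradio

    def classify(image):
        # `image` arrives as a PIL image because of type="pil" below.
        return top_k_labels(predict_probs(image), labels)

    return gr.Interface(
        fn=classify,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=5),
        title="UniFormer image classification (demo sketch)",
    )
```

In the Space, `build_demo(predict_probs, labels).launch()` would start the app.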
UNIFORMER:
  EMBED_DIM: [64, 128, 320, 512]
  DEPTH: [3, 4, 8, 3]
  HEAD_DIM: 64
  MLP_RATIO: 4
  DROPOUT_RATE: 0
  ATTENTION_DROPOUT_RATE: 0
  DROP_DEPTH_RATE: 0.1
  SPLIT: False
  PRETRAIN_NAME: 'uniformer_small_in1k'
The pretrained model is 'uniformer_small_in1k'; however, the corresponding model in this link is 'uniformer_small_k400_16x8.pth'.
Could you kindly tell me where to find 'uniformer_small_in1k'?
Thanks and best regards!
Hi, thank you so much for your work. I am working on a task with 3D inputs, so I would like to build on your work. Unlike video, all three dimensions of my data are equally important. Therefore, I think the kernel of the first-stage patch-embedding convolution should be changed to 4x4x4 with stride 4x4x4, right? However, by the time I realized this, I had already trained a batch of models whose kernel is still 3x4x4 with stride 4x4x4. What impact does this difference have?
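The effect along the first axis can be checked with standard convolution arithmetic, output length = floor((n + 2p - k) / s) + 1. Assume no padding on that axis and, purely as an example, 16 slices. A 3-wide kernel with stride 4 then produces the same number of output positions as a 4-wide one, which would explain why the models trained without shape errors. However, every fourth slice is never read by the patch-embedding layer. The helpers are illustrative and not from the repo.

```python
from typing import List, Set


def conv_output_length(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Standard convolution output length: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def covered_positions(n: int, kernel: int, stride: int, padding: int = 0) -> List[List[int]]:
    """For each output position, the input indices it reads (padding positions excluded)."""
    windows = []
    for o in range(conv_output_length(n, kernel, stride, padding)):
        start = o * stride - padding
        windows.append([i for i in range(start, start + kernel) if 0 <= i < n])
    return windows


def unseen_positions(n: int, kernel: int, stride: int, padding: int = 0) -> List[int]:
    """Input indices that no output position reads."""
    seen: Set[int] = {i for w in covered_positions(n, kernel, stride, padding) for i in w}
    return [i for i in range(n) if i not in seen]
```

With 16 slices, `unseen_positions(16, 3, 4)` gives `[3, 7, 11, 15]`, while `unseen_positions(16, 4, 4)` gives `[]`. The 3x4x4 models therefore discard a quarter of the slices at the input layer, which matters more when all three axes carry equally important information.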
How is the video transformer pretrained on ImageNet-1K? Isn't the input different,
for example, 3D patch embedding for video vs. 2D patch embedding for images?
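A common way to reuse 2D ImageNet weights in a 3D video model is I3D-style weight inflation. There are two variants. In the first, each 2D kernel is repeated along the new time axis and divided by its length, so that (away from clip boundaries) a static clip gives the same response as the single image. In the second, the 2D kernel is placed only at the central time step. This thread does not say whether or how the repo does this. The sketch below only illustrates the technique on a single kernel stored as nested lists; `inflate_kernel` is a hypothetical name.

```python
from typing import List

Kernel2D = List[List[float]]
Kernel3D = List[List[List[float]]]


def inflate_kernel(kernel_2d: Kernel2D, time_dim: int, center: bool = False) -> Kernel3D:
    """Turn an H x W kernel into a T x H x W kernel.

    center=False: repeat over time and divide by T (sum over time equals the 2D kernel).
    center=True:  copy the 2D kernel to the middle time step, zeros elsewhere.
    """
    inflated: Kernel3D = []
    for t in range(time_dim):
        if center:
            scale = 1.0 if t == time_dim // 2 else 0.0
        else:
            scale = 1.0 / time_dim
        inflated.append([[w * scale for w in row] for row in kernel_2d])
    return inflated
```

In a real model, the same idea would be applied per output/input channel to the 2D patch-embedding and convolution weights before loading them into the 3D layers.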
Hi,
When I was reading the video UniFormer code, I found a SplitSABlock, which is disabled by default.
However, I can't find this split block in "UniFormer: Unifying Convolution and Self-attention for Visual Recognition". Could you kindly point me to the reference for this split block?
Thank you and best regards!
Thank you for your efforts!
The figures in your paper are beautiful. Did you use attention rollout for the visualization, as is common for other transformer models? I tried to visualize it by simply computing the mean of the features, and the result was not as expected.
I would appreciate your recommendation.
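For reference, attention rollout (Abnar & Zuidema, 2020) works on attention matrices, not on feature means. First average each layer's attention over heads. Then mix in the identity to account for the residual connection, renormalize the rows, and multiply the layers together. The pure-Python sketch below shows the computation. It only applies to blocks that actually produce attention maps, and whether the authors used it for their figures is not stated here.

```python
from typing import List

Matrix = List[List[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def attention_rollout(layer_attentions: List[Matrix], residual_weight: float = 0.5) -> Matrix:
    """Attention rollout over head-averaged N x N maps (rows = queries, columns = keys).

    Each layer becomes (1 - w) * A + w * I, rows are renormalized, and layers are
    composed from the first to the last: R_l = A_hat_l @ R_{l-1}.
    """
    n = len(layer_attentions[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layer_attentions:
        mixed = [[(1 - residual_weight) * attn[i][j] + (residual_weight if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        rollout = _matmul(mixed, rollout)
    return rollout
```

Row `i` of the result estimates how much each input token contributes to output token `i`. Reshaping a row back to the token grid gives the heat map.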
Where is the UniFormer module referenced? Is UniFormer a plug-and-play module? For example, if I feed an input of shape (64,3,25,25) into the UniFormer module, will the output also be (64,3,25,25)?
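As a rough check, here is the spatial-size arithmetic for a 4-stage backbone whose patch-embedding convolutions use kernel = stride = 4, 2, 2, 2. This layout is an assumption; verify it against the model you use. A classification backbone like this downsamples its input, so it does not return a tensor with the input's shape. A 25x25 input is already too small by the last stage.

```python
from typing import List, Sequence


def stage_resolutions(size: int, strides: Sequence[int] = (4, 2, 2, 2)) -> List[int]:
    """Spatial size after each patch-embedding conv, assuming kernel == stride and no padding."""
    sizes = []
    for s in strides:
        size = (size - s) // s + 1  # floor((n - k) / s) + 1 with k == s
        sizes.append(size)
    return sizes
```

`stage_resolutions(224)` gives `[56, 28, 14, 7]`, while `stage_resolutions(25)` gives `[6, 3, 1, 0]`. In practice, PyTorch would typically raise an error at the last stage, because the kernel would be larger than its 1x1 input.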
Hi,
Thanks for sharing this excellent paper and the code. Could you please explain in detail why PWConv-DWConv-PWConv does the same thing as Local MHRA? I understand that the 'spatiotemporal affinity' is content-dependent (similar to ViT), but the conv layers in the code are content-independent. Is this set of conv layers an approximation of Local MHRA? As for the temporal dimension, how do you handle it before feeding the tensor to the network: do you concatenate all frames along the RGB channels (the conv layers in the code are all 2D convs)?
Thanks in advance!
Best regards
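One way to see the connection is to make an assumption about the formulation, not something stated in this thread. Suppose the local affinity between token i and a neighbour j is a learnable weight that depends only on the relative offset i - j. Then aggregating neighbours with that affinity is exactly a (depthwise) convolution with a content-independent kernel. The 1D, single-channel sketch below checks this numerically; the function names are hypothetical.

```python
from typing import Dict, List


def local_affinity_aggregate(x: List[float], rel_weights: Dict[int, float]) -> List[float]:
    """y_i = sum_j a[i - j] * x_j over a local neighbourhood, with zero padding.

    The affinity a depends only on the offset i - j, not on the token values.
    """
    n = len(x)
    out = []
    for i in range(n):
        total = 0.0
        for offset, a in rel_weights.items():
            j = i - offset
            if 0 <= j < n:
                total += a * x[j]
        out.append(total)
    return out


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 'same' cross-correlation, as a (depthwise) conv layer computes it."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(kernel[k] * padded[i + k] for k in range(len(kernel))) for i in range(len(x))]


def kernel_to_offsets(kernel: List[float]) -> Dict[int, float]:
    """Re-index an odd-length conv kernel by the offset i - j it is applied to."""
    r = len(kernel) // 2
    return {r - k: w for k, w in enumerate(kernel)}
```

Both functions give identical outputs once the kernel is re-indexed by offset. The same argument carries over to 2D or 3D neighbourhoods, with one kernel per channel.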
Dear author:
When training the UniFormer model with 8 GPUs, I start the code with the following run.sh:
work_path=$(dirname $0)
PYTHONPATH=$PYTHONPATH:../../ \
python -m torch.distributed.launch --nproc_per_node=8 --master_port=22335 --use_env main.py \
--model uniformer_base \
--batch-size 64 \
--num_workers 8 \
--drop-path 0.3 \
--epoch 300 \
--dist-eval \
--output_dir ${work_path}/ckpt \
2>&1 | tee -a ${work_path}/log.txt
The logs are as follows (I have removed the model details):
/home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : main.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 8
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:22335
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_yejsfquq/none_6nw5du9x
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=22335
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/7/error.json
Please update your PyTorchVideo to latest master
(this line appears 8 times in the log)
| distributed init (rank 3): env://
| distributed init (rank 6): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 4 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 5 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 3 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 2 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 6 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 7 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Warning: Enabling distributed evaluation with an eval dataset not divisible by process number. This will slightly alter validation results as extra duplicate entries are added to achieve equal num of samples per-process.
Creating model: uniformer_base
number of params: 49468752
Start training for 300 epochs
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
(this line appears 8 times in the log)
Epoch: [0] [0/9] eta: 0:01:24 lr: 0.000001 loss: 6.0915 (6.0915) time: 9.3436 data: 3.3803 max mem: 11905
Epoch: [0] [8/9] eta: 0:00:02 lr: 0.000001 loss: 6.0828 (6.0922) time: 2.0037 data: 0.3758 max mem: 12141
Epoch: [0] Total time: 0:00:18 (2.0372 s / it)
Averaged stats: lr: 0.000001 loss: 6.0828 (6.0883)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
(this line appears 64 times in the log)
The logs are very messy and maddening, and I have no idea whether the code is running correctly. These warnings only appear during distributed training. Have you ever encountered this? I would appreciate any advice.
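To check whether training is actually progressing despite the noise, one option is to collapse consecutive duplicate lines and then read the `Epoch:` and `Averaged stats:` lines. The helper below is a hypothetical stdlib-only sketch, not part of the repo. Lines from different ranks can interleave, so only consecutive repeats are merged.

```python
from typing import Iterable, List, Optional


def condense_log(lines: Iterable[str], min_repeats: int = 2) -> List[str]:
    """Collapse runs of identical consecutive lines into one line plus a repeat count."""
    out: List[str] = []
    prev: Optional[str] = None
    count = 0

    def flush() -> None:
        if prev is not None:
            out.append(prev if count < min_repeats else f"{prev}  [x{count}]")

    for raw in lines:
        line = raw.rstrip("\n")
        if line == prev:
            count += 1
            continue
        flush()
        prev, count = line, 1
    flush()
    return out
```

For example, `condense_log(open("log.txt"))` shrinks the 64 pthreadpool warnings to a single line. Filtering with `line.startswith(("Epoch:", "Averaged stats:"))` then shows whether the loss is moving.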
Hi! Thanks for sharing the code. It seems that UniFormer uses DETR-like augmentation for multi-scale training. When I run the script './exp/mask_rcnn_1x_hybrid_small/run.sh', I get the following error:
Traceback (most recent call last):
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/detectors/base.py", line 181, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/detectors/two_stage.py", line 156, in forward_train
proposal_cfg=proposal_cfg)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/base_dense_head.py", line 49, in forward_train
outs = self(x)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/anchor_head.py", line 143, in forward
return multi_apply(self.forward_single, feats)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/rpn_head.py", line 43, in forward_single
x = self.rpn_conv(x)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
self.padding, self.dilation, self.groups)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/apex/amp/wrap.py", line 21, in wrapper
args[i] = utils.cached_cast(cast_fn, args[i], handle.cache)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/apex/amp/utils.py", line 97, in cached_cast
if cached_x.grad_fn.next_functions[1][0].variable is not x:
IndexError: tuple index out of range
I use UniFormer as the backbone and did not change the code. Please help!
I hesitated to ask this basic question, but what is the correct way to use the token-labeling models for basic image classification? I followed your instructions in huggingface.co uniformer_image, but the results seem wrong:
# cd image_classification
import torch
import torch.nn.functional as F
import torchvision.transforms as T
# from models import uniformer as torch_uniformer
from token_labeling.tlt.models import uniformer as torch_uniformer
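def inference(model, image):
    # ImageNet-style preprocessing: resize, center-crop to 224x224, normalize
    image_transform = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor(), T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
    image = image_transform(image)
    image = image.unsqueeze(0)  # add the batch dimension: (1, 3, 224, 224)
    with torch.no_grad():  # inference only, no gradients needed
        prediction = model(image)
    prediction = F.softmax(prediction, dim=1).flatten()
    return prediction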
model = torch_uniformer.uniformer_small()
weights = torch.load('uniformer_small_tl_224.pth')
model.load_state_dict(weights['model'] if "model" in weights else weights, strict=True)
model = model.eval()
# Run prediction
from skimage.data import chelsea
from PIL import Image
imm = Image.fromarray(chelsea()) # Chelsea the cat
out = inference(model, imm)
print(out.argsort()[-5:])
# tensor([224, 196, 223, 410, 599])
# Decode, any method just getting the label output
from tensorflow import keras
keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
# [[('n03530642', 'honeycomb', 0.55872005),
# ('n02727426', 'apiary', 0.011748945),
# ('n02104365', 'schipperke', 0.0044726683),
# ('n02097047', 'miniature_schnauzer', 0.003748106),
# ('n02105056', 'groenendael', 0.0033460185)]]
With the non-token-labeling uniformer_small, the output looks correct:
from models import uniformer as torch_uniformer
...
weights = torch.load('uniformer_small_in1k.pth')
...
print(out.argsort()[-5:])
# tensor([284, 287, 281, 282, 285])
...
keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
# [[('n02124075', 'Egyptian_cat', 0.7029501),
# ('n02123159', 'tiger_cat', 0.08705652),
# ('n02123045', 'tabby', 0.056305394),
# ('n02127052', 'lynx', 0.0035495553),
# ('n02123597', 'Siamese_cat', 0.0008160392)]]
Besides, in my ImageNet evaluation, the non-token-labeling uniformer_small reaches top1: 0.82986 top5: 0.96358, while the token-labeling one, evaluated the same way, reaches top1: 0.00136 top5: 0.00622. I think something is wrong with my usage.
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh ./exp/cascade_mask_rcnn_3x_ms_hybrid_base/config.py 4 --cfg-options model.pretrained='/home/lbc/UniFormer/object_detection/pretrained/cascade_mask_rcnn_3x_ms_hybrid_base.pth'
_base_ = [
    '../../configs/_base_/models/cascade_mask_rcnn_uniformer_fpn.py',
    '../../configs/_base_/datasets/coco_instance.py',
    '../../configs/_base_/schedules/schedule_1x.py',
    '../../configs/_base_/default_runtime.py'
]
model = dict(
    backbone=dict(
        embed_dim=[64, 128, 320, 512],
        layers=[5, 8, 20, 7],
        head_dim=64,
        drop_path_rate=0.4,
        use_checkpoint=True,
        checkpoint_num=[0, 0, 20, 0],
        windows=False,
        hybrid=True,
        window_size=14
    ),
    neck=dict(in_channels=[64, 128, 320, 512]),
    roi_head=dict(
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=54,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=54,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=54,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
        ]))
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='AutoAugment',
policies=[
[
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(type='Resize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
data = dict(train=dict(pipeline=train_pipeline))
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
lr_config = dict(step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
fp16 = None
optimizer_config = dict(
type="DistOptimizerHook",
update_interval=1,
grad_clip=None,
coalesce=True,
bucket_size_mb=-1,
use_fp16=True,
)
Using /home/lbc/miniconda3/envs/openmmlab_new/lib/python3.7/site-packages
Finished processing dependencies for mmdet==2.11.0
I use 8 GPUs with 32 GB each and set the batch size to 16, but I always get an out-of-memory error.
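If, as the names suggest, `use_checkpoint` turns on activation (gradient) checkpointing in the backbone and `checkpoint_num` gives the number of blocks per stage that are checkpointed, the config above only checkpoints the 20 blocks of stage 3. A hedged child-config sketch that checkpoints every block (trading extra compute for activation memory) and states the per-GPU batch explicitly; the file name is hypothetical and the `checkpoint_num` semantics are an assumption to verify in the backbone code:

```python
# config_lowmem.py (hypothetical name), placed next to the config above.
# mmcv merges these dicts into the inherited config key by key, so only the
# listed fields change.
_base_ = ['./config.py']

model = dict(
    backbone=dict(
        use_checkpoint=True,
        # Assumed meaning: number of blocks per stage run under activation
        # checkpointing; here every block of layers=[5, 8, 20, 7].
        checkpoint_num=[5, 8, 20, 7],
    ))

# mmdet 2.x takes the per-GPU batch size from samples_per_gpu;
# 2 images x 8 GPUs = a total batch of 16.
data = dict(samples_per_gpu=2, workers_per_gpu=2)
```

Note that the command above launches 4 processes (CUDA_VISIBLE_DEVICES=0,1,2,3 and the trailing 4), so there the total batch would be samples_per_gpu × 4.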
Hello,
I'm observing a drop in performance between test and validation when using UniFormer-S16 on SSv2.
The label mapping provided in somesomev2_rgb_test_split.txt lists label 0 for all instances.
The pretrained checkpoint (trained on SSv2 + K400) performs very poorly with the label mapping provided for the SSv2 train/val/test splits, i.e., the checkpoints seem to have been trained with a different label mapping. Do you happen to have that mapping available?
I'd appreciate help tracking these down.
Thanks
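The test split presumably lists label 0 everywhere because its labels are not distributed with the annotations, so validation is where a mapping can be checked. A minimal sketch of rebuilding a `video_path label` list from the official Something-Something V2 annotations, assuming the labels file maps each class template to an index string and that each split entry carries a `template` field with `[something]` brackets (SlowFast's SSv2 loader strips those brackets before the lookup). The file contents, directory name, and `.webm` path pattern are assumptions:

```python
import json
from typing import Dict, List


def load_label_map(labels_json: str) -> Dict[str, int]:
    """Parse a labels file of the form {"Template text": "index", ...}."""
    return {name: int(idx) for name, idx in json.loads(labels_json).items()}


def build_list(entries_json: str, label_map: Dict[str, int], video_dir: str) -> List[str]:
    """Return 'path label' lines for one split's annotation entries."""
    lines = []
    for entry in json.loads(entries_json):
        # Templates look like "Putting [something] on [something]"; the labels
        # file uses the same text without square brackets.
        name = entry["template"].replace("[", "").replace("]", "")
        lines.append(f"{video_dir}/{entry['id']}.webm {label_map[name]}")
    return lines
```

Comparing the indices produced this way with those in the provided list files would show whether the two mappings differ.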
I get a file-not-found error when trying to run the repo as-is: 'configs\Kinetics\SLOWFAST_4x16_R50.yaml'
Dear author:
I want to fine-tune the provided UniFormer-B model on my own video classification dataset, but I ran into some problems.
When I start training, the top-1 error is always 100.
I have successfully fine-tuned on my dataset with Facebook's SlowFast code using different models before.
Before training UniFormer, I copied the dataset folder from my previous project over directly and turned off AUG and MIXUP.
Have I possibly made any mistakes?
Besides that, I notice that the SlowFast code uses slowfast/utils/checkpoint.py to load the pretrained model, while the UniFormer code adds a new function in slowfast/model/uniformer.py and copies the original slowfast/utils/checkpoint.py to slowfast/utils/checkpoint_amp.py (though I don't see any difference between the two). Is there a difference between loading the pretrained model with uniformer.py and with checkpoint_amp.py?
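Whichever file does the loading, two things worth ruling out when fine-tuning stalls are weights nested under a `model` key and a classification head sized for a different number of classes. A hedged, framework-agnostic sketch of filtering a checkpoint before loading it; values only need a `.shape`, so PyTorch tensors work unchanged, and the stand-in `Fake` objects and key names are hypothetical:

```python
from collections import namedtuple
from typing import Any, Dict, List, Tuple


def prepare_pretrained(checkpoint: Dict[str, Any], model_state: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Unwrap a checkpoint and drop entries the target model cannot accept.

    Returns the filtered state dict and the names that were skipped.
    """
    # Some checkpoints nest the weights under a 'model' key.
    state = checkpoint.get("model", checkpoint)
    kept, skipped = {}, []
    for name, value in state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(value.shape):
            kept[name] = value
        else:
            # Unknown name or shape mismatch, e.g. a 400-way head vs. a new head.
            skipped.append(name)
    return kept, skipped


# Stand-ins for tensors; real code would pass torch.load(...) and
# model.state_dict(), then call model.load_state_dict(kept, strict=False).
Fake = namedtuple("Fake", "shape")
ckpt = {"model": {"backbone.weight": Fake((64, 3)), "head.weight": Fake((400, 512))}}
model_state = {"backbone.weight": Fake((64, 3)), "head.weight": Fake((10, 512))}
kept, skipped = prepare_pretrained(ckpt, model_state)
```

Printing `skipped` makes it obvious whether the backbone weights actually reached the model.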
Will you release 21k pretrained models later? 0.0
By default, I have to download ADEChallengeData2016 to test the pre-trained model. Is there an easier way that doesn't require setting up a dataset, e.g. running on a single image as demo/inference_demo.ipynb does? When I use inference_demo, I get KeyError: "EncoderDecoder: 'UniFormer is not in the models registry'"
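That KeyError is what a registry-based builder raises when the requested type name was never registered, which typically happens when the module defining the class was not imported before the model is built. A minimal stdlib sketch of the mechanism (not mmcv's actual `Registry`; all names here are hypothetical):

```python
from typing import Callable, Dict, Type


class Registry:
    """Maps type names to classes, like the registries used by config-driven builders."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register_module(self) -> Callable[[Type], Type]:
        def decorator(cls: Type) -> Type:
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: dict):
        cfg = dict(cfg)
        type_name = cfg.pop("type")
        if type_name not in self._modules:
            # Mirrors the message pattern seen in the issue.
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._modules[type_name](**cfg)


BACKBONES = Registry("models")


# Registration only happens when this class definition executes, i.e. when the
# file containing it is imported.
@BACKBONES.register_module()
class UniFormerStub:
    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
```

In the notebook, this would translate to importing the module that registers `UniFormer` before building the model (or listing it under `custom_imports` in the config, if the installed mmcv supports that option).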
Hi, great work on the UniFormer demo at https://huggingface.co/spaces/Andy1621/uniformer_image_demo. Would you be interested in making a Gradio Blocks version of it (or of another model) for this month's Blocks event: https://huggingface.co/Gradio-Blocks? Thanks!
Thanks for giving the drop_path_rate details for ADE20K. How does the value of the drop path rate affect performance? Could you provide some comparative results?
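For context on what the value controls: assuming UniFormer follows the common timm-style convention, `drop_path_rate` is the stochastic-depth rate of the last block, and the per-block rates rise linearly from 0. A stdlib sketch using the UniFormer-B values from the detection config above (`layers=[5, 8, 20, 7]`, `drop_path_rate=0.4`); the helper names are illustrative:

```python
import random
from typing import List


def drop_path_rates(drop_path_rate: float, layers: List[int]) -> List[float]:
    """Per-block stochastic-depth rates rising linearly from 0 to drop_path_rate."""
    total = sum(layers)
    if total == 1:
        return [0.0]
    return [drop_path_rate * i / (total - 1) for i in range(total)]


def drop_path(residuals: List[float], rate: float, training: bool, rng: random.Random) -> List[float]:
    """Zero each sample's residual branch with probability `rate`, rescaling survivors."""
    if not training or rate == 0.0:
        return list(residuals)
    keep = 1.0 - rate
    return [r / keep if rng.random() < keep else 0.0 for r in residuals]


rates = drop_path_rates(0.4, [5, 8, 20, 7])  # 40 blocks, 0.0 ... 0.4
```

Raising `drop_path_rate` therefore strengthens the regularization mostly in the deepest blocks.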
Hi, @Andy1621 ,
I want to use a pretrained UniFormer as the backbone for a downstream video task, but I can't get a simple, usable network from UniFormer-main\video_classification\slowfast\models\uniformer.py alone.
I commented out the UniFormer registration code and used the provided "config.yaml" to instantiate UniFormer, but I get an error that some parameters are missing from the config file, e.g. "qkv_bias = cfg['UNIFORMER']['QKV_BIAS']".
Thanks!
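Missing-parameter errors like this appear when the model reads keys that a hand-written config never defines. In SlowFast-based code the usual pattern is to start from the full default config and merge the experiment YAML over it (SlowFast's `get_cfg()` followed by yacs' `merge_from_file(...)`), so every key has a value. A stdlib sketch of that merge, with hypothetical default values:

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults; the real ones live in the repository's default config.
DEFAULTS = {"UNIFORMER": {"QKV_BIAS": True, "DEPTH": [5, 8, 20, 7]}, "MODEL": {"NUM_CLASSES": 400}}
# What a short experiment YAML might contain once parsed.
EXPERIMENT = {"MODEL": {"NUM_CLASSES": 600}}
cfg = merge_config(DEFAULTS, EXPERIMENT)
```

Unlike this sketch, yacs also rejects keys that do not exist in the defaults.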
'path_to_models/uniformer_base_in1k.pth'???
Hi,
Thanks for sharing this impressive work.
When I read the paper, I was puzzled by #Frame in Table 3 (comparison with the state of the art on Kinetics-400 & 600) of https://arxiv.org/pdf/2201.09450.pdf. For UniFormer-S, #Frame is 16×1×4; what do 16, 1, and 4 represent?
Best regards!
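Assuming the common convention for video benchmarks, where #Frame is written as frames per clip × spatial crops × temporal clips, 16×1×4 would mean 16-frame clips, 1 spatial crop and 4 temporal clips per video, i.e. 4 views whose predictions are averaged at test time. Reported inference cost is then usually per-view FLOPs times the number of views. A small worked sketch (the per-view GFLOPs value in the usage is a placeholder, not a number from the paper):

```python
from typing import NamedTuple


class ViewSpec(NamedTuple):
    frames: int  # frames sampled per clip
    crops: int   # spatial crops per clip
    clips: int   # temporal clips per video

    @property
    def views(self) -> int:
        return self.crops * self.clips

    def total_gflops(self, gflops_per_view: float) -> float:
        return gflops_per_view * self.views


def parse_views(spec: str) -> ViewSpec:
    """Parse strings such as '16x1x4' or '16×1×4'."""
    frames, crops, clips = (int(p) for p in spec.replace("×", "x").split("x"))
    return ViewSpec(frames, crops, clips)


spec = parse_views("16×1×4")
```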
I ran the test code earlier and found that top-1 was only 0.03. I am sure my data and labels are aligned. Since the author does not provide the dataset, I had to find it online, so the labels I used were all generated by myself from the dataset, and the results were very disappointing. In previous issues, I found that other people had the same problem, so I regenerated test.csv based on the kinetics_400_categroies.txt provided by the author; in other words, test.csv must be regenerated using the author's kinetics_400_categroies.txt. I then ran it again with the new kinetics_400_categroies.txt and test.csv, and this time top-1 was 0.81. This is really weird.
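The jump from 0.03 to 0.81 is consistent with the class indices, not the videos, being the problem: label IDs derived independently (for example from sorted folder names) need not match the order the checkpoint was trained with. A hedged sketch of regenerating `test.csv` from the provided category file; it assumes each line of the category file is either a class name or a class name followed by its index, that each video sits in a folder named after its class, and that rows are `path label` (check against the format DATASET.md expects). All three are assumptions:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Make 'air_drumming', '"air drumming"' and 'Air Drumming' compare equal."""
    return name.strip().strip('"').replace("_", " ").lower()


def load_categories(lines: Iterable[str]) -> Dict[str, int]:
    mapping: Dict[str, int] = {}
    for order, line in enumerate(ln for ln in lines if ln.strip()):
        parts = line.strip().rsplit(maxsplit=1)
        if len(parts) == 2 and parts[1].isdigit():
            name, index = parts[0], int(parts[1])  # "name index" layout
        else:
            name, index = line, order  # one name per line; index = line order
        mapping[normalize(name)] = index
    return mapping


def build_test_csv(video_paths: Iterable[str], categories: Dict[str, int]) -> List[str]:
    """Label each video by its parent folder name, using the provided indices."""
    rows = []
    for path in video_paths:
        class_name = normalize(PurePosixPath(path).parent.name)
        rows.append(f"{path} {categories[class_name]}")
    return rows
```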
Thanks for sharing your nice work :).
Your code is well organized and easy for me to understand.
I have one question: how do you compute the FLOPs reported in your paper?
Do you use a package to compute them?
Or could you recommend some tools for it?
Thanks.
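On tooling, fvcore's `FlopCountAnalysis` is one commonly used option (it counts a fused multiply-add as one flop); it is an assumption that the paper's numbers were produced with anything similar. For intuition, the analytic count for one generic global self-attention + MLP block on N tokens of width C can be sketched as follows. This is a standard transformer block, not UniFormer's exact MHRA, and it ignores norms, softmax and activations:

```python
def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Multiply-accumulates of one global self-attention + MLP block.

    Counts only matrix multiplications (norms, softmax, GELU are ignored).
    """
    n, c = num_tokens, dim
    qkv = 3 * n * c * c                # Q, K, V projections
    attn = n * n * c + n * n * c       # Q @ K^T and attention @ V
    proj = n * c * c                   # output projection
    mlp = 2 * n * c * (mlp_ratio * c)  # two linear layers
    return qkv + attn + proj + mlp


macs = block_macs(196, 64)  # 14x14 tokens, width 64
```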
Hello author, I would like to ask a question:
Setting memory consumption aside, is the GGLL design a feasible option? Have you tried this combination?
Thanks lol...!
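Rough arithmetic on why memory is the main obstacle for GGLL, assuming the letters give the affinity type (Global or Local) of stages 1 to 4 in order: global attention materializes an N×N affinity matrix per head, and N is largest in the first two stages, which GGLL would make global. Assuming a 224×224 input and stage strides of 4, 8, 16 and 32 (the usual hierarchical layout), the per-head matrix sizes are computed below; for video, N also multiplies by the number of token frames, so the matrix grows with its square:

```python
from typing import List, Sequence


def attention_entries(image_size: int = 224, strides: Sequence[int] = (4, 8, 16, 32), frames: int = 1) -> List[int]:
    """Entries in one head's N x N affinity matrix for global attention at each stage."""
    entries = []
    for s in strides:
        n = frames * (image_size // s) ** 2  # tokens attended jointly
        entries.append(n * n)
    return entries


image_case = attention_entries()          # 56x56, 28x28, 14x14, 7x7 tokens
video_case = attention_entries(frames=8)  # 8 token frames after any temporal downsampling
```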
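Hello, the exp folder contains folders named in the form uniformer_n1_n2_n3. The n1 part takes values such as b8x8, b16, s8x8, and s16; I assume s and b denote the model size? What do the following 16 and 8x8 mean?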
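Assuming the folders follow SlowFast's T×τ naming (model size, frames per clip, frame sampling stride, with the stride omitted in shortened names like b16), `s8x8` would read as UniFormer-S, 8 frames, stride 8. The convention is an assumption; in SlowFast-style YAML the authoritative values are `DATA.NUM_FRAMES` and `DATA.SAMPLING_RATE` in each exp config. A small parser for that reading:

```python
import re
from typing import NamedTuple, Optional

SIZES = {"s": "small", "b": "base"}  # assumed meaning of the size letter


class ExpName(NamedTuple):
    size: str
    frames: int
    stride: Optional[int]  # None when the name omits it


def parse_exp(token: str) -> ExpName:
    """Parse tokens like 's8x8', 'b16x4' or 'b16'."""
    m = re.fullmatch(r"([sb])(\d+)(?:x(\d+))?", token)
    if m is None:
        raise ValueError(f"unrecognized token: {token}")
    size, frames, stride = m.groups()
    return ExpName(SIZES[size], int(frames), int(stride) if stride else None)
```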
Hello, this is great work, but there is something I want to ask:
1. How should the train and val list files for the Something-Something V1 dataset be prepared? Do they use the same format as the Something-Something V2 lists described in DATASET.md? Could you please provide them?
I'm looking at the model code for the video classification part, but I can't find where the Dynamic Position Embedding is specified.
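In the UniFormer paper, Dynamic Position Embedding is described as a depthwise convolution with zero padding whose output is added back to the input (X = DPE(X_in) + X_in), so in code it would most likely appear as a grouped convolution with groups equal to the channel count, not as a learned position table. A stdlib 2D sketch of that operation; the video model would use a 3D kernel, and the function names are hypothetical:

```python
from typing import List

Grid = List[List[float]]


def depthwise_conv3x3(channels: List[Grid], kernels: List[Grid]) -> List[Grid]:
    """Apply one 3x3 kernel per channel with zero padding (output keeps H x W)."""
    out = []
    for x, k in zip(channels, kernels):
        h, w = len(x), len(x[0])
        y = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                            y[i][j] += k[di + 1][dj + 1] * x[ii][jj]
        out.append(y)
    return out


def dynamic_position_embedding(channels: List[Grid], kernels: List[Grid]) -> List[Grid]:
    """X = DWConv(X) + X, computed channel by channel."""
    pos = depthwise_conv3x3(channels, kernels)
    return [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(channels, pos)]
```

Because each kernel only sees its own channel and a local neighborhood, the embedding adapts to any input resolution, which is why it is called dynamic.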
AssertionError: The num_classes (80) in UniFormer of MMDistributedDataParallel does not matches the length of CLASSES 54) in CocoDataset
In the token labeling main file, I found "from tlt.data import create_token_label_target, TokenLabelMixup, FastCollateTokenLabelMixup, create_token_label_loader, create_token_label_dataset", but I can't find the tlt.data module.
Why does the trained model predict a blank result when I test it on an image?
Hello,
In Table 4a of the paper "UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning", you ran an ablation study on local and global affinity at different stages.
Could you please share the pretrained 2D UniFormer weights for the following configurations (so that I can load them into the 3D UniFormer):
Thank you,
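For loading 2D weights into the 3D model, the usual approach is I3D-style kernel inflation: a 2D kernel is either repeated along time and divided by the temporal size (so a temporally constant video gives the same response as the image), or placed only at the central time step. The UniFormer code may differ in details; this stdlib sketch on nested lists is only illustrative:

```python
from typing import List

Kernel2D = List[List[float]]


def inflate(kernel_2d: Kernel2D, time_dim: int, center: bool = False) -> List[Kernel2D]:
    """Turn an H x W kernel into a T x H x W kernel."""
    zeros = [[0.0] * len(row) for row in kernel_2d]
    if center:
        # Only the middle time step carries the 2D weights.
        mid = time_dim // 2
        return [[row[:] for row in kernel_2d] if t == mid else [r[:] for r in zeros]
                for t in range(time_dim)]
    # Repeat and average so a temporally constant input reproduces the 2D output.
    return [[[v / time_dim for v in row] for row in kernel_2d] for _ in range(time_dim)]
```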
Thanks for sharing this great work.
It seems that the log for UniFormer-Sh14 on Cascade Mask R-CNN has been misplaced: the provided one is the same as UniFormer-Bh14's.
Can you update it with the correct one?
We (@omubarek, @VenerableSpace) have been running UniFormer in test-only mode (TRAIN.ENABLE=FALSE) on the Kinetics-400 test split. We specified the following provided checkpoints via TEST.CHECKPOINT_FILE_PATH in two separate runs, one with UniFormer-S and another with UniFormer-B:
In both cases, we obtained very similar, low top-1/top-5 accuracies, such as:
{"split": "test_final", "top1_acc": "0.04", "top5_acc": "0.85"}.
We've tested the pre-processed data with models other than UniFormer and could reproduce their results.
Could you please help us figure out what we may be missing?
How do I use Uniformer for semantic segmentation?