sense-x / uniformer Goto Github PK

View Code? Open in Web Editor NEW

798.0 10.0 110.0 32.17 MB

[ICLR2022] official implementation of UniFormer

License: Apache License 2.0

Python 50.39% Shell 0.37% Jupyter Notebook 49.19% Dockerfile 0.03% Makefile 0.01% Batchfile 0.01%

image-classification video-classification object-detection semantic-segmentation pose-estimation

uniformer's People

Contributors

Stargazers

Watchers

Forkers

ar210689 junhaozhang98 wangjingg atlasgooo2 superryanguo wuyifujia cauchyfood jrcao bweng001 itxiud2015 mygit007hub opentld pjjie zxypro1 andpyxa ai-hub-deep-learning-fundamental joskid stjordanis ryanhu-tech paperwave idrissbachali yasinkaya1 techthiyanes ak391 faithfulnguyen lv-tuan delldu zyxjtu tonylibing tommylitlle cv-ip lancemsf yangyanggirl yangzhuang-j shuguoj canyonwind xrosliang 23119841 luoolu user8361 sreejyo miracletrespasser leondgarse celeste-cj hn18001 sun-jn liumarcus70s hadlang jiazheng-xing zepengw wolfworld6 azizighani hydrogeohc holmes-gu rookiexwc luo77123 xychen2022 swjtulinxi dariush-saberi wentaozhu manzhihuangnian duanzhihua aworldofchaos piaomai tao-p-m harvard-visionlab yiyualt junting mayuelala yangr116 dl-vit jjandnn christophreich1996 rachlenko ybkim95 lixiang007666 libeineu cuiziteng dumpmemory manhcntt21 xuhuahaoren mehulk43 xiaoqiangzhou go-ahead-maker gabprato xiuqhou megahanga jan-1995 trellixvulnteam airpods69 kgavrilyuk mayyoy markjhonbao jarygrace yult0821 yueyedeai begonia2020 mzheng0927 zds-20181015 gegalu

uniformer's Issues

Why the uniformer_small_plus test result is lower than the claimed？

I run the test script of uniformer small plus model https://github.com/Sense-X/UniFormer/blob/main/image_classification/exp/uniformer_small_plus/test.sh

The acc@1 is 68.394 and lower than that in paper, 83.4.

the weight I used is

How could I use token labeling for object detection task?

About video attention visualization

As I see it, you use GradCam to visualize the matrix A from the last layer, could you please kindly provide a demo code of how to use the GradCam code in your repo?
If it is possible, could you plz upload the demo in your demo?
Many thanks!

About the k600 model config

In https://github.com/Sense-X/UniFormer/blob/main/video_classification/exp/uniformer_b32x4_k600/config.yaml

MODEL:
  NUM_CLASSES: 400
  ARCH: uniformer
  MODEL_NAME: Uniformer
  LOSS_FUNC: soft_cross_entropy
  DROPOUT_RATE: 0.5

It seems that the NUM_CLASSES should be 600 instead of 400. Other k600 config files have the same problem. Or are the config files themselves wrong?

How to use the pretrained model uniformer_base_in1k.pth as my backbone ?

There are some problems when I use the pre-trained model uniformer_base_in1k.pth as my backbone?
missing keys: ['patch_embed1.norm.weight', 'patch_embed1.norm.bias', 'patch_embed1.proj.weight', 'patch_embed1.proj.bias', 'patch_embed2.norm.weight', .....
unexpected keys: ['model']

threequestions about video classification

Hello,
I have three operational questions about video classification in uniformer:
1、I find you write ''resize the video to the short edge size of 320'' in DATASET.md of video_classification, so I wonder whether this operation will be achieved before the program runs. If so, could you offer me the code of 'resize'?
2、I see you write ''freeze BN in Backbone'' in README.md, are you also forget to perform this operation in video classification? And if it's necessary to perform this operation in video classification?
3、The Kinetics 400 dataset I downloaded has timestamps between the file name and '.mp4', does it affect the code running?
Thanks for your instruction if you answer my questions.

`fused_weight_gradient_mlp_cuda` module not found. gradient accumulation fusion with weight gradient computation disabled.

I met this problem after installing mmdet2.11 in object_detection and mmcv1.3.14. It seems something wrong of the installation and multi-gpu training. Do u have any idea whats wrong with it?

Thank u for ur efforts.

How to correctly fine-tune the Uniformer on my dataset?

Dear author:

I am looking forward to fine-tuning the Uniformer model on my dataset of image classification task. However, I met several questions when loading proposed checkpoint. Here I use ImageNet-1K pretrained (224x224) UniFormer-B model, and set args.finetune = 'uniformer_base_in1k.pth'.

When interpolate position embedding in main.py, there's an error:

UniFormer/image_classification/main.py

Lines 275 to 293 in e802470

 # interpolate position embedding 

 pos_embed_checkpoint = checkpoint_model['pos_embed'] 

 embedding_size = pos_embed_checkpoint.shape[-1] 

 num_patches = model.patch_embed.num_patches 

 num_extra_tokens = model.pos_embed.shape[-2] - num_patches 

 # height (== width) for the checkpoint position embedding 

 orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens) ** 0.5) 

 # height (== width) for the new position embedding 

 new_size = int(num_patches ** 0.5) 

 # class_token and dist_token are kept unchanged 

 extra_tokens = pos_embed_checkpoint[:, :num_extra_tokens] 

 # only the position tokens are interpolated 

 pos_tokens = pos_embed_checkpoint[:, num_extra_tokens:] 

 pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2) 

 pos_tokens = torch.nn.functional.interpolate( 

 pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False) 

 pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2) 

 new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=1) 

 checkpoint_model['pos_embed'] = new_pos_embed

File "/path-to-project/main.py", line 276, in main
    pos_embed_checkpoint = checkpoint_model['pos_embed']
KeyError: 'pos_embed'

Where's my mistake of loading the model?

How to convert video classification from pytorch to onnx，on cpu

I want to convert the pytorch model to the onnx model on the cpu, but there is no relevant code in the source code. Can you upload pytorch2onnx.py in the video classification?

any difference between lucidrains's implemantation?

Hi, may I ask any difference between lucidrains's implementation and this repo?
https://github.com/lucidrains/uniformer-pytorch
Thanks a lot!

Container3D和UniFormer有什么区别吗？

你好，我在你的日志里面发现使用的模型叫Container3D_SA_FP32(base的164，168等使用的都是Container3D_SA_FP32)，也就是说你的结果都是使用的这个模型。但你提供的所有pytorch模型都是UniFormer，那么Container3D_SA_FP32和UniFormer是一样的吗？我很困惑，因为我使用UniFormer base 16*4测试得到的top1只有0.03

Pretrained window/hybrid SABlock backbone model for Detection task

Hi, thank you for the contribution to this super-rad work!

Wonder that, in your experiments, whether the backbone models used in Detection task with stage-3 window/hybrid SABlock (S-h14, B-h14) are needed to be pretrained on imagenet?

If so, could these backbones with window/hybrid SABlock be released? And if not, are the weights loaded directly from the regular model with global window in stage-3?

Thanks!

Huggingface Spaces

Hi, would you be interested in sharing a web demo on Huggingface Spaces for UniFormer?

It would make this model more accessible as it would allow people to try out the model directly from the browser. Some other recent machine learning model repos have set up Spaces for easy access:

github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

Spaces is completely free, and I can help setup a Gradio Space. Here are some getting started instructions if you'd prefer to do it yourself: https://huggingface.co/blog/gradio-spaces

where is uniformer_small_in1k

In https://github.com/Sense-X/UniFormer/blob/main/video_classification/exp/uniformer_s16x8_k400/config.yaml

UNIFORMER:
EMBED_DIM: [64, 128, 320, 512]
DEPTH: [3, 4, 8, 3]
HEAD_DIM: 64
MLP_RATIO: 4
DROPOUT_RATE: 0
ATTENTION_DROPOUT_RATE: 0
DROP_DEPTH_RATE: 0.1
SPLIT: False
PRETRAIN_NAME: 'uniformer_small_in1k'

The pretrained model is 'uniformer_small_in1k', however the corresponding model in this link is 'uniformer_small_k400_16x8.pth'.

Would you kindly tell me where is 'uniformer_small_in1k'?

Thanks and best regards!

The convolution kernel parameter problem of patch embedding in the first stage of video classification

Hi, thank you so much for your work. I am now faced with a 3D input task, so I would like to refer to your work. Different from the video, the three dimensions of my data are equally important. Therefore, I understand that the convolution kernel parameter of patch embedding in the first stage should be changed to 4x4x4 with stride 4x4x4, right? However, WHEN I realized this problem, I have trained a batch of models, and the convolution kernel of these models is still 3x4x4, and the step size is 4x4x4. Here I want to ask you what impact will be produced by the difference of this parameter?

pretrain

How does the video transformer pre train on image1k? Isn't the input different?
for example : 3D patch-embedding in video, 2D patch-embedding in image?

about SplitSABlock

Hi,

When I reading the video uniformer code, there is a SplitSABlock , and the default setting of enabling it is false .

However, I can't find this split block in "UniFormer: Unifying Convolution and
Self-attention for Visual Recognition" . Would you kindly tell me the reference of this split block?

Thank you and best regards!

About attention visualization

Thank u for ur efforts!
The pictures in ur paper are beautiful. I wonder did u use attention rollout to do the visualization as other transformer models? I tried to visualize it by just calculate the mean of the feature. And the result is not as expected.
I would appreciate it to hear ur recommendation.

about uniformer

Where is the uniformer module referenced? Is the uniformer a plug and play module? For example, I have an input of (64,3,25,25). After inputting the uniformer module, will the output also be (64,3,25,25)?

Local MHRA and PWConv-DWConv-PWConv

Hi,

Thanks for sharing this excellent paper and the code. Could you please give a detailed explanation of why the PWConv-DWConv-PWConv does the same thing as the Local MHRA? I understand that the 'spatiotemporal affinity' is content-dependent (similar to ViT), but the Conv layers in code are content-independent. Is the set of Conv layers the approximation of the Local MHRA? For the temporal dimension, how do you handle them before feeding the tensor to the network, concatenating all frames along the RGB channels (the Conv layers in the code are all 2d Conv)?

Thanks in advance!

Best regards

Training hangs at the end of the first epoch in image classification task.

Dear author:

When I training the Uniformer model with 8 GPUs, I start the code with the following run.sh:

work_path=$(dirname $0)
PYTHONPATH=$PYTHONPATH:../../ \
python -m torch.distributed.launch --nproc_per_node=8 --master_port=22335 --use_env main.py \
    --model uniformer_base \
    --batch-size 64 \
    --num_workers 8 \
    --drop-path 0.3 \
    --epoch 300 \
    --dist-eval \
    --output_dir ${work_path}/ckpt \
    2>&1 | tee -a ${work_path}/log.txt

And the logs are (I have deleted the display of model details):

/home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : main.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 8
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:22335
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_yejsfquq/none_6nw5du9x
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=127.0.0.1
  master_port=22335
  group_rank=0
  group_world_size=1
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
  global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/7/error.json
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
Please update your PyTorchVideo to latest master
| distributed init (rank 3): env://
| distributed init (rank 6): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 4 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 5 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 3 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 2 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 6 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 7 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Warning: Enabling distributed evaluation with an eval dataset not divisible by process number. This will slightly alter validation results as extra duplicate entries are added to achieve equal num of samples per-process.
Creating model: uniformer_base
number of params: 49468752
Start training for 300 epochs
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
Epoch: [0]  [0/9]  eta: 0:01:24  lr: 0.000001  loss: 6.0915 (6.0915)  time: 9.3436  data: 3.3803  max mem: 11905
Epoch: [0]  [8/9]  eta: 0:00:02  lr: 0.000001  loss: 6.0828 (6.0922)  time: 2.0037  data: 0.3758  max mem: 12141
Epoch: [0] Total time: 0:00:18 (2.0372 s / it)
Averaged stats: lr: 0.000001  loss: 6.0828 (6.0883)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

The displayed logs are very messy and maddening, and I am totally no clue whether the code runs correctly. These warnings only occur when distributed training. Have you ever met this situation? I will appreciate it if you can give me some advice.

Regular MS aug for object detection for uniformer

Hi! Thanks for sharing the code. It seems that Uniformer uses DETR-like aug for MS training. (

UniFormer/object_detection/exp/mask_rcnn_3x_ms_hybrid_small/config.py

Line 30 in 0b5af7d

dict(type='AutoAugment',

)
What are the results for regular MS aug for Uniformer for Object Detection?

IndexError: tuple index out of range

when I run the script ' ./exp/mask_rcnn_1x_hybrid_small/run.sh' of uniformer, I get a bug as follows:
Traceback (most recent call last):
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent

                                                                                                                                                                          [56/1124]

File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
if cached_x.grad_fn.next_functions[1][0].variable is not x:
IndexError: tuple index out of range
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/detectors/base.py", line 181, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/detectors/two_stage.py", line 156, in forward_train
proposal_cfg=proposal_cfg)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/base_dense_head.py", line 49, in forward_train
outs = self(x)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/anchor_head.py", line 143, in forward
return multi_apply(self.forward_single, feats)
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/mnt/ghome/jiaojiayu/uniformer/object_detection/mmdet/models/dense_heads/rpn_head.py", line 43, in forward_single
x = self.rpn_conv(x)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
self.padding, self.dilation, self.groups)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/apex/amp/wrap.py", line 21, in wrapper
args[i] = utils.cached_cast(cast_fn, args[i], handle.cache)
File "/mnt/ghome/jiaojiayu/anaconda3/envs/open-mmlab-2/lib/python3.7/site-packages/apex/amp/utils.py", line 97, in cached_cast
if cached_x.grad_fn.next_functions[1][0].variable is not x:
IndexError: tuple index out of range

I use the uniformer as the backbone and I don't change the code. Please help me!

Basic image classifier usage of token label models

I'm hesitating asking this basic question, but what's the correct way using the token label models for basic image classification? I followed your instruction in huggingface.co uniformer_image, but the result seems not right:

# cd image_classification
import torch
import torch.nn.functional as F
import torchvision.transforms as T
# from models import uniformer as torch_uniformer
from token_labeling.tlt.models import uniformer as torch_uniformer

def inference(model, image):
    image_transform = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor(), T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
    image = image_transform(image)
    image = image.unsqueeze(0)
    prediction = model(image)
    prediction = F.softmax(prediction, dim=1).flatten()
    return prediction

model = torch_uniformer.uniformer_small()
weights = torch.load('uniformer_small_tl_224.pth')
model.load_state_dict(weights['model'] if "model" in weights else weights, strict=True)
model = model.eval()

# Run prediction
from skimage.data import chelsea
from PIL import Image
imm = Image.fromarray(chelsea()) # Chelsea the cat
out = inference(model, imm)
print(out.argsort()[-5:])
# tensor([224, 196, 223, 410, 599])

# Decode, any method just getting the label output
from tensorflow import keras
keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
# [[('n03530642', 'honeycomb', 0.55872005),
#   ('n02727426', 'apiary', 0.011748945),
#   ('n02104365', 'schipperke', 0.0044726683),
#   ('n02097047', 'miniature_schnauzer', 0.003748106),
#   ('n02105056', 'groenendael', 0.0033460185)]]

The correct output like using non-token-label uniformer_small is like:

from models import uniformer as torch_uniformer
...
weights = torch.load('uniformer_small_in1k.pth')
...
print(out.argsort()[-5:])
# tensor([284, 287, 281, 282, 285])
...
keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
# [[('n02124075', 'Egyptian_cat', 0.7029501),
#   ('n02123159', 'tiger_cat', 0.08705652),
#   ('n02123045', 'tabby', 0.056305394),
#   ('n02127052', 'lynx', 0.0035495553),
#   ('n02123597', 'Siamese_cat', 0.0008160392)]]

Besides, the imagenet evaluation accuracy in my testing for non-token-label uniformer_small is top1: 0.82986 top5: 0.96358, and token-label one using same method is top1: 0.00136 top5: 0.00622. I think it's something wrong in my usage.

AssertionError: The `num_classes` (54) in ConvFCBBoxHead of MMDistributedDataParallel does not matches the length of `CLASSES` 80) in CocoDataset

CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh ./exp/cascade_mask_rcnn_3x_ms_hybrid_base/config.py 4 --cfg-options model.pretrained='/home/lbc/UniFormer/object_detection/pretrained/cascade_mask_rcnn_3x_ms_hybrid_base.pth'

`base = [
'../../configs/base/models/cascade_mask_rcnn_uniformer_fpn.py',
'../../configs/base/datasets/coco_instance.py',
'../../configs/base/schedules/schedule_1x.py',
'../../configs/base/default_runtime.py'
]

model = dict(
backbone=dict(
embed_dim=[64, 128, 320, 512],
layers=[5, 8, 20, 7],
head_dim=64,
drop_path_rate=0.4,
use_checkpoint=True,
checkpoint_num=[0, 0, 20, 0],
windows=False,
hybrid=True,
window_size=14
),
neck=dict(in_channels=[64, 128, 320, 512]),
roi_head=dict(
bbox_head=[
dict(
type='ConvFCBBoxHead',
num_shared_convs=4,
num_shared_fcs=1,
in_channels=256,
conv_out_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=54,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
reg_decoded_bbox=True,
norm_cfg=dict(type='SyncBN', requires_grad=True),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
dict(
type='ConvFCBBoxHead',
num_shared_convs=4,
num_shared_fcs=1,
in_channels=256,
conv_out_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=54,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1]),
reg_class_agnostic=False,
reg_decoded_bbox=True,
norm_cfg=dict(type='SyncBN', requires_grad=True),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
dict(
type='ConvFCBBoxHead',
num_shared_convs=4,
num_shared_fcs=1,
in_channels=256,
conv_out_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=54,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067]),
reg_class_agnostic=False,
reg_decoded_bbox=True,
norm_cfg=dict(type='SyncBN', requires_grad=True),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
]))

img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

augmentation strategy originates from DETR / Sparse RCNN

train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='AutoAugment',
policies=[
[
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(type='Resize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
data = dict(train=dict(pipeline=train_pipeline))

optimizer = dict(delete=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
lr_config = dict(step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)

do not use mmdet version fp16

fp16 = None
optimizer_config = dict(
type="DistOptimizerHook",
update_interval=1,
grad_clip=None,
coalesce=True,
bucket_size_mb=-1,
use_fp16=True,
) Using /home/lbc/miniconda3/envs/openmmlab_new/lib/python3.7/site-packages
Finished processing dependencies for mmdet==2.11.0
`

How to set a suitable batch size for video classification task of 32 frames?

I use 8 GPUs of 32G and set the batch size to 16, but there is always an out-of-memory error.

~50% drop between test and valid performance [SSv2 - Uniformer S16]

Hello,

I'm observing drop in performance between test and validation using Uniformer S16 on SSv2

The label mapping provided in somesomev2_rgb_test_split.txt lists label 0 for all instances.
1. Obtained test labels from another repo and matched to ensure indices match the train and valid label mapping
2. Training from scratch (including on k400) yields validation results of 67.99 top 1 acc. (slightly higher but comparable to paper & github reported results)
3. When evaluate this model on test. Results drop to 31.86 top 1 acc (similar drop for top 5 acc)
4. Looking through the provided logs here, don't see the test set being invoked. I see interleaved train and validation at 1x1 (clips x crops) until the end where 1x3 evaluation for validation is done.
5. I want to confirm the reported values in the paper and github are on test and not validation.
The pretrained checkpoint (trained on ssv2 + k400) shows very poor performance on the label mapping provided across test/train/valid for ssv2. ie. the model checkpoints seem to be trained on a different label mapping. Do you happen to have these available?

I appreciate help tracking these down.

Thanks

config file not found

file not found error while trying to run repo as it is -- 'configs\Kinetics\SLOWFAST_4x16_R50.yaml'

Some questions about loading pretrain model and training.

Dear author:

I want to fine-tune your model on my dataset of video classification task using provided UniFormer-B model. However, I met some problems.

When I start training, the top1 error is always 100

I have successfully fine-tune my dataset using code facebook SlowFast with different models before.

Before I training UniFormer, I have:

replaced the dataset folder with my previous SlowFast project;
set AUG and MIXUP disable.

Have I possibly made any mistakes?

Besides that, I notice that SlowFast code using slowfast/utils/checkpoint.py to load pretrain model, while the Uniformer code add a new function in slowfast/model/uniformer.py and copy the origin slowfast/utils/checkpoint.py to slowfast/utils/checkpoint_amp.py. What is the difference between loading pretrain model from the these two functions?

您好：

我想用UniFormer模型在我的数据集上fine-tune，我的任务是video classification，用的是文档中提供的UniFormer-B model，但是遇到了一些问题。

我训练的时候，top-1 error一直是100，如上图。

我之前用facebook SlowFast 的code在不同的模型上面都可以成功训练。训Uniformer的时候，我把我之前project的dataset文件夹直接覆盖了过来，然后关掉了AUG和MIXUP。不知道是哪里出了问题？

此外，SlowFast的code是在slowfast/utils/checkpoint.py 里面读的预训练模型，但我看您的code是在 slowfast/model/uniformer.py 重新实现了一个函数来读，并且把原来的slowfast/utils/checkpoint.py重命名为slowfast/utils/checkpoint_amp.py（但我好像没看到两者之间的差别）。我想了解一下用uniformer.py和checkpoint_amp.py加载预训练模型有区别吗？

21k pretrained model

请问后面会提供21k的预训练模型吗0.0

[segmentation]How to inference just using a single image?

By default I gotta download ADEChallengeData2016 for testing the pre-trained model. Is there a easier way without set up a dataset, just use one image like the demo/inference_demo.ipynb do? when I use the inference_demo and the keyError is ""EncoderDecoder: 'UniFormer is not in the models registry'""

Gradio Blocks demo

Hi, great work on the demo for uniformer https://huggingface.co/spaces/Andy1621/uniformer_image_demo, would you be interested in making a gradio blocks version of this for the blocks event this month: https://huggingface.co/Gradio-Blocks or for another model, thanks!

drop_path_rate on ade20k

Thanks for giving the details of drop_path_rate on ade20k. I want to ask how the value of drop path rate affects the performance. Could you provide some comparative results?

How to fine-tune the pre-trained on Imagenet with 384x384 resolution?

How to fine-tune the pre-trained on Imagenet with 384x384 resolution?
I pre-train the model on Imagenet1k but I meet this KeyError when I try to fine-turn.

Some question on pretrained model usage

Hi, @Andy1621 ,

I want to use pretrained UniFormer as the backbone for a downstream task on the video. But I can't obtain a simple but usable network code just by UniFormer-main\video_classification\slowfast\models\uniformer.py.

I annotate the code about UniFormer register, and define the offered "config.yaml" for UniFormer instantiation. But there is an error that some parameters is lacking in config file, like "qkv_bias = cfg['UNIFORMER']['QKV_BIAS']".

Thanks!

FileNotFoundError: [Errno 2] No such file or directory: 'path_to_models/uniformer_base_in1k.pth'

'path_to_models/uniformer_base_in1k.pth'???

meaning of #Frame

Hi,

Thanks for sharing this impressive work.

When I read the paper, I feel puzzled about #Frame in TABLE 3. (Comparison with the state-of-the-art on Kinetics400&600) in the paper https://arxiv.org/pdf/2201.09450.pdf. As for UniFormer-S， the #Frame is 16×1×4, what do 16,1, and 4 represent for？

Best regards！

Video classification, k400 test, top1 is only 0.03

I ran the test code before and found that top1 is only 0.03. I can guarantee that my data and labels must be aligned. Because there is no dataset provided by the author, I can only find the dataset on the Internet, so the labels I use, they are all generated by myself based on the data set, but the results that came out were so disappointing to me. In previous issues, I found that other people also had this problem, and then I based on the kinetics_400_categroies.txt provided by the author
, the test.csv is regenerated, which means that the test.csv file must be regenerated using the kinetics_400_categroies.txt provided by the author . Then I ran it again with the new kinetics_400_categroies.txt and test.csv, this time with a top1 of 0.81. This is really weird. This is really weird. This is really weird,

How to compute the FLOPs?

Thanks to share your nice products :).
I think your code is arranged well for me to understand.

I have one question how to compute the FLOPs number like in your paper.

do you have a package to compute it?
or I want to know if you can recommend some tools for it.

Thanks.

LLGG --》GGLL?

Hello author, I would like to ask a question:
Is the GGLL method a feasible solution without considering the memory consumption? Have you tried this combination?

Thanks lol...!

exp 文件夹中文件名的含义

您好，在exp文件夹中有uniformer_n1_n2_n3形式文件夹。n1格式有b8x8，b16，s8x8，s16,这里的s以及b代表的是模型大小吧？那后面的16以及8x8是什么意思呢？

Some questions about video classification.

Hello, this is a great work. But there are something I want to ask:
1.How to prepare the train and val list file for Something-Something V1 dataset? Are these of the same format of Something-Something V2 mentioned in dataset.md? Can you please provide those?

I ever adopt the tools in TSM project to extract frames of Something-Something V2. The difference is that the extracted frames are sparse. For example, according to the train.csc you provide in dataset.md the frames of video 1 is totally 117 while in TSM project the number is 45. So, why do you adopt a much dense extracting rate? How do different rates influence the final performance?

About Dynamic Position Embedding mentioned in paper

I'm looking at the model code of the video classification part, but I don't see where Dynamic Position Embedding is specified.

AssertionError: The `num_classes` (80) in UniFormer of MMDistributedDataParallel does not matches the length of `CLASSES` 54) in CocoDataset

AssertionError: The num_classes(80) in UniFormer of MMDistributedDataParallel does not matches the length ofCLASSES 54) in CocoDataset

can't find the tlt.data module?

in the token labeling main file, I found that from tlt.data import create_token_label_target, TokenLabelMixup, FastCollateTokenLabelMixup, create_token_label_loader, create_token_label_dataset ,
but I can't find the tlt.data module

There is an image output during training and validation, but not during testing

What is the reason why the trained model predicts the image to be blank when testing the image?

3D-Uniformer Ablation Weights

Hello,
In Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning paper Table 4a you have done an ablation study on the local and global affinity at different stages.

Can you please share the pretrained weights on 2D-Uniformer for the following configurations (such that I can load them for the 3D-Uniformer):

G-G-G-G
L-L-L-L
L-G-G-G

Thank you,

The log for UniFormer-Sh14 on Cascade Mask R-CNN is the same with that of UniFormer-Bh14

Thanks for sharing this great work.

It seems that the log for for UniFormer-Sh14 on Cascade Mask R-CNN has been misplaced and the provided one is the same with UniFormer-Bh14's.

Can you update it with the correct one?

Strange test results using provided model checkpoints

We (@omubarek, @VenerableSpace) have been running Uniformer test only (TRAIN.ENABLE=FALSE), on Kinetics-400 test split. We also specified the following provided model checkpoints via TEST.CHECKPOINT_FILE_PATH for two different runs; one with Uniformer-S, another with Uniformer-B:

Uniformer-S, #Frame: 8x1x4, Sampling Stride: 8, filename: uniformer_small_k400_8x8.pth
Uniformer-B, #Frame: 8x1x4, Sampling Stride: 8, filename: uniformer_base_k400_8x8.pth

In both cases, we obtained very similar low top1 / top5 accuracies, such as:
{"split": "test_final", "top1_acc": "0.04", "top5_acc": "0.85"}.

We've tested the pre-processed data using other models different than Uniformer and we could reproduce their results.

Could you please help us figuring out what we may be missing?

How do I use Uniformer for semantic segmentation

How do I use Uniformer for semantic segmentation?

	# interpolate position embedding
	pos_embed_checkpoint = checkpoint_model['pos_embed']
	embedding_size = pos_embed_checkpoint.shape[-1]
	num_patches = model.patch_embed.num_patches
	num_extra_tokens = model.pos_embed.shape[-2] - num_patches
	# height (== width) for the checkpoint position embedding
	orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens) ** 0.5)
	# height (== width) for the new position embedding
	new_size = int(num_patches ** 0.5)
	# class_token and dist_token are kept unchanged
	extra_tokens = pos_embed_checkpoint[:, :num_extra_tokens]
	# only the position tokens are interpolated
	pos_tokens = pos_embed_checkpoint[:, num_extra_tokens:]
	pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
	pos_tokens = torch.nn.functional.interpolate(
	pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
	pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2)
	new_pos_embed = torch.cat((extra_tokens, pos_tokens), dim=1)
	checkpoint_model['pos_embed'] = new_pos_embed

sense-x / uniformer Goto Github PK

uniformer's People

Contributors

Stargazers

Watchers

Forkers

uniformer's Issues

augmentation strategy originates from DETR / Sparse RCNN

do not use mmdet version fp16

Recommend Projects

Recommend Topics

Recommend Org

Jobs