Comments (4)
All parts are initialized from ImageNet pretraining. For convolutions, if the temporal dimension is larger than 1, we copy the 2D weights along the temporal axis and average them. For self-attention, we copy the same weights directly. Please check the code:
UniFormer/video_classification/slowfast/models/uniformer.py
Lines 387 to 421 in f92e423
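The inflation described above follows the I3D recipe: a pretrained 2D kernel is repeated T times along a new temporal axis and divided by T, so the inflated 3D convolution initially produces the same response on a temporally constant clip as the 2D network did on a single frame. A minimal NumPy sketch of this idea (illustrative only, not the repository code; the function name is hypothetical):

```python
import numpy as np

def inflate_conv_weight(w2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D conv kernel (C_out, C_in, H, W) into a 3D kernel
    (C_out, C_in, T, H, W) by repeating it t times along a new temporal
    axis and dividing by t, so the temporal sum recovers the 2D kernel."""
    return np.repeat(w2d[:, :, None, :, :], t, axis=2) / t

# Example: inflate a 3x3 kernel to a 3x3x3 kernel.
w2d = np.random.rand(64, 3, 3, 3)
w3d = inflate_conv_weight(w2d, t=3)
# Summing over the temporal axis recovers the original 2D weights,
# which is why a static clip yields the same activations as the image model.
assert np.allclose(w3d.sum(axis=2), w2d)
```

For self-attention there is nothing to inflate: the projection matrices act on channels, not on the spatial or temporal grid, so the 2D weights can be copied unchanged.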
Thanks a lot for the quick response, the pointer to the code helps a lot! Just two follow-up questions.
- I understand the ImageNet pretraining is done on the image-based UniFormer architectures and then transferred to the video UniFormer architectures by inflating the weights as above, right?
- a) Is there a table comparing ImageNet pretraining vs. training from scratch? b) I see that Table 17 in the paper presents results showing that inflating the weights to 3D performs better than 2D. What is the basis of this comparison? If it is a video model, the 3D inflation was always done, right? Whether centered around the middle slice or averaged equally across the time dimension. So what does the 2D comparison refer to here?
Thanks a lot again for your time to answer the questions!
For convolution inflation, I suggest you read the I3D paper.
As for your other questions:
- Yes.
- a) Without ImageNet pretraining, convergence is much slower; initializing from ImageNet weights is a common strategy in video training. b) "2D" means we do not inflate the convolution and instead merge the temporal dimension into the batch dimension, so the convolutions process each frame independently. For attention, however, we still use spatiotemporal attention.
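The "2D" baseline described above can be sketched as a reshape that folds the temporal dimension into the batch dimension, so an ordinary 2D convolution sees each frame as an independent image. A minimal NumPy sketch (the function name is hypothetical, not from the repository):

```python
import numpy as np

def to_frame_batch(x: np.ndarray) -> np.ndarray:
    """Fold time into batch: (B, C, T, H, W) -> (B*T, C, H, W).
    A 2D conv applied to the result processes every frame
    independently, with no temporal mixing."""
    b, c, t, h, w = x.shape
    # Move T next to B before flattening so frames of one clip stay contiguous.
    return x.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)

clip = np.random.rand(2, 3, 4, 5, 6)   # batch=2, channels=3, frames=4
frames = to_frame_batch(clip)          # shape (8, 3, 5, 6)
```

The spatiotemporal attention, by contrast, operates on all T*H*W tokens of a clip at once, which is why only the attention mixes information across time in the 2D setting.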
Thanks a lot for the answers!