zhangzjn / emo

[ICCV 2023] Official PyTorch implementation of "Rethinking Mobile Block for Efficient Attention-based Models"

Python 16.13% Jupyter Notebook 83.80% Dockerfile 0.01% Shell 0.04% Makefile 0.01% CSS 0.01% Batchfile 0.01%

emo's Introduction

Jiangning Zhang (张江宁) works as a Principal Researcher on two teams (Industry Perception and AIGC) at YouTu Lab, Tencent, Shanghai. I received my Ph.D. degree from the College of Control Science and Engineering, Zhejiang University, Hangzhou, China, under the supervision of Prof. Yong Liu. My major is Computer Vision, and my research interests include:
🌱 GAN-/diffusion-based AIGC research with LLMs, e.g., multi-modal image/video generation, 2D/3D virtual digital human research (3D face/body/hand reconstruction, multi-modal digital human driving, motion generation, etc.), text-to-image generation, multi-modal human-centric editing and generation, etc.
🌱 Neural Architecture Design (NAD), e.g., transformer-based architectures, lightweight vision models, etc.
🌱 Anomaly Classification and Segmentation.

💬 Feel free to email me ([email protected]) if you are interested in the above topics; remote collaboration is welcome.
💬 You can contact me if you are applying for a Research Intern position or as a B.S./Ph.D. student in computer vision / robotic perception. I co-supervise students with Prof. Yong Liu at Zhejiang University.
💖 I love 📷 photography, 🍲 cooking, and 🌏 traveling. Let's enjoy them together!

emo's People

Contributors

Stargazers

Watchers

Forkers

fhfeishi tanjingme ackesnal anmyles zfxu dl-attention xiaow89 rainbowpillow lidi100 dl-vit venonary zhymeng cleardry jackzhousz leo5050xvjf

emo's Issues

Library installation problem

Traceback (most recent call last):
  File "D:\yolov5-7.0\yolov5-7.0\models\yolo.py", line 24, in <module>
    from models.common import *
  File "D:\yolov5-7.0\yolov5-7.0\models\common.py", line 1853, in <module>
    from timm.models.layers.activations import *
ModuleNotFoundError: No module named 'timm.models.layers.activations'
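
One possible cause, offered as an assumption rather than a confirmed diagnosis: newer timm releases moved the layer code to `timm.layers`, so the submodule path `timm.models.layers.activations` may not exist in the installed version. A minimal sketch of a version-tolerant import follows. The helper name `import_first_available` and the candidate list are hypothetical and not part of timm or this repository.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported:\n" + "\n".join(errors))


# Assumed module paths: the newer location first, then the older one.
TIMM_ACTIVATION_MODULES = (
    "timm.layers.activations",
    "timm.models.layers.activations",
)

# Usage (requires timm to be installed):
# activations = import_first_available(TIMM_ACTIVATION_MODULES)
```

Unlike `from ... import *`, this returns a module object, so names have to be accessed as attributes of `activations`. Installing the timm version that the repository was developed against is the other obvious route.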

Which window partitioning strategy is used in EW-MHSA

Very interesting work, and the final architecture is very elegant.
I would like to know which window partitioning strategy is used in EW-MHSA: sliding windows (as in SASA) or shifted windows (as in Swin)?

About the Inconsistency between the Implementation and the Paper. (e.g., the "attn_pre" setting.)

Sorry to bother you. I have seen the issue (#9 (comment)), but I am still confused by "attn_pre". As you describe in the paper, the attention computed from qk is multiplied with v after self.v() (i.e., with "attn_pre" set to False), but in the code you seem to set it to True. Which setting produces the results reported in the published paper?

Thanks in advance!
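
To make the ordering question concrete, here is a toy sketch. It assumes that "attn_pre" set to True means the attention matrix is applied to the features before the value projection, and False means after it. The matrices and helper names are hypothetical and this is not the repository's code. It shows that the two orders agree when the value projection is purely linear and can differ once an activation follows the projection.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]


# Hypothetical toy values: 2 tokens, 2 channels.
A = [[0.5, 0.5], [0.0, 1.0]]    # attention matrix (rows sum to 1)
X = [[1.0, -2.0], [-3.0, 4.0]]  # token features
W = [[1.0, -1.0], [2.0, 1.0]]   # value projection weights

# Linear projection only: the order does not matter, since A(XW) = (AX)W.
linear_post = matmul(A, matmul(X, W))  # attention after projection
linear_pre = matmul(matmul(A, X), W)   # attention before projection

# Projection followed by ReLU: the order can change the result.
relu_post = matmul(A, relu(matmul(X, W)))
relu_pre = relu(matmul(matmul(A, X), W))

print(linear_post == linear_pre)
print(relu_post == relu_pre)
```

Whether the two settings differ in the actual model therefore depends on what `self.v()` contains beyond a linear map, which only the code or the authors can confirm.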

How do I replace MHSA with ACmix?

Question about the FLOPs

Amazing work, and it has inspired me a lot!

Can anyone help me? I am confused about the calculation of FLOPs.

Suppose Q, K and V each have shape L x D. I calculated the FLOPs of Multi-Head Self-Attention without considering bias, and the result is about 8 x D^2 x L + 4 x D x L^2. I cannot get the term with L^2.
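
As a cross-check of the estimate above, the sketch below counts multiply-accumulate operations (MACs) for a standard multi-head self-attention layer and converts them to FLOPs using the common convention of 2 FLOPs per MAC. It assumes D x D projections for Q, K, V and the output, and it ignores bias, softmax, and scaling. The function name `mhsa_flops` is hypothetical and the count is not taken from the EMO repository or paper.

```python
def mhsa_flops(seq_len, dim):
    """Approximate FLOPs of standard multi-head self-attention.

    Assumptions (not from the EMO code): Q, K, V and the output projection
    are dim x dim linear maps; bias, softmax and scaling are ignored; one
    multiply-accumulate (MAC) counts as 2 FLOPs.
    """
    macs_qkv_proj = 3 * seq_len * dim * dim  # X -> Q, K, V
    macs_scores = seq_len * seq_len * dim    # Q @ K^T, summed over all heads
    macs_weighted = seq_len * seq_len * dim  # attention weights @ V
    macs_out_proj = seq_len * dim * dim      # output projection
    macs = macs_qkv_proj + macs_scores + macs_weighted + macs_out_proj
    return 2 * macs


L, D = 196, 64  # example sizes, chosen only for illustration
print(mhsa_flops(L, D))
```

Under these assumptions the total is 8 x D^2 x L + 4 x D x L^2. The L^2 term comes only from the two attention matrix products, and splitting D across h heads leaves it unchanged because each head contributes L^2 x D/h.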

debug.py

May I ask why the debug.py file is missing from the project?

Some files seem to be omitted

When executing

python3 -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_1M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-1M/net.pth

I encountered this:
FileNotFoundError: [Errno 2] No such file or directory: '/youtu_fuxi_team1_ceph/vtzhang/codes/data/imagenet/train'

A question about EW-MHSA

Amazing work!

I have a question about the code of EW-MHSA: how do you implement Q=K=X?
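
For reference, one literal reading of Q=K=X is to compute the attention weights directly from the input, with no separate query or key projection. The sketch below illustrates only that reading, using hypothetical names and toy values. The repository may instead derive Q and K from a shared learned projection, so this should not be taken as the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def shared_qk_attention(x):
    """Attention weights when queries and keys are both the raw input x (L x D)."""
    dim = len(x[0])
    scale = 1.0 / math.sqrt(dim)
    # Scores are scaled dot products of each token with every token.
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in x] for q in x]
    return [softmax(row) for row in scores]


# Toy input: 3 tokens with 2 channels each (hypothetical values).
weights = shared_qk_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(weights)
```

Because Q and K are the same tensor here, the pre-softmax score matrix is symmetric, and only the value path needs its own projection.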

Latency results for mobile devices

Dear authors,
I was very lucky to come across your work. Could you please report latency results in your paper for some mobile devices, in addition to CPU and GPU?

How to draw Fig. 7, the diagonal similarity in S-3 with different components?

Hello dear authors, could you please release the code for drawing Fig. 7, the diagonal similarity in S-3 with different components? I am also confused by the diagonal similarity; could you please explain it?

Where can we find MetaMobile?

https://github.com/zhangzjn/EMO/blob/main/resources/RetinaNet_EMO_1M/20221102_234131.log#L39

Where can we find the model?

Have you done experiments with data augmentation?

Did you compare with TensorRT?

I have compared EMO-1M with RepVGG-A0 using TensorRT-8.5 as the backend: the throughput of EMO-1M is 300 fps, while RepVGG-A0 achieves 2000 fps.

The paper states that EMO is fast on embedded devices, but I did not see evidence of that.

Any idea?

A question about "attn_pre"

Thank you for your brilliant work!

I have a question about the parameter "attn_pre". Would setting it to True make the model perform better?

visualization of the similarity of diagonal pixels

Thank you for your brilliant work!
Could you please provide the code for visualizing the similarity of diagonal pixels?

predict image regression

Hello, I would like to ask whether I can use your network structure for an image regression problem. If so, what changes need to be made to the code?

Doubts about iPhone 14 Results

According to the Next-ViT report [1], on iPhone 12, models with far more parameters than EMO, such as ResNet101 and Next-ViT-S, have a latency of only about 4 ms. EfficientFormer-L1 [2], with 12M parameters, has a latency of only 1.6 ms. The iPhone 14 hardware is more powerful than that of the iPhone 12, so the latency of the smaller EMO models should be lower. However, the latency of the EMO family is over 4 ms. What is the reason for this?

[1] Li J, Xia X, Li W, et al. Next-vit: Next generation vision transformer for efficient deployment in realistic industrial scenarios[J]. arXiv preprint arXiv:2207.05501, 2022.
[2] Li Y, Yuan G, Wen Y, et al. Efficientformer: Vision transformers at mobilenet speed[J]. Advances in Neural Information Processing Systems, 2022, 35: 12934-12949.

Source Code

Hey! Interesting paper! Could you please share the source code or will you do it in the future? Thanks!
