zhangzjn / emo

[ICCV 2023] Official PyTorch implementation of "Rethinking Mobile Block for Efficient Attention-based Models"

Python 16.13% Jupyter Notebook 83.80% Dockerfile 0.01% Shell 0.04% Makefile 0.01% CSS 0.01% Batchfile 0.01%

emo's Introduction

Jiangning Zhang (张江宁) works as a Principal Researcher on two teams (Industry Perception and AIGC) at YouTu Lab, Tencent, Shanghai. I received my Ph.D. degree from the College of Control Science and Engineering, Zhejiang University, Hangzhou, China, under the supervision of Prof. Yong Liu. My major is Computer Vision, and my research interests include:
🌱 GAN-/Diffusion-based AIGC research with LLMs, e.g., multi-modal image/video generation, 2D/3D virtual digital human research (3D face/body/hand reconstruction, multi-modal digital human driving, motion generation, etc.), text-to-image generation, multi-modal human-centric editing and generation, etc.
🌱 Neural Architecture Design (NAD), e.g., transformer-based architectures, lightweight vision models, etc.
🌱 Anomaly Classification and Segmentation.

  • 💬 Feel free to email me ([email protected]) if you are interested in the above topics; remote collaboration is welcome.
  • 💬 You can contact me if you are applying for a Research Intern position or as a B.S./Ph.D. student in computer vision / robotic perception; I co-supervise students with Prof. Yong Liu at Zhejiang University.
  • 💖 I love 📷photography, 🍲cooking, and 🌏traveling. Let's enjoy them together!

emo's Issues

Library installation problem

Traceback (most recent call last):
File "D:\yolov5-7.0\yolov5-7.0\models\yolo.py", line 24, in
from models.common import *
File "D:\yolov5-7.0\yolov5-7.0\models\common.py", line 1853, in
from timm.models.layers.activations import *
ModuleNotFoundError: No module named 'timm.models.layers.activations'
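The error names a submodule of timm rather than timm itself, which suggests that timm is installed but in a release that no longer ships `timm.models.layers.activations`. A minimal sketch of a fallback import follows, assuming the same code lives under `timm.layers.activations` in newer releases. The helper `import_first_available` and the list `TIMM_ACTIVATION_PATHS` are hypothetical names introduced for illustration and are not part of timm or of this repository.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported:\n" + "\n".join(errors)
    )


# Assumption: older timm releases expose `timm.models.layers.activations`,
# newer ones expose the same code as `timm.layers.activations`.
TIMM_ACTIVATION_PATHS = [
    "timm.models.layers.activations",
    "timm.layers.activations",
]
```

In `models/common.py`, `activations = import_first_available(TIMM_ACTIVATION_PATHS)` could then stand in for the failing import, with names accessed as attributes of `activations`. Pinning timm to the version the project was developed against is the other obvious option.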

Question about the FLOPs

Amazing work, and it inspired me a lot!

Can anyone help me? I am confused about the calculation of FLOPs.

Suppose the shape of Q, K and V is LxD. I calculated the FLOPs of Multi-Head Self-Attention without considering bias, and the result is about 8xD^2xL+4xDxL^2. I cannot get the term with L^2.
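For reference, here is a minimal sketch of one common way to count these operations. It assumes that one multiply-add counts as 2 FLOPs and that bias, softmax and scaling are ignored; it is not necessarily the convention used in the paper. The function name `mhsa_flops` is hypothetical.

```python
def mhsa_flops(seq_len: int, dim: int, flops_per_mac: int = 2) -> dict:
    """Count FLOPs of one MHSA layer, ignoring bias, softmax and scaling.

    The number of heads does not appear: splitting `dim` across h heads
    gives h products of width dim / h, which sums to the same total.
    """
    L, D = seq_len, dim
    macs = {
        "qkv_projection": 3 * L * D * D,  # three (L x D) @ (D x D) products
        "attention_scores": L * L * D,    # Q @ K^T, summed over heads
        "weighted_values": L * L * D,     # attention @ V, summed over heads
        "output_projection": L * D * D,   # (L x D) @ (D x D)
    }
    flops = {name: flops_per_mac * m for name, m in macs.items()}
    flops["total"] = sum(flops.values())
    return flops
```

Under these assumptions the total is 8xD^2xL + 4xDxL^2. The L^2 part comes only from the two attention matrix products (Q times K transposed, and the attention map times V); the four linear projections contribute the D^2 part. With `flops_per_mac=1` the same function returns multiply-accumulate counts, which are half as large.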

debug.py

May I ask why the debug.py file is not included in the project?

Some files seem to be omitted

When executing

python3 -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_1M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-1M/net.pth

I encountered this:
FileNotFoundError: [Errno 2] No such file or directory: '/youtu_fuxi_team1_ceph/vtzhang/codes/data/imagenet/train'
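The missing path looks like a location on the authors' own storage, so the dataset root presumably has to point at a local ImageNet copy. The issue does not show which config key holds that root, so the sketch below only covers a pre-flight check to run before launching. The helper `missing_split_dirs` and the `train`/`val` layout are assumptions made for illustration.

```python
from pathlib import Path


def missing_split_dirs(root, splits=("train", "val")):
    """Return the split directories that do not exist under `root`."""
    root = Path(root)
    return [str(root / split) for split in splits if not (root / split).is_dir()]
```

For example, `missing_split_dirs("/path/to/imagenet")` returns an empty list when both `train` and `val` exist, and otherwise lists the directories that still need to be created or linked.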

Latency results for mobile devices

Dear author,
I was lucky to come across your work. Could you please report latency results in your paper for some other mobile devices besides CPU and GPU?

Did you compare with TensorRT?

I have compared EMO-1M with RepVGG-A0 using TensorRT-8.5 as the backend. The throughput of EMO-1M is 300 fps, while RepVGG-A0 achieves 2000 fps.

The paper states that EMO is fast on embedded devices, but I did not see the evidence.

Any ideas?

A question about "attn_pre"

Thank you for your brilliant work!

I have a question about the parameter "attn_pre". Would setting it to 'True' make the model perform better?

predict image regression

Hello, can I use your network structure for an image regression problem? If so, what changes need to be made to the code?

Doubts about iPhone 14 Results

According to the NextViT report [1], on iPhone 12, models with far more parameters than EMO, such as ResNet101 and NextViT-S, have a latency of only about 4 ms. The latency of EfficientFormer-L1 [2], which has 12M parameters, is only 1.6 ms. The iPhone 14 has stronger hardware than the iPhone 12, so the latency of the smaller EMO models should be lower. However, the latency of the EMO family is over 4 ms. What is the reason for this?

[1] Li J, Xia X, Li W, et al. Next-vit: Next generation vision transformer for efficient deployment in realistic industrial scenarios[J]. arXiv preprint arXiv:2207.05501, 2022.
[2] Li Y, Yuan G, Wen Y, et al. Efficientformer: Vision transformers at mobilenet speed[J]. Advances in Neural Information Processing Systems, 2022, 35: 12934-12949.

Source Code

Hey! Interesting paper! Could you please share the source code or will you do it in the future? Thanks!
