bytedance / next-vit
License: Apache License 2.0
Hello, how did you generate the Fourier spectra and heat maps in Fig. 5? Is there any code or a link you could share? Thank you!
When I run the code I get the following error. Can you help?
File "/content/drive/MyDrive/UNINEXT/projects/UNINEXT/uninext/backbone/nextvit.py", line 144, in __init__
assert out_channels % head_dim == 0
TypeError: unsupported operand type(s) for %: 'list' and 'int'
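The message itself only says that `out_channels` was a list when the assertion ran; the traceback does not show why. A minimal sketch of how this kind of error arises, and of a guard that reports it more clearly, might look like the following. The helper name `check_head_divisibility` and the example channel values are hypothetical and are not taken from the UNINEXT or Next-ViT code.

```python
def check_head_divisibility(out_channels, head_dim):
    """Hypothetical guard: require an int out_channels that head_dim divides."""
    if not isinstance(out_channels, int):
        raise TypeError(
            "out_channels must be an int, got "
            f"{type(out_channels).__name__}: {out_channels!r}"
        )
    if out_channels % head_dim != 0:
        raise ValueError(
            f"out_channels={out_channels} is not divisible by head_dim={head_dim}"
        )
    return out_channels // head_dim


# Reproduce the original message: the % operator is applied to a list.
try:
    [96, 192] % 32
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for %: 'list' and 'int'

# With a single integer the check passes and yields the number of heads.
print(check_head_divisibility(96, 32))  # 3
```

If the list comes from a configuration that stores one channel count per stage (an assumption), the constructor would need to receive a single entry of that list rather than the whole list.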
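Hello!
I noticed in the log that nextvit-small, trained on ImageNet (224x224), takes about 4 min/epoch;
I trained on 8 * A100 with bsz = 8 * 256, and it takes about 10 min/epoch;
and the paper states that your training setup is 8 * V100 with bsz = 8 * 256.
So I would like to ask whether something is inconsistent here, or whether there is a problem with my training.
Thanks!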
Is the output given by the NextViT class in tensor form?
Hi everyone,
Has anyone already built a Dockerfile to run this code? If so, would you mind sharing it with us? Thank you for your help.
Best regards,
Xin
This is a request to provide CoreML segmentation pretrained models, as it seems you have already converted them and tested/used in order to get your statistics on the page for the latency of the models. If you could post them it would help me and my team a huge amount. Thanks!
In the classification model in nextvit.py,
why is there no ReLU before the global pool?
The BatchNorm output contains both positive and negative values.
def forward(self, x):
    x = self.stem(x)
    for idx, layer in enumerate(self.features):
        if self.use_checkpoint:
            x = checkpoint.checkpoint(layer, x)
        else:
            x = layer(x)
    x = self.norm(x)
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
    x = self.proj_head(x)
    return x
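Hi, it's me again! About your Fourier plot: did you use np.fft.fft2, i.e. fft2(image) / np.fft.fft2(image)? Is the input image the heat map itself, or the feature map output by the last layer of the network? Thanks!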
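For readers with the same question, here is a minimal sketch of one common way to turn a single 2-D map into a centred log-amplitude Fourier spectrum with `np.fft.fft2`. It is not confirmed to be the procedure behind the paper's figure; the function name `log_amplitude_spectrum` and the random 14x14 input standing in for a feature map are assumptions made for illustration.

```python
import numpy as np


def log_amplitude_spectrum(feature_map):
    """Return the centred log-amplitude spectrum of a 2-D array."""
    spectrum = np.fft.fft2(feature_map)    # 2-D discrete Fourier transform
    spectrum = np.fft.fftshift(spectrum)   # move the zero frequency to the centre
    return np.log1p(np.abs(spectrum))      # log scale keeps small amplitudes visible


# Placeholder input: a random 14x14 array standing in for one feature-map channel.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((14, 14))
spectrum = log_amplitude_spectrum(feature_map)
print(spectrum.shape)  # (14, 14)
```

The resulting array can be shown with any image plotting routine. Which tensor to feed in (an input image, a heat map, or a layer's output averaged over channels) is exactly what the question above asks the authors.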
Current training command: CUDA_VISIBLE_DEVICES=0 bash train.sh 1 --model nextvit_small --batch-size 8 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path ./data
model.to(devices) fails with: RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 556202) of binary: /opt/conda/envs/NextVit/bin/python3
What could be the cause?
Hi!
Would you be interested in sharing your models in the Hugging Face Hub? The Hub offers free hosting and it would make your work more accessible and visible to the rest of the ML community. We can help you set up a bytedance organization.
Some of the benefits of sharing your models through the Hub would be:
Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.
You can also set up a Gradio demo for your model by following this guide: https://gradio.app/getting_started/
Here is an example of a Gradio demo: https://huggingface.co/spaces/adirik/OWL-ViT
and the code: https://huggingface.co/spaces/adirik/OWL-ViT/blob/main/app.py
Happy to hear your thoughts,
Ahsen and the Hugging Face team
Hello 👋 I came across your model in kaggle competitions and was thinking it would be great if it was hosted on Hugging Face Hub model repositories as it addresses many needs around model hosting, and if you feel like it you can also wrap your model with this mixin to make it easily loadable. What do you think?
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repos branches:
Repo | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
---|---|---|
MMEngine | | 0.x |
MMCV | 1.x | 2.x |
MMDetection | 0.x, 1.x, 2.x | 3.x |
MMAction2 | 0.x | 1.x |
MMClassification | 0.x | 1.x |
MMSegmentation | 0.x | 1.x |
MMDetection3D | 0.x | 1.x |
MMEditing | 0.x | 1.x |
MMPose | 0.x | 1.x |
MMDeploy | 0.x | 1.x |
MMTracking | 0.x | 1.x |
MMOCR | 0.x | 1.x |
MMRazor | 0.x | 1.x |
MMSelfSup | 0.x | 1.x |
MMRotate | 1.x | 1.x |
MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
How can the Next-ViT detection model be deployed? With the MMDetection tools?
I used NextViT as the backbone for training a face embedding model, and the accuracy was pretty good. However, when I tried to convert it to ONNX for deployment on an edge device, I got the following message:
einops.py:314: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
known = {axis for axis in composite_axis if axis_name2known_length[axis] != _unknown_axis_length}
This prevents me from running inference with the model, and I don't know how to fix it. Can anyone help? Thanks :D
Hello, if I want to improve on your work and apply it to segmentation, how should I compare against your results? With an FPN neck, or with UperNet?
Hi!
First of all, this is a very interesting paper with good ideas. I've looked into your code, and your implementation of AvgPool seems to be wrong. Usually, when we speak about AvgPool for images, we assume a box filter with kernel_size=(size, size) and stride=size.
In your implementation you use kernel_size=size * size, stride=size * size. This is NOT IDENTICAL to AvgPool2d. The problem is that this implementation performs average pooling over rows, which leaks information from one border to the other and does not use any information in the vertical dimension. Below is an example of a tensor and its mean inside your E_MHSA.
BS, DIM, SZ = 1, 1, 4
inp = torch.arange(SZ * SZ).float().view(BS, SZ * SZ, DIM)
E_MHSA(dim=DIM, sr_ratio=2, head_dim=1)(inp)
inp = tensor([
[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.]])
mean_inp (after self.sr(_x)): tensor([ 1.5000, 5.5000, 9.5000, 13.5000])
It obviously calculates the average only inside rows. I assume you used this implementation to avoid reshaping back to (B, C, H, W) for AvgPool2d, but it leads to a totally different pooling.
As a side note, all these (B, C, H, W) -> (B, N, C) -> (B, C, H, W) permutations are not really needed, and you could speed up your network by re-implementing E_MHSA to take (B, C, H, W) tensors as inputs.
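The difference described in this issue can be reproduced without the model. The sketch below uses NumPy, not the repository's PyTorch layers, to emulate both schemes on the same 4x4 example: averaging every `sr_ratio**2` consecutive flattened tokens, versus averaging non-overlapping `sr_ratio` x `sr_ratio` blocks. The function names are hypothetical.

```python
import numpy as np


def pool_flat_tokens(tokens, sr_ratio):
    """Average every sr_ratio**2 consecutive tokens of a flattened (H*W,) map."""
    window = sr_ratio * sr_ratio
    return tokens.reshape(-1, window).mean(axis=1)


def pool_2d_blocks(tokens, height, width, sr_ratio):
    """Average non-overlapping sr_ratio x sr_ratio blocks of the (H, W) map."""
    grid = tokens.reshape(height // sr_ratio, sr_ratio, width // sr_ratio, sr_ratio)
    return grid.mean(axis=(1, 3)).reshape(-1)


tokens = np.arange(16, dtype=np.float64)  # the 4x4 example above, flattened row by row

print(pool_flat_tokens(tokens, 2).tolist())      # [1.5, 5.5, 9.5, 13.5]
print(pool_2d_blocks(tokens, 4, 4, 2).tolist())  # [2.5, 4.5, 10.5, 12.5]
```

The first result matches the `mean_inp` values quoted in the issue (one mean per row), while the second is what a 2x2 box filter with stride 2 would give.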
Could you also share the pretrained models via Baidu Wangpan? Thanks.
Thank you for sharing a nice architecture.
I'm curious why you adopt the "PatchEmbedding function" for every NCB as well as every NTB
(code: Next-ViT/classification/nextvit.py, line 125 in 922e771).
It looks slightly different from the figure in the paper, which adds patch embedding once per stage (not for each NCB and NTB):
Next-ViT/classification/nextvit.py, lines 171 to 172 in 922e771.
Are there any available weights for the models replacing BN with LN?
Hi,
What tool was used to draw Figure 1 in the paper (i.e. images/result.png)?
Was it drawn in Excel?
--data/
    --train/
        --false/
            img.jpg
        --true/
            img.jpg
    --val/
        --false/
            img.jpg
        --true/
            img.jpg
The dataset has the structure above. Training command: bash train.sh 1 --model nextvit_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path ./data
Error: FileNotFoundError: Found no valid file for the classes .ipynb_checkpoints. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
I traced it to this line in the code:
dataset = datasets.ImageFolder(root, transform=transform), where root does not pick up the data under false or true.
How should I fix this? My understanding is that the image folder names represent the classes, and training should need both the path and the class.
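The error names `.ipynb_checkpoints` as a class, which suggests that a hidden Jupyter checkpoint directory sits next to `false/` and `true/` and is being picked up as a third class folder with no images in it. Assuming that is the cause, a minimal sketch for cleaning the dataset tree before training could be the following; the helper name `remove_ipynb_checkpoints` is hypothetical, and `./data` is the path from the training command above.

```python
import os
import shutil


def remove_ipynb_checkpoints(root):
    """Delete every .ipynb_checkpoints directory under root; return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            # Drop it from the walk list so os.walk does not descend into it.
            dirnames.remove(".ipynb_checkpoints")
            removed.append(target)
    return removed


if __name__ == "__main__":
    # Prints the checkpoint directories that were deleted (empty list if none).
    print(remove_ipynb_checkpoints("./data"))
```

After the cleanup, each of `train/` and `val/` should contain only the `false/` and `true/` class folders.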
ImportError: cannot import name 'create_logger' from 'logger'
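Is the segmentation model suitable for deployment on edge devices? Our hardware is an NVIDIA Orin, and the pretrained models I downloaded are over 700 MB. Can they be deployed?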