bytedance / next-vit

License: Apache License 2.0

Python 99.22% Shell 0.78%

next-vit's Issues

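Question about Fig. 5 in the paper

Hello, how did you generate the Fourier plots and heat maps in Fig. 5? Is there any code or a link you could share? Thank you!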

outputs of the NextViT class

Is the output given by the NextViT class in tensor form?

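Deployment question

Is the segmentation model suitable for deployment on edge devices? Our hardware is an NVIDIA Orin, and the pretrained models I downloaded are already over 700 MB. Can they be deployed?

Comparing against the paper

Hello, if I want to improve on your work and apply it to segmentation, how should I compare against your results? With an FPN neck, or with UperNet?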

Hosting model ckpts on Hugging Face

Hello 👋 I came across your model in kaggle competitions and was thinking it would be great if it was hosted on Hugging Face Hub model repositories as it addresses many needs around model hosting, and if you feel like it you can also wrap your model with this mixin to make it easily loadable. What do you think?

Single-GPU training

Current training command: CUDA_VISIBLE_DEVICES=0 bash train.sh 1 --model nextvit_small --batch-size 8 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path ./data
Error raised at model.to(devices): RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 556202) of binary: /opt/conda/envs/NextVit/bin/python3
What could be causing this?

no relu before the global pool?

in the classification model of nextvit.py:

why is there no relu before the global pool?
the batchnorm will produce both +ve and -ve values.



    def forward(self, x):
        x = self.stem(x)
        for idx, layer in enumerate(self.features):
            if self.use_checkpoint:
                x = checkpoint.checkpoint(layer, x)
            else:
                x = layer(x)
        x = self.norm(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.proj_head(x)
        return x

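Question about Fig. 5

Hi, it's me again! About your Fourier plot: did you use np.fft.fft2? In fft2(image), that is np.fft.fft2(image), is the input image the heat map itself, or the feature map output by the last layer of the network? Thanks!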
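
A common way to draw this kind of plot is to take the 2D FFT of a single feature map, shift the zero frequency to the center, and display the log amplitude as a heat map. The sketch below does that with np.fft.fft2. It is not the authors' script: the helper name log_amplitude_spectrum and the random stand-in feature map are assumptions made for illustration.

```python
import numpy as np


def log_amplitude_spectrum(feature_map):
    """Return the centered log-amplitude spectrum of a 2D array (hypothetical helper)."""
    spectrum = np.fft.fft2(feature_map)       # 2D discrete Fourier transform
    spectrum = np.fft.fftshift(spectrum)      # move the zero frequency to the center
    return np.log(np.abs(spectrum) + 1e-8)    # log scale; the epsilon avoids log(0)


# Stand-in for one channel of a layer's output; replace with a real feature map.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((14, 14))
log_amp = log_amplitude_spectrum(feature_map)
```

The resulting array has the same height and width as the input and can be shown as a heat map, for example with matplotlib's imshow. Whether the paper applies this to the last layer's output or to another layer is not stated in this thread.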

Bug when running the throughput test

ImportError: cannot import name 'create_logger' from 'logger'

Could you provide the pretrained models on Baidu Wangpan? Thanks.

Could you provide the pretrained models on Baidu Wangpan? Thanks.

add models to Hugging Face Hub

Hi!

Would you be interested in sharing your models in the Hugging Face Hub? The Hub offers free hosting and it would make your work more accessible and visible to the rest of the ML community. We can help you set up a bytedance organization.

Some of the benefits of sharing your models through the Hub would be:

versioning, commit history and diffs
repos provide useful metadata about their tasks, languages, metrics, etc that make them discoverable
multiple features from TensorBoard visualizations, PapersWithCode integration, and more
wider reach of your work to the ecosystem

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

You can also set up a gradio demo for your model by following this guide: https://gradio.app/getting_started/

Here is an example of a gradio demo: https://huggingface.co/spaces/adirik/OWL-ViT
and the code: https://huggingface.co/spaces/adirik/OWL-ViT/blob/main/app.py

Happy to hear your thoughts,
Ahsen and the Hugging Face team

Please provide CoreML Segmentation pretrained models

This is a request to provide CoreML segmentation pretrained models, as it seems you have already converted them and tested/used in order to get your statistics on the page for the latency of the models. If you could post them it would help me and my team a huge amount. Thanks!

Dockerfile request

Hi everyone,

Has someone already built a Dockerfile to run this code? If you don't mind, please share it with us. Thank you for helping us.

Best regards,
Xin

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the branches of the OpenMMLab 2.0 repos:

Repo              OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine                                 0.x
MMCV              1.x                    2.x
MMDetection       0.x, 1.x, 2.x          3.x
MMAction2         0.x                    1.x
MMClassification  0.x                    1.x
MMSegmentation    0.x                    1.x
MMDetection3D     0.x                    1.x
MMEditing         0.x                    1.x
MMPose            0.x                    1.x
MMDeploy          0.x                    1.x
MMTracking        0.x                    1.x
MMOCR             0.x                    1.x
MMRazor           0.x                    1.x
MMSelfSup         0.x                    1.x
MMRotate          1.x                    1.x
MMYOLO                                   0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

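Question about the times in the log

Hello!
I noticed in the log that training nextvit-small on ImageNet (224x224) takes about 4 min/epoch.
Training on my own 8 * A100 setup with bsz=8 * 256 takes about 10 min/epoch.
I also noticed that the paper gives your training setup as 8 * V100 with bsz=8 * 256.
So I would like to ask whether there is some inconsistency here, or whether something is wrong with my training.
Thank you!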

Some problems about Code

Next-ViT/classification/nextvit.py

Lines 171 to 172 in 922e771

 self.sr = nn.AvgPool1d(kernel_size=self.N_ratio, stride=self.N_ratio) 

 self.norm = nn.BatchNorm1d(dim, eps=NORM_EPS)

Hi! Thank you for your great work! I'd like to ask: by using AvgPool1d here, are you only aggregating information along one axis?

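Question about the paper's description of the E-MHSA spatial reduction ratios

Hello authors!

I have recently been studying your paper. In Section 3.5 you state that the spatial reduction ratios of E-MHSA in the different stages are 8, 4, 2, 1. As I understand the paper, however, stage 1 contains no E-MHSA module (only NCB, no NTB). What, then, does the reduction ratio of 8 refer to? Or have I misunderstood some part of the paper? I would appreciate a clarification.
Many thanks.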

Error in conversion to ONNX

I used the NextVit as backbone for training a face embedding model and the accuracy was pretty good. But I tried to convert it to onnx and deploy on edge device, however, there was an error like this:

einops.py:314: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  known = {axis for axis in composite_axis if axis_name2known_length[axis] != _unknown_axis_length}

This error prevents me from running inference with the model and I don't know how to fix it. Can anyone help me with this? Thanks :D

Weights for the ablation studies

Are there any available weights for the models replacing BN with LN?

out_channels error

When I run the code I get the following error. Can you help?

File "/content/drive/MyDrive/UNINEXT/projects/UNINEXT/uninext/backbone/nextvit.py", line 144, in __init__
assert out_channels % head_dim == 0
TypeError: unsupported operand type(s) for %: 'list' and 'int'
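
The traceback shows that out_channels reached the assertion as a list while head_dim was an int, and Python defines no % operation for that pair. The sketch below reproduces the message and shows a per-value check. Which caller passes the list is not visible from the traceback, so the values and the helper name check_divisible are made up for illustration.

```python
def check_divisible(out_channels, head_dim):
    """Check divisibility for one int or for each int in a list (hypothetical helper)."""
    if isinstance(out_channels, (list, tuple)):
        values = list(out_channels)
    else:
        values = [out_channels]
    return all(value % head_dim == 0 for value in values)


head_dim = 32                     # illustrative value
out_channels = [96, 192, 384]     # illustrative: a list where a single int is expected

try:
    out_channels % head_dim       # same operation as the failing assert
except TypeError as error:
    message = str(error)          # same text as the TypeError in the traceback

ok = check_divisible(out_channels, head_dim)
```

If the block is meant to receive a single channel count, the more likely fix is to pass one int per block from the calling code rather than to relax the assertion.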

Path problem in classification training

My dataset has the following structure:

    data/
        train/
            false/
                img.jpg
            true/
                img.jpg
        val/
            false/
                img.jpg
            true/
                img.jpg

Training command: bash train.sh 1 --model nextvit_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path ./data
Error: FileNotFoundError: Found no valid file for the classes .ipynb_checkpoints. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
I traced it to this line in the code:
dataset = datasets.ImageFolder(root, transform=transform)
where root does not seem to pick up the data in false or true.
How should I change this? My understanding is that the image folder names are the class labels, and that training needs both the path and the class.
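
The error message names .ipynb_checkpoints as a class, which suggests that Jupyter created a hidden checkpoint directory next to false/ and true/, and that ImageFolder treated it as a class folder containing no images. A minimal cleanup sketch follows, assuming the data lives under ./data as in the command above; the helper name remove_checkpoint_dirs is hypothetical.

```python
import os
import shutil


def remove_checkpoint_dirs(root, name=".ipynb_checkpoints"):
    """Delete every directory called `name` under `root` (hypothetical helper)."""
    removed = []
    for current, dirnames, _ in os.walk(root):
        if name in dirnames:
            target = os.path.join(current, name)
            shutil.rmtree(target)
            dirnames.remove(name)   # do not descend into the deleted directory
            removed.append(target)
    return removed


if __name__ == "__main__":
    # "./data" is the --data-path value from the training command.
    for path in remove_checkpoint_dirs("./data"):
        print("removed", path)
```

After the cleanup, train/ and val/ should each contain only the false/ and true/ class folders.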

Patch embedding in each NCB and NTB?

Thank you for sharing a nice architecture.

I'm curious why you apply the PatchEmbed function in every NCB as well as every NTB. The code line in question:

Next-ViT/classification/nextvit.py

Line 125 in 922e771

self.patch_embed = PatchEmbed(in_channels, out_channels, stride)

I ask because this looks slightly different from the figure in the paper, which adds patch embedding once per stage (not in each NCB and NTB).

how to deploy the Next_ViT detection?

How can the Next_ViT detection model be deployed? With the mmdetection tools?

Excellent work!

Hi,
What tool was used to draw Figure 1 in the paper (i.e. images/result.png)?
Was it drawn in Excel?

Wrong implementation of AvgPool in E_MHSA

Hi!
First of all very interesting paper with good ideas. I've looked into your code and your implementation of AvgPool seems to be wrong. Usually when we speak about AvgPool for images we assume a box filter with kernel_size=(size, size) and stride=size.

In your implementation you:

1. take an input of size (B, C, H, W)
2. reshape it to (B, H*W, C)
3. in E_MHSA, reshape it to (B, Heads, H * W, C / Heads)
4. in case of sr_ratio != 1, reshape to (B, Heads, H * W, C / Heads)
5. apply AvgPool1d to the tensor above with kernel_size=size * size, stride=size * size. This is NOT IDENTICAL to AvgPool2d.

The problem is that in this implementation you're performing avg-pooling over rows, which leads to leakage of information from one border to another and also doesn't use any information in the vertical dimension. Below is an example of a tensor and its mean inside your E_MHSA.

BS, DIM, SZ = 1, 1, 4
inp = torch.arange(SZ * SZ).float().view(BS, SZ * SZ, DIM)
E_MHSA(dim=DIM, sr_ratio=2, head_dim=1)(inp)

inp = tensor([
        [ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])

mean_inp (after self.sr(_x)): tensor([ 1.5000,  5.5000,  9.5000, 13.5000])

It obviously calculates the average only inside rows. I would assume you used this implementation to avoid reshaping back to (B, C, H, W) for AvgPool2d, but it leads to a totally different pooling.

As a side note I would say that all this (B, C, H, W) -> (B, N, C) -> (B, C, H, W) permutations are not really needed and you could boost speed of your network by re-implementing E_MHSA to work on (B, C, H, W) tensors as inputs.

	self.sr = nn.AvgPool1d(kernel_size=self.N_ratio, stride=self.N_ratio)
	self.norm = nn.BatchNorm1d(dim, eps=NORM_EPS)
