
flatten-transformer's People

Contributors

tian-qing001


flatten-transformer's Issues

Numerical Instability in z & kv

When I plug FocusedLinearAttention into my model and try to train in mixed precision, I get NaN from the very start. I found that the denominator z and the value of kv become quite large, so they can easily overflow in float16.
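
For reference, a minimal sketch of one common workaround (an illustration only, not the authors' code; linear_attention_fp32 is a hypothetical helper): disable autocast locally and accumulate kv and the normalizer z in float32, so the intermediate sums cannot exceed the float16 range.

import torch

def linear_attention_fp32(q, k, v, eps=1e-6):
    # q, k, v: (B, heads, N, head_dim), already passed through the kernel function.
    # Upcast and disable autocast so kv and z are accumulated in float32.
    with torch.autocast(device_type=q.device.type, enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        kv = torch.einsum('bhnd,bhne->bhde', k, v)                        # (B, heads, d, d_v)
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)  # normalizer
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
    return out  # cast back to the ambient dtype outside if desired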

Another question: have you tried an identity mapping or a standard convolution instead of the depthwise convolution in the attention block?

Is "focused" really true?

Thank you for your excellent work.
I have a question about whether the feature map is truly sharper with the 'focused function' compared to the standard Transformer.
From the perspective of the "pull" effect you mention, it should indeed be sharper, and that is what Figure 4 shows. However, in the ball image in Figure 3, the feature maps appear smoother than those of the original softmax attention.
In a simple experiment, the focused function also gave me smoother feature-map distributions than standard softmax, which confuses me.

So I'm curious whether focused attention has a more centralized feature map than softmax attention in practical use.
I appreciate any insights you can provide.

[image]

Here is a small experiment:
import torch

torch.manual_seed(5)
d = 4
n = 5
Q = torch.tensor([[1., 2., 4., 9.]])  # Qi
K = torch.randint(1, 9, (d, n)).to(torch.float)

att = torch.einsum('ik,kj->ij', Q, K)
att_softmax = torch.softmax(att, dim=1)
print(f'softmax-attention is: {att_softmax}')

def flatten(x):
    y = torch.pow(x, 3)  # x**3
    return y * torch.norm(x) / torch.norm(y)

Q_ = flatten(Q)
for i in range(n):
    K[:, i] = flatten(K[:, i])
att = torch.einsum('ik,kj->ij', Q_, K)
att = att / torch.sum(att)
print(f'Flatten-attention is: {att}')

There is one place in the code that I do not fully understand:

kv = (k.transpose(-2, -1) * (N ** -0.5)) @ (v * (N ** -0.5))
I understand that N is there to scale the dot-product attention. But the shapes of k and v have been changed by sr_ratio to (B, N1, C), i.e. N1 = N / sr_ratio**2. So shouldn't there be an if self.sr_ratio > 1 branch to handle this case and replace N with N1? Since this part isn't mentioned in the paper, this is just my personal understanding, and I hope you can explain it. Thank you very much.
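
For concreteness, a sketch of the change the question suggests (an illustration only, assuming k and v have shape (B, N1, C) as described above; not the authors' code):

N1 = k.shape[1]  # token count of k and v after the sr_ratio spatial reduction
kv = (k.transpose(-2, -1) * (N1 ** -0.5)) @ (v * (N1 ** -0.5))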

typical implementation form?

I am attempting to reproduce your approach in a general attention mechanism, specifically by replacing softmax attention with focused linear attention. However, I am having difficulty understanding the modifications made to Swin Transformer (SwinT) and Pyramid Vision Transformer (PVT). Could you provide a common implementation form?
Thanks a lot!
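
Not an official answer, but a minimal self-contained sketch of the general form based on the paper's description: a focusing kernel (ReLU, raised to a power, rescaled to preserve the norm) applied to queries and keys, followed by linear attention. The class name FocusedLinearAttention and the focusing_factor default are assumptions here, and the depthwise-convolution branch on the value that the paper adds is omitted for brevity.

import torch
import torch.nn as nn

class FocusedLinearAttention(nn.Module):
    """Generic sketch of focused linear attention (not the official code)."""
    def __init__(self, dim, num_heads=8, focusing_factor=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.focusing_factor = focusing_factor
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def focus(self, x, eps=1e-6):
        # Focusing function: ReLU, raise to a power, rescale to keep the original norm.
        x = torch.relu(x) + eps
        x_p = x ** self.focusing_factor
        return x_p * x.norm(dim=-1, keepdim=True) / (x_p.norm(dim=-1, keepdim=True) + eps)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)                        # each (B, heads, N, head_dim)
        q, k = self.focus(q), self.focus(k)
        kv = torch.einsum('bhnd,bhne->bhde', k, v)     # O(N d^2) aggregation
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))

Since the layer takes and returns (B, N, C) tokens, it can in principle replace a standard multi-head self-attention block; the SwinT and PVT variants in this repo presumably add window partitioning and spatial reduction on top of this core.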

Experimental results using only DW conv

Thank you for your excellent work! I noticed that adding the DW conv brings a very significant performance improvement. Do you have results from an experiment that uses the DW conv alone? It has been noted that replacing the attention module with a DW conv can achieve results comparable to or even better than the original, so I am curious what role the DW conv plays in FLatten.

Scaling ReLU(query) with softplus

Lines 124-129 of models.flatten_pvt.py:
kernel_function = nn.ReLU()
scale = nn.Softplus()(self.scale)
q = kernel_function(q) + 1e-6
k = kernel_function(k) + 1e-6
q = q / scale
k = k / scale
Hello, I am very interested in your work. Could you explain why you scale ReLU(query) and ReLU(key) by a softplus of self.scale? Thank you.

Training does not converge when transferring to a downstream task

Thank you for your outstanding work.
I tried to transfer the focused linear attention module proposed in your paper to a downstream dehazing task, but during training the model parameters become NaN. When I replace the FLatten module with Swin, the problem does not occur. I hope you can offer an explanation!

Request for DeiT-Tiny model definition.

Hi @tian-qing001

First, thank you for sharing your wonderful work as open source. I appreciate it.

Question: Can you share the flatten transformer implementation applied on the DeiT-T model?

I am trying to reproduce Table 3 of the paper (shown in the figure below). If you could provide the implementation of your method on DeiT-T, it would help me a lot.
[image: Table 3 of the paper]

Thank you.

Hankyul.

visualization

First of all, thank you very much for your work. Could you please explain how to use the visualizer file? Also, how exactly is the rank of the attention matrix visualized? Once again, thank you for your efforts.
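
On the rank question, a minimal sketch of one common way to estimate the numerical rank of an attention map (an illustration only, not the repo's visualizer; attention_rank is a hypothetical helper): inspect the singular-value spectrum and count the values above a tolerance.

import torch

def attention_rank(attn, tol=1e-3):
    # attn: (heads, N, N) attention matrices for one image.
    s = torch.linalg.svdvals(attn)   # singular values, descending, shape (heads, N)
    s = s / s[:, :1]                 # normalize by the largest singular value
    return (s > tol).sum(dim=-1)     # approximate rank per head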

Attention map visualization

Thank you for your outstanding work; I am very interested in FLatten. I tried to visualize the attention maps with a grid similar to the one in the paper, but each patch is tiny relative to the original image size. How were the results in the paper obtained, and could you provide the corresponding visualization code? I look forward to your reply!
[image]
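
Not the paper's code, but a minimal sketch of the usual fix for tiny patches: reshape the attention weights of a chosen query token into the patch grid and bilinearly upsample them to the input resolution before overlaying. upsample_attention and its arguments are hypothetical.

import torch
import torch.nn.functional as F

def upsample_attention(attn_row, grid_hw, image_hw):
    # attn_row: (N,) attention weights of one query token over all N patches.
    # grid_hw:  (H_p, W_p) patch grid, with H_p * W_p == N.
    # image_hw: (H, W) original image size.
    h_p, w_p = grid_hw
    m = attn_row.reshape(1, 1, h_p, w_p)
    m = F.interpolate(m, size=image_hw, mode='bilinear', align_corners=False)
    return m[0, 0]  # (H, W) heat map, ready to blend with the image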

Some questions about visualization

How was the visualization of the DeiT-Tiny attention maps in the paper done? Is there any related code I could refer to? Thanks.

Subject: Inquiry About Lightweight Feature Extraction with Your Attention Mechanism

I hope this message finds you well. I recently read your impressive paper, "FLatten Transformer: Vision Transformer using Focused Linear Attention", and I must say I was truly amazed by your work.

I am currently working on a task related to feature point extraction and matching, and my focus is on developing lightweight models. I am particularly interested in whether it would be feasible to replace the standard self-attention mechanisms in backbone networks with the attention mechanism you proposed in your research.

I would be grateful for your insights or suggestions on this approach. I apologize for any inconvenience my inquiry might cause and look forward to your response.

Thank you very much, and best wishes.

How is the segmentation dataset organized?

Hello, it is a great pleasure to read your paper, especially since it explains the mathematics; most papers I had read before do not spell out the correspondence between the formulas and the modules that are actually added, so I learned a lot from your work. I see that the classification dataset used in your experiments is organized with a different folder for each class, but what does the segmentation dataset look like?
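
For reference, segmentation datasets are usually not organized as one folder per class. Assuming the standard mmsegmentation-style ADE20K setup (an assumption, not something stated in this issue), the layout looks roughly like this, with per-pixel class-index masks stored alongside the images:

ADEChallengeData2016/
    images/
        training/        # RGB images
        validation/
    annotations/
        training/        # per-pixel class-index PNG masks
        validation/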
