
flatten-transformer's People

Contributors

tian-qing001


flatten-transformer's Issues

Numerical Instability in z & kv

When I plug FocusedLinearAttention into my model and try to train in mixed precision, I get NaN from the very start. I found that the denominator z and the value of kv become quite large, so they can easily overflow in float16.
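
For reference, a minimal sketch of one common workaround (an illustration only, not the authors' code; linear_attention_fp32 is a hypothetical helper): disable autocast locally and accumulate kv and the normalizer z in float32, so the intermediate sums cannot exceed the float16 range.

import torch

def linear_attention_fp32(q, k, v, eps=1e-6):
    # q, k, v: (B, heads, N, head_dim), already passed through the kernel function.
    # Upcast and disable autocast so kv and z are accumulated in float32.
    with torch.autocast(device_type=q.device.type, enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        kv = torch.einsum('bhnd,bhne->bhde', k, v)                        # (B, heads, d, d_v)
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)  # normalizer
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
    return out  # cast back to the ambient dtype outside if desired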

Another question: have you tried an identity mapping or a standard convolution instead of the depthwise convolution in the attention block?

Is "focused" really true?

Thank you for your excellent work.
I have a question about whether the feature map is truly sharper with the 'focused function' compared to the standard Transformer.
From the perspective of the "pull" effect you mention, it should indeed be sharper, and that is what Figure 4 shows. However, in the ball image in Figure 3, the feature maps appear smoother than those of the original softmax attention.
In a simple experiment, the focused function also gave me smoother feature-map distributions than standard softmax, which confuses me.

So I'm curious whether focused attention has a more centralized feature map than softmax attention in practical use.
I appreciate any insights you can provide.

[image]

Here is a small experiment:
import torch

torch.manual_seed(5)
d = 4
n = 5
Q = torch.tensor([[1., 2., 4., 9.]])  # Qi
K = torch.randint(1, 9, (d, n)).to(torch.float)

att = torch.einsum('ik,kj->ij', Q, K)
att_softmax = torch.softmax(att, dim=1)
print(f'softmax-attention is: {att_softmax}')

def flatten(x):
    y = torch.pow(x, 3)  # x**3
    return y * torch.norm(x) / torch.norm(y)

Q_ = flatten(Q)
for i in range(n):
    K[:, i] = flatten(K[:, i])
att = torch.einsum('ik,kj->ij', Q_, K)
att = att / torch.sum(att)
print(f'Flatten-attention is: {att}')

There is one place in the code that I do not fully understand:

kv = (k.transpose(-2, -1) * (N ** -0.5)) @ (v * (N ** -0.5))
I understand that N is there to scale the dot-product attention. But the shapes of k and v have been changed by sr_ratio to (B, N1, C), i.e. N1 = N / sr_ratio**2. So shouldn't there be an if self.sr_ratio > 1 branch to handle this case and replace N with N1? Since this part isn't mentioned in the paper, this is just my personal understanding, and I hope you can explain it. Thank you very much.
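
For concreteness, a sketch of the change the question suggests (an illustration only, assuming k and v have shape (B, N1, C) as described above; not the authors' code):

N1 = k.shape[1]  # token count of k and v after the sr_ratio spatial reduction
kv = (k.transpose(-2, -1) * (N1 ** -0.5)) @ (v * (N1 ** -0.5))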

typical implementation form?

I am attempting to reproduce your approach in a general attention mechanism, specifically by replacing softmax attention with focused linear attention. However, I am having difficulty understanding the modifications made to Swin Transformer (SwinT) and Pyramid Vision Transformer (PVT). Could you provide a common implementation form?
Thanks a lot!
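
Not an official answer, but a minimal self-contained sketch of the general form based on the paper's description: a focusing kernel (ReLU, raised to a power, rescaled to preserve the norm) applied to queries and keys, followed by linear attention. The class name FocusedLinearAttention and the focusing_factor default are assumptions here, and the depthwise-convolution branch on the value that the paper adds is omitted for brevity.

import torch
import torch.nn as nn

class FocusedLinearAttention(nn.Module):
    """Generic sketch of focused linear attention (not the official code)."""
    def __init__(self, dim, num_heads=8, focusing_factor=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.focusing_factor = focusing_factor
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def focus(self, x, eps=1e-6):
        # Focusing function: ReLU, raise to a power, rescale to keep the original norm.
        x = torch.relu(x) + eps
        x_p = x ** self.focusing_factor
        return x_p * x.norm(dim=-1, keepdim=True) / (x_p.norm(dim=-1, keepdim=True) + eps)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)                        # each (B, heads, N, head_dim)
        q, k = self.focus(q), self.focus(k)
        kv = torch.einsum('bhnd,bhne->bhde', k, v)     # O(N d^2) aggregation
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))

Since the layer takes and returns (B, N, C) tokens, it can in principle replace a standard multi-head self-attention block; the SwinT and PVT variants in this repo presumably add window partitioning and spatial reduction on top of this core.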

Experimental results using only DW conv

Thank you for your excellent work! I noticed that adding the DW conv brings a very significant performance improvement. Do you have results from an experiment that uses the DW conv alone? It has been noted that replacing the attention module with a DW conv can achieve results comparable to or even better than the original, so I am curious what role the DW conv plays in FLatten.

Scaling ReLU(query) with softplus

Lines 124-129 of models.flatten_pvt.py:
kernel_function = nn.ReLU()
scale = nn.Softplus()(self.scale)
q = kernel_function(q) + 1e-6
k = kernel_function(k) + 1e-6
q = q / scale
k = k / scale
Hello, I am very interested in your work. Could you explain why you scale ReLU(query) and ReLU(key) by a softplus of self.scale? Thank you.

Training does not converge when transferring to a downstream task

Thank you for your outstanding work.
I tried to transfer the focused linear attention module proposed in your paper to a downstream dehazing task, but during training the model parameters become NaN. When I replace the FLatten module with Swin, the problem does not occur. I hope you can offer an explanation!

Request for DeiT-Tiny model definition.

Hi @tian-qing001

First, thank you for sharing your wonderful work as open source. I appreciate it.

Question: Can you share the flatten transformer implementation applied on the DeiT-T model?

I am trying to reproduce Table 3 of the paper (shown in the figure below). If you could provide the implementation of your method on DeiT-T, it would help me a lot.
[image: Table 3 of the paper]

Thank you.

Hankyul.

visualization

First of all, thank you very much for your work. Could you please explain how to use the visualizer file? Also, how exactly is the rank of the attention matrix visualized? Once again, thank you for your efforts.
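
On the rank question, a minimal sketch of one common way to estimate the numerical rank of an attention map (an illustration only, not the repo's visualizer; attention_rank is a hypothetical helper): inspect the singular-value spectrum and count the values above a tolerance.

import torch

def attention_rank(attn, tol=1e-3):
    # attn: (heads, N, N) attention matrices for one image.
    s = torch.linalg.svdvals(attn)   # singular values, descending, shape (heads, N)
    s = s / s[:, :1]                 # normalize by the largest singular value
    return (s > tol).sum(dim=-1)     # approximate rank per head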

Attention map visualization

Thank you for your outstanding work; I am very interested in FLatten. I tried to visualize the attention maps with a grid similar to the one in the paper, but each patch is tiny relative to the original image size. How were the results in the paper obtained, and could you provide the corresponding visualization code? I look forward to your reply!
[image]
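
Not the paper's code, but a minimal sketch of the usual fix for tiny patches: reshape the attention weights of a chosen query token into the patch grid and bilinearly upsample them to the input resolution before overlaying. upsample_attention and its arguments are hypothetical.

import torch
import torch.nn.functional as F

def upsample_attention(attn_row, grid_hw, image_hw):
    # attn_row: (N,) attention weights of one query token over all N patches.
    # grid_hw:  (H_p, W_p) patch grid, with H_p * W_p == N.
    # image_hw: (H, W) original image size.
    h_p, w_p = grid_hw
    m = attn_row.reshape(1, 1, h_p, w_p)
    m = F.interpolate(m, size=image_hw, mode='bilinear', align_corners=False)
    return m[0, 0]  # (H, W) heat map, ready to blend with the image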

Some questions about visualization

How was the visualization of the DeiT-Tiny attention maps in the paper done? Is there any related code I could refer to? Thanks.

Subject: Inquiry About Lightweight Feature Extraction with Your Attention Mechanism

I hope this message finds you well. I recently read your impressive paper, "FLatten Transformer: Vision Transformer using Focused Linear Attention", and I must say I was truly amazed by your work.

I am currently working on a task related to feature point extraction and matching, and my focus is on developing lightweight models. I am particularly interested in whether it would be feasible to replace the standard self-attention mechanisms in backbone networks with the attention mechanism you proposed in your research.

I would be grateful for your insights or suggestions on this approach. I apologize for any inconvenience my inquiry might cause and look forward to your response.

Thank you very much, and best wishes.

How is the segmentation dataset organized?

Hello, it is a great pleasure to read your paper, especially since it explains the mathematics; most papers I had read before do not spell out the correspondence between the formulas and the modules that are actually added, so I learned a lot from your work. I see that the classification dataset used in your experiments is organized with a different folder for each class, but what does the segmentation dataset look like?
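
For reference, segmentation datasets are usually not organized as one folder per class. Assuming the standard mmsegmentation-style ADE20K setup (an assumption, not something stated in this issue), the layout looks roughly like this, with per-pixel class-index masks stored alongside the images:

ADEChallengeData2016/
    images/
        training/        # RGB images
        validation/
    annotations/
        training/        # per-pixel class-index PNG masks
        validation/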
