menghaoguo / eanet
External Attention Network
Hello, in this function's code I cannot find the matrix multiplication between the query and Mk shown in the paper. How should I interpret this?
Hello Menghao,
I have some questions about differences between the code and the paper.
The paper says 'Equation (5) is the similarity between the i-th pixel and the j-th row of M',
i.e. external attention computes attention between pixels and memory rows. In the code, however, I think Conv1d
only attends over a single pixel's channels, and pixels in F never attend to rows of M.
Also, the paper states 'In fact, we find that a small S, e.g. 64, works well in experiments.',
but in the code d is set to 64 instead of S.
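A likely resolution to the Conv1d question above: a Conv1d with kernel size 1 applied to a `(B, C, N)` tensor multiplies every pixel's feature vector by the same `(S, C)` weight matrix, which is exactly the matmul with Mk from the paper. A minimal sketch (dimensions are illustrative, not the paper's actual sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C, S, N = 2, 8, 4, 10          # batch, channels d, memory size S, pixels N

# A kernel-size-1 Conv1d applies one (S, C) weight matrix to every pixel,
# which is the same operation as the matmul F @ Mk^T in the paper.
conv = nn.Conv1d(C, S, kernel_size=1, bias=False)
feat = torch.randn(B, C, N)       # feature map flattened to N pixels

out_conv = conv(feat)                               # (B, S, N)
Mk = conv.weight.squeeze(-1)                        # (S, C) memory matrix
out_matmul = torch.einsum('sc,bcn->bsn', Mk, feat)  # same result

assert torch.allclose(out_conv, out_matmul, atol=1e-5)
```

So the matmul is not missing; it is folded into the Conv1d weights, and each output position mixes the full channel vector of one pixel with every row of Mk.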
Hello author, I read your paper with great interest, but I have a question. The ablation study includes a 'self attention + double norm' experiment. Does this simply replace the softmax with double normalization, or is there some other setting? When I tried SA + double norm myself, the evaluation metric dropped to 0.
Looking forward to your reply.
Hi, Menghao. I run the command CUDA_VISIBLE_DEVICES=1,3 python transfer_learning.py to verify the performance of EANet on CIFAR-10/100. But after two rounds of iteration, the loss remains unchanged, and the accuracy is stuck at 10.000%. The result is shown in the figure below:
Could you please help me to solve this problem?
Not able to reproduce semantic segmentation result on PASCAL VOC val with MMSegmentation.
I used the code from https://github.com/MenghaoGuo/-EANet/blob/main/model_torch.py and modified nothing except replacing the backbone with the one in MMSegmentation. After training EANet and PSPNet with several sets of configs, the mIoU of EANet is always a little below PSPNet's, e.g. 73 vs. 75.
Any suggestions?
It looks like the code is wrong; you should check the reshape and transpose parts.
Also, what is coef? It is not mentioned in the paper.
In addition, torch is never imported,
and qkv_bias and qk_scale are unused.
Did you actually run the multi-head version from your paper?
Hello. When I run model_torch.py, I get this error. How can I deal with it? What is 'bn_lib'?
Hi, Thanks for releasing the code!
Do you think the representation capability of MEA is lower than that of EA, since the external memories are shared across different heads?
How to implement multi-head?
Where can I find the pretrained models?
It seems information would be lost very quickly.
Hello, is the configuration in main/EAMLP/EAMLP_7_384/args.yaml the training configuration for 8 GPUs?
Could we use Conv2d instead of Conv1d in the EA module?
Thanks for your code; I'm very interested in your topic. But when I embed external attention into an R3D network, the loss stops decreasing after only a few epochs and never goes down. I'd like to ask for your advice.
I couldn't find where External_attention is invoked in the t2t_vit code. Could you point it out?
Does the T2T-Transformer mentioned in Table 2 of the paper correspond to the T2T_module(tokens_type=...) parameter? Can it be replaced with EA in the released code, and if so, how?
This issue mentions #7 (comment), which leaves me confused about the results in Table 2. I'd appreciate a clarification.
Thank you very much!
Hi,
I see that the new multi-head version doesn't use a residual connection, which differs from the former version.
Nor does it use normalization.
What do you think about this?
Thank you.
Thanks for sharing this great work. I was reading the paper and trying to understand the mechanism. The results in Fig. 4 caught my attention, and I wanted to see how such impressive attention maps were generated. Is the code for generating them already released, or will it be?
Hi, Menghao. How do you calculate the MACs mentioned in your paper? Is this different from FLOPs? Thanks.
Thanks for open-sourcing the code.
I noticed this line in the EA code; is there any special consideration behind it?
https://github.com/MenghaoGuo/-EANet/blob/de59e2535f92ce58648a217e21302e42296c8efd/model_torch.py#L198
Thanks!
Hello author, I recently read your paper with great interest and studied both the paper and the code carefully; I have one question. In multi_head_attention_torch.py, the external attention implementation passes through 4 linear layers: the first expands Q from dim to dim×4, the middle two are memory units of dimension 64, and the last maps dim×4 back to dim. This increases the computation beyond that of self-attention, even though EA's complexity is linear.
That is my question; looking forward to your reply, thanks.
What is the difference between reshape (view) followed by Conv1d versus a direct 1×1 Conv2d? Why not use 1×1 Conv2d directly?
From this view, EA looks very similar to a ResNet bottleneck: both are hourglass-shaped 1×1 Conv2d layers sandwiching a module in between, except that ResNet sandwiches BN-ReLU-3×3 Conv2d-BN-ReLU, while EA sandwiches double normalization.
Also, I changed the normalization from softmax over the N=H*W dimension followed by L1 norm over the k=S dimension to just softmax over the k=S dimension, and on simple tasks there was no significant difference in performance. Has the author run a related ablation? (The paper does not mention one.)
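For reference, the double normalization being discussed (softmax over the pixel dimension N, then L1 normalization over the memory dimension S, as in the released code) can be sketched as follows; the tensor sizes are illustrative:

```python
import torch

torch.manual_seed(0)
S, N = 4, 6
attn = torch.randn(1, S, N)       # raw attention: S memory units x N pixels

# Double normalization:
# 1) softmax across the pixel dimension N
a = torch.softmax(attn, dim=-1)
# 2) l1-normalize across the memory dimension S (epsilon avoids div-by-zero)
a = a / (1e-9 + a.sum(dim=1, keepdim=True))

# after both steps, each pixel's weights over the S memories sum to 1
assert torch.allclose(a.sum(dim=1), torch.ones(1, N), atol=1e-5)
```

The alternative proposed above (a single softmax over S) also yields columns summing to 1, which may explain why the two perform similarly on simple tasks.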
model_torch.py contains a line `from bn_lib.nn.modules import SynchronizedBatchNorm2d`, which raises an error when I use it. How do I resolve this?
Is it consistent with the ResNet implementation in the Jittor example https://cg.cs.tsinghua.edu.cn/jittor/tutorial/2020-3-17-09-55-segmentation/ ?
Is there a pyramid dilated convolution at the end?
Intuitively, the memory units serve as prototypes for different patterns and play almost the same role as convolution kernels (especially 1×1 kernels). From the perspective of the mathematical operation, in both cases the output is the dot product between the feature vector and the memory unit/convolution kernel.
Hence the question: what are the differences between a memory unit and a convolution kernel?
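The dot-product equivalence claimed above is easy to verify: loading the memory rows as 1×1 Conv2d kernels reproduces the same similarity scores. A small sketch under that assumption (the difference in EA would then lie only in the normalization applied afterwards and in the memories being learned jointly with a second value memory):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, S = 8, 4                              # channels, number of memory units

mem = torch.randn(S, C)                  # one memory unit per row
conv = nn.Conv2d(C, S, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.copy_(mem.view(S, C, 1, 1))  # load memories as 1x1 kernels

x = torch.randn(1, C, 2, 5)              # (B, C, H, W), N = H*W = 10 pixels

# Both paths compute the same dot products between each pixel's feature
# vector and every memory unit / kernel.
via_conv = conv(x).flatten(2).squeeze(0)      # (S, N)
via_matmul = mem @ x.flatten(2).squeeze(0)    # (S, N)
assert torch.allclose(via_conv, via_matmul, atol=1e-5)
```

So at the level of a single layer the operations coincide; the distinguishing pieces of external attention are the double normalization of the resulting map and the second multiplication with Mv.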
Hello author, in the multi-head attention code, what is the purpose of self.coef = 4? self.trans_dims = nn.Linear(dim, dim * self.coef) has different input and output dimensions, whereas in original self-attention the linear projection of Q keeps the dimension unchanged. Why is that?