
Comments (5)

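isaacto commented on May 12, 2024

I read The Illustrated Transformer and now I understand.

First, Keras's batch_dot only ever performs a rank-2 dot, that is, a matrix product. This holds even when the arrays have more than three dimensions: the leading (n-2) dimensions are all treated as "batch". The Keras documentation does not explain this well.

Second, the Transformer really does need a matrix product here, not a single inner product. Each query vector is multiplied with every key vector of the same attention head and the products are summed. I could not work this out without reading The Illustrated Transformer.

So the code is fine and the issue can be closed. Sorry for the noise.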
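To make the "leading dimensions are batch" reading concrete, here is a minimal NumPy sketch rather than the Keras call itself. The shapes (1 sample, 2 heads, 3 positions, 4-dimensional vectors) and the names `q`, `k` and `scores` are made up for illustration and do not come from the repository code.

```python
import numpy as np

# Hypothetical shapes: (sample, head, position, depth) = (1, 2, 3, 4).
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 2, 3, 4))  # query vectors
k = rng.normal(size=(1, 2, 3, 4))  # key vectors

# Contract the last axis of q with the last axis of k. np.matmul treats
# the two leading axes (sample, head) as batch dimensions, so each head
# only ever sees its own keys.
scores = np.matmul(q, np.swapaxes(k, -1, -2))
print(scores.shape)

# Entry [b, h, i, j] is the dot product of query i and key j in head h.
assert np.allclose(scores[0, 1, 2, 0], np.dot(q[0, 1, 2], k[0, 1, 0]))
```

The result has one (positions x positions) score matrix per sample and per head, which is the matrix product the attention layer needs.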

from rasa_chatbot_cn.

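isaacto commented on May 12, 2024

I experimented a bit more. The code runs correctly under TensorFlow, but Keras's batch_dot behaves differently on the TensorFlow and Theano backends. Under Theano only the first dimension is treated as the batch, which produces the incorrect result I described at the start: matrix products are also computed across different attention heads.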


isaacto commented on May 12, 2024

Here is an example:

import numpy as np
import keras.backend as K

A = np.int_(
    [[[[1, 2],
       [5, 6]],
      [[3, 4],
       [7, 8]]]])
B = np.int_(
    [[[[7, 8],
       [11, 13]],
      [[5, 3],
       [4, 1]]]])
varA = K.variable(value=A)
varB = K.variable(value=B)
print("A batch. B\n%s" % K.eval(K.batch_dot(varA, varB, axes=[3, 3])))

Result under TensorFlow:

Using TensorFlow backend.
A batch. B
[[[[ 23.  37.]
   [ 83. 133.]]

  [[ 27.  16.]
   [ 59.  36.]]]]

Result under Theano:

Using Theano backend.
A batch. B
[[[[[ 23.  37.]
    [ 11.   6.]]

   [[ 83. 133.]
    [ 43.  26.]]]


  [[[ 53.  85.]
    [ 27.  16.]]

   [[113. 181.]
    [ 59.  36.]]]]]
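Both outputs can be reproduced without either backend. The sketch below uses np.einsum as a stand-in for batch_dot; the subscript strings are my reading of the two behaviours, not something taken from the Keras source.

```python
import numpy as np

A = np.array([[[[1, 2], [5, 6]],
               [[3, 4], [7, 8]]]])
B = np.array([[[[7, 8], [11, 13]],
               [[5, 3], [4, 1]]]])

# TensorFlow-style: axes 0 and 1 (sample, head) are both batch axes,
# so each head of A is only combined with the same head of B.
per_head = np.einsum("bhid,bhjd->bhij", A, B)

# Theano-style: only axis 0 is a batch axis, so every head h of A is
# combined with every head g of B, which adds an extra output axis.
cross_head = np.einsum("bhid,bgjd->bhigj", A, B)

print(per_head.shape, cross_head.shape)

# The per-head result is the h == g "diagonal" of the cross-head result.
for h in range(2):
    assert np.array_equal(cross_head[:, h, :, h, :], per_head[:, h])
```

The extra entries in the Theano output, such as [11, 6] and [53, 85], are the products of one head's queries with the other head's keys.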


johnny12150 commented on May 12, 2024

The current batch_dot should now behave the same under TensorFlow and Theano.
Previously, with the TF backend, the output shape of the following code was (9, 8, 7, 4, 5):

from keras import backend as K
a = K.ones((9, 8, 7, 4, 2))
b = K.ones((9, 8, 7, 2, 5))
c = K.batch_dot(a, b)
print(c.shape)

Now it is also (9, 8, 7, 4, 8, 7, 5).
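The two shapes can be checked with NumPy alone. This is a sketch of the shape arithmetic only: the einsum subscripts are my interpretation of "only the first axis is batch", and the reshape at the end is one possible way to get the per-head result back, which I have not tested against any particular Keras version.

```python
import numpy as np

a = np.ones((9, 8, 7, 4, 2))
b = np.ones((9, 8, 7, 2, 5))

# Old TF-backend behaviour: all leading axes are batch axes.
old_style = np.matmul(a, b)
print(old_style.shape)  # (9, 8, 7, 4, 5)

# Behaviour reported above: only axis 0 is a batch axis, so the
# remaining axes of a and of b all appear in the output.
new_style = np.einsum("bijkd,blmdn->bijklmn", a, b)
print(new_style.shape)  # (9, 8, 7, 4, 8, 7, 5)

# Possible workaround: fold the extra leading axes into one batch axis,
# do a plain 3-D batched product, then restore the original axes.
folded = np.matmul(a.reshape(-1, 4, 2), b.reshape(-1, 2, 5))
folded = folded.reshape(9, 8, 7, 4, 5)
assert np.array_equal(folded, old_style)
```

With 3-D inputs there is exactly one batch axis, so both interpretations agree. That is why folding the heads into the batch axis before the product sidesteps the difference.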


GaoQ1 commented on May 12, 2024

I have not looked into this part yet, so could you submit a PR?

