Comments (6)

CarloLucibello commented on June 9, 2024

The layer's documentation for the forward pass says:

     (mha::MultiHeadAttention)(q_in, k_in, v_in, [bias]; [mask])
...
mask: Input array broadcastable to size (kv_len, q_len, nheads, batch_size). 
      The mask is applied to the attention scores just before the softmax. 
      See NNlib.make_causal_mask for creating causal masks. Default nothing.

So I think you should reshape the mask with reshape(mask, (seq_len, 1, 1, batch_size)) or reshape(mask, (1, seq_len, 1, batch_size)). I'm not sure which of the two is correct.
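
A rough way to see what each of the two reshapes does, assuming `pad` is a hypothetical Bool matrix of size (seq_len, batch_size) with true marking real tokens. The sketch only uses Base broadcasting against the documented (kv_len, q_len, nheads, batch_size) shape and does not call the layer itself:

```julia
# Hypothetical padding mask: true = real token, false = padding.
seq_len, nheads, batch_size = 5, 2, 2
pad = fill(true, seq_len, batch_size)
pad[4:5, :] .= false   # last two positions of every sequence are padding

# Target shape from the docstring: (kv_len, q_len, nheads, batch_size).
full = fill(true, seq_len, seq_len, nheads, batch_size)

key_mask   = reshape(pad, (seq_len, 1, 1, batch_size))  # varies along dim 1 (keys)
query_mask = reshape(pad, (1, seq_len, 1, batch_size))  # varies along dim 2 (queries)

# Broadcasting expands the singleton dimensions to the full shape.
masked_keys    = full .& key_mask
masked_queries = full .& query_mask

# Key masking: every query column still has 3 unmasked keys.
println(sum(masked_keys[:, 1, 1, 1]))
# Query masking: a padded query column has no unmasked key left.
println(sum(masked_queries[:, 4, 1, 1]))
```

Under these assumptions the first reshape hides padded keys from every query, while the second leaves padded queries with nothing to attend to.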

from flux.jl.

alerem18 commented on June 9, 2024

Thanks, now it's working.

CarloLucibello commented on June 9, 2024

@alerem18 which of the two reshapes is correct in your case?

alerem18 commented on June 9, 2024

reshape(mask, (seq_len, 1, 1, batch_size))

alerem18 commented on June 9, 2024

reshape(mask, (seq_len, 1, 1, batch_size))

However, the masking is wrong. The mask should have shape (seq_len, seq_len, 1, batch_size), but with (1, seq_len, 1, batch_size) the layer returns NaN, so pad masking is not currently supported by the layer. I've already tried it:

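using Flux

# Two sequences of length 5 as columns; positions 4:5 hold the padding token 1.
l = reduce(hcat, [[5, 2, 3, 1, 1], [4, 5, 6, 1, 1]])

# Mask of shape (kv_len, q_len, 1, batch_size): true = attend, false = masked.
mask = fill(true, 5, 5, 1, 2)
mask[4:5, :, :, :] .= false   # mask the padded keys
mask[:, 4:5, :, :] .= false   # mask the padded queries

emb_layer = Embedding(10, 128)
emb = emb_layer(l)            # size (128, 5, 2)
attn = MultiHeadAttention(128, nheads=2)
attn(emb, mask=mask)[2]       # second return value: the attention scores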

Result:
`5×5×2×2 Array{Float32, 4}:
[:, :, 1, 1] =
0.326395 0.362849 0.343025 NaN NaN
0.0660359 0.402627 0.0637925 NaN NaN
0.60757 0.234524 0.593183 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN

[:, :, 2, 1] =
0.486156 0.144888 0.532702 NaN NaN
0.2133 0.422068 0.0270071 NaN NaN
0.300544 0.433044 0.440291 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN

[:, :, 1, 2] =
0.0449472 0.396037 0.347837 NaN NaN
0.198215 0.455466 0.0415825 NaN NaN
0.756838 0.148497 0.610581 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN

[:, :, 2, 2] =
0.778366 0.164352 0.220597 NaN NaN
0.0780623 0.445108 0.702782 NaN NaN
0.143571 0.39054 0.0766214 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN`

alerem18 commented on June 9, 2024

Masking with shape (seq_len, 1, 1, batch_size) works, but masking with shape (1, seq_len, 1, batch_size) returns NaN.
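
One plausible explanation for the NaN columns, offered as an assumption rather than a confirmed diagnosis of the layer: if masked scores are set to -Inf before a softmax over the kv dimension, a query whose keys are all masked ends up computing -Inf - (-Inf), which is NaN. The sketch below uses a hand-written `naive_softmax` (a hypothetical helper, not NNlib's implementation) on a toy score matrix to reproduce that pattern:

```julia
# Hypothetical helper: numerically stabilised softmax over dim 1 (the kv dimension).
function naive_softmax(x::AbstractMatrix)
    m = maximum(x; dims=1)
    e = exp.(x .- m)   # -Inf - (-Inf) is NaN when a whole column is -Inf
    return e ./ sum(e; dims=1)
end

# Toy attention scores of shape (kv_len, q_len).
scores = Float32[1 2 3; 4 5 6; 7 8 9]

# Key 3 is masked for every query, and query 3 has all of its keys masked.
mask = Bool[1 1 0; 1 1 0; 0 0 0]

# Masked positions get -Inf so that they receive zero weight after the softmax.
masked = ifelse.(mask, scores, -Inf32)
attn_weights = naive_softmax(masked)

println(attn_weights)
```

In this toy case the masked key row comes out as 0 and the fully masked query column as NaN, which matches the layout of the result posted above. A key-only mask of shape (seq_len, 1, 1, batch_size) leaves every query with at least one unmasked key, which would be consistent with that variant working.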