Comments (6)
The layer's documentation for the forward pass says:
(mha::MultiHeadAttention)(q_in, k_in, v_in, [bias]; [mask])
...
mask: Input array broadcastable to size (kv_len, q_len, nheads, batch_size).
The mask is applied to the attention scores just before the softmax.
See NNlib.make_causal_mask for creating causal masks. Default nothing.
so I think you should reshape it as either reshape(mask, (seq_len, 1, 1, batch_size)) or reshape(mask, (1, seq_len, 1, batch_size)). I'm not sure which of the two is correct.
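To see what each reshape does, here is a plain-Julia sketch of how a broadcastable boolean mask combines with the (kv_len, q_len, nheads, batch_size) score array. Filling masked entries with typemin(Float32) is an assumption about how the mask is applied internally; the point is only which dimension each shape broadcasts over:

```julia
kv_len, q_len, nheads, batch_size = 5, 5, 2, 2
scores = randn(Float32, kv_len, q_len, nheads, batch_size)

# per-sequence padding flags: positions 4:5 are padding
pad = trues(kv_len, batch_size)
pad[4:5, :] .= false

# (kv_len, 1, 1, batch_size): hides padded *key* positions
key_mask = reshape(pad, (kv_len, 1, 1, batch_size))

# (1, q_len, 1, batch_size): hides padded *query* positions
query_mask = reshape(pad, (1, q_len, 1, batch_size))

# broadcasting the key mask knocks out rows 4:5 of the score array
masked = ifelse.(key_mask, scores, typemin(Float32))
```

So the first shape masks which keys each query may attend to, while the second masks entire query positions, i.e. whole columns of the softmax.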
from flux.jl.
Thanks, now it's working.
@alerem18 which of the two reshapes is correct in your case?
reshape(mask, (seq_len, 1, 1, batch_size))
reshape(mask, (seq_len, 1, 1, batch_size)) is the one that works. However, that alone is incomplete: a full padding mask should have shape (seq_len, seq_len, 1, batch_size), masking both the key and the query dimension. And with the (1, seq_len, 1, batch_size) component the layer returns NaN, so padding masks are not currently fully supported by the layer; I've already tried it:
using Flux

# two padded sequences of length 5 (trailing 1s are padding tokens)
l = reduce(hcat, [[5, 2, 3, 1, 1], [4, 5, 6, 1, 1]])

# full (kv_len, q_len, 1, batch) padding mask: block positions 4:5
# along both the key dimension and the query dimension
mask = fill(true, 5, 5, 1, 2)
mask[4:5, :, :, :] .= false
mask[:, 4:5, :, :] .= false

emb_layer = Embedding(10, 128)
emb = emb_layer(l)
attn = MultiHeadAttention(128, nheads=2)
attn(emb, mask=mask)[2]   # second return value: the attention scores
result:
5×5×2×2 Array{Float32, 4}:
[:, :, 1, 1] =
0.326395 0.362849 0.343025 NaN NaN
0.0660359 0.402627 0.0637925 NaN NaN
0.60757 0.234524 0.593183 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN
[:, :, 2, 1] =
0.486156 0.144888 0.532702 NaN NaN
0.2133 0.422068 0.0270071 NaN NaN
0.300544 0.433044 0.440291 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN
[:, :, 1, 2] =
0.0449472 0.396037 0.347837 NaN NaN
0.198215 0.455466 0.0415825 NaN NaN
0.756838 0.148497 0.610581 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN
[:, :, 2, 2] =
0.778366 0.164352 0.220597 NaN NaN
0.0780623 0.445108 0.702782 NaN NaN
0.143571 0.39054 0.0766214 NaN NaN
0.0 0.0 0.0 NaN NaN
0.0 0.0 0.0 NaN NaN
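The NaN columns follow from the softmax itself: for the padded query positions the full mask hides every key, so the softmax is taken over a vector whose entries are all -Inf, and 0/0 produces NaN. A minimal base-Julia illustration:

```julia
# attention scores for one padded query position, after the mask
# has set every key entry to -Inf
scores = fill(-Inf32, 5)

ex = exp.(scores)        # exp(-Inf) == 0, so all zeros
probs = ex ./ sum(ex)    # 0 / 0 -> NaN in every entry
```

This is why only the query-dimension part of the mask poisons the output: zeroed rows (masked keys) are fine, but a fully masked softmax column has no valid normalizer.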
Masking with shape (seq_len, 1, 1, batch_size) is OK, but with shape (1, seq_len, 1, batch_size) it returns NaN.
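One possible workaround (my own suggestion, not an official API): mask only the key dimension so every softmax column keeps at least one finite entry, then zero out the padded query positions in the output afterwards. Sketched in base Julia with a dummy array standing in for the attention output:

```julia
dim, seq_len, batch_size = 128, 5, 2
y = randn(Float32, dim, seq_len, batch_size)  # stand-in for the attention output

pad = trues(seq_len, batch_size)
pad[4:5, :] .= false                          # positions 4:5 are padding

# zero the features at padded query positions; broadcasting the
# (1, seq_len, batch_size) factor hits every feature of those positions
y = y .* reshape(Float32.(pad), (1, seq_len, batch_size))
```

The padded positions carry no gradient signal anyway once zeroed, so this gives the same downstream behavior as a proper two-sided padding mask without the NaN problem.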