minGPT-with-BigBird
DD2412 project at KTH by Leo Hiselius, Jonas Thunberg and Alfons Heintz, {leohi, jonthu, alfonsh}"at"kth.se
This project strives to combine two models:
- The minGPT model, a lightweight implementation of iGPT written by Andrej Karpathy and published under the MIT license
- The BigBird attention masking developed by Zaheer et al.
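As a rough illustration of what the BigBird masking looks like, here is a minimal token-level sketch in plain Python. It is not this project's implementation: the function name `bigbird_mask` and all of its parameters are hypothetical, and the `causal` option is an assumption about how the pattern might be combined with an autoregressive model such as minGPT. The mask is the union of the three BigBird components (sliding window, global tokens, random keys).

```python
import random


def bigbird_mask(seq_len, window=3, num_global=1, num_random=2,
                 causal=True, seed=0):
    """Return a seq_len x seq_len list of bools; True means query i may attend to key j.

    Hypothetical helper for illustration only (token-level, dense).
    """
    rng = random.Random(seed)  # fixed seed so the random links are reproducible
    mask = [[False] * seq_len for _ in range(seq_len)]
    half = window // 2

    for i in range(seq_len):
        # 1) Sliding window: each token attends to its local neighbourhood.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True

        # 2) Global tokens: the first num_global tokens attend to everything
        #    and are attended to by every token.
        for g in range(min(num_global, seq_len)):
            mask[i][g] = True
            mask[g][i] = True

        # 3) Random attention: each query attends to a few random keys.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True

    if causal:
        # Assumption: for autoregressive use, drop every link to a future position.
        for i in range(seq_len):
            for j in range(i + 1, seq_len):
                mask[i][j] = False

    return mask


if __name__ == "__main__":
    # Print the pattern for a short sequence: '#' = attend, '.' = masked out.
    for row in bigbird_mask(8):
        print("".join("#" if allowed else "." for allowed in row))
```

A dense mask like this only reproduces the attention pattern. It does not give the memory or speed savings of BigBird, which come from a block-sparse implementation.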
Notes for devs
A possibly useful example of autograd on sparse matrices (see the comments)
Video on BigBird, timestamped at the block/roll implementation of sparse attention