Short collaboration with Cat Mitelut studying how 1-layer Transformer models converge on the Bayesian-optimal solution for simple statistical-inference problems.
alejoacelas / bayesian-transformers
Interpretability on 1-layer Transformer models that converge on the Bayesian-optimal solution for statistical tasks
License: MIT License