This is a learning repo. In it I implement various transformer models from scratch. The goal is to understand the transformer architecture and how it works in depth.
This repo is HEAVILY based on Andrej Karpathy's excellent video and his nanoGPT repo.
I have made no attempt to optimise these models for production or to make them loadable in HuggingFace. This is purely for learning purposes.
This is very much based upon the nanoGPT model; the big changes are in the training code. To train a model, run:

```shell
python -m src.train_a_gpt
```
This repo includes the tiny Shakespeare dataset from the nanoGPT repo. You can download it from the nanoGPT repo or use your own data.
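nanoGPT prepares tiny Shakespeare with a simple character-level tokenizer: build a vocabulary of every distinct character, then map characters to integer ids and back. A minimal sketch of that idea (this is an illustration, not the repo's exact preparation code):

```python
# Character-level tokenizer in the style of nanoGPT's data preparation
# (a sketch for illustration, not this repo's actual code).
text = "First Citizen:\nBefore we proceed any further, hear me speak."

chars = sorted(set(text))                      # vocabulary: each distinct character
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}   # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text  # round-trip is lossless
print(f"vocab size: {len(chars)}")
```

On the full tiny Shakespeare file the same scheme yields a vocabulary of a few dozen characters, which is what makes char-level training so cheap to experiment with.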