Transformer language model written in PyTorch with the help of the firelab
library.
- Cached inference
- Telegram bot integration
- Sample from the model with temperature (instead of taking the argmax)
- Sample only from the top 70% of words
- Beam search
- Share layer weights in Transformer (to make it universal)
- LR scheduling
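
The two sampling items above can be illustrated with a minimal sketch in plain Python. This is not the repository's code: the function names `softmax_with_temperature`, `top_p_filter`, and `sample_next_token` are hypothetical, and the sketch assumes that "70% of top words" means keeping the highest-probability words until their cumulative probability reaches 0.7 (nucleus-style filtering). If the intent is instead to keep the top 70% of the vocabulary by rank, the cutoff rule would need to change accordingly.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.7):
    """Keep the most likely words until their cumulative mass reaches top_p (assumed rule)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalise over the kept words
    return filtered


def sample_next_token(logits, temperature=1.0, top_p=0.7, rng=random):
    """Sample a token index instead of always taking the argmax."""
    probs = softmax_with_temperature(logits, temperature)
    probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a real decoding loop the `logits` list would come from the model's output for the last position, and `sample_next_token` would be called once per generated token.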