A minimal reproduction of the GitHub repository lukasberglund/reversal_curse and the corresponding paper by Berglund et al. The aim of this repository is to evaluate the reversal curse phenomenon (models trained on "A is B" failing to infer "B is A") across language model architectures and to explore methods for mitigating it.
An especially interesting paper in this area is ROME (Rank-One Model Editing), first described in "Locating and Editing Factual Associations in GPT" by Meng et al. It shows that fact completion in transformer-based language models can be causally traced to specific hidden states, and that the facts themselves can be localized in the MLP layers of the transformer. The MLP blocks act as key-value stores: the representation of the last subject token serves as the key, and the MLP output value encodes properties of that key.
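Below is a minimal, self-contained sketch of this key-value view and of a simplified rank-one edit in that spirit. It is illustrative only: the tensor sizes are arbitrary, `k_star` and `v_star` are stand-ins for the key and value that ROME actually derives via causal tracing and optimization, and the real method preconditions the update with a covariance estimate of the keys.

```python
# Key-value view of a transformer MLP block, plus a simplified rank-one edit
# in the spirit of ROME. Illustrative only; not the full ROME algorithm.
import torch

d_model, d_mlp = 64, 256
W_in = torch.randn(d_mlp, d_model) * 0.02   # produces the "key" activation
W_out = torch.randn(d_model, d_mlp) * 0.02  # maps key -> stored "value"

def mlp(h: torch.Tensor) -> torch.Tensor:
    # key-value view: k = act(W_in h), output = W_out k
    k = torch.relu(W_in @ h)
    return W_out @ k

# Pretend k_star is the key for the last subject token of the fact to edit,
# and v_star is the desired output value encoding the new property.
h_subject = torch.randn(d_model)
k_star = torch.relu(W_in @ h_subject)
v_star = torch.randn(d_model)

# Simplified rank-one update: change W_out by a single outer product so that
# the edited MLP maps k_star exactly to v_star.
residual = v_star - W_out @ k_star
W_out_edited = W_out + torch.outer(residual, k_star) / (k_star @ k_star)

print(torch.allclose(W_out_edited @ k_star, v_star, atol=1e-4))  # True
```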
- Currently experimenting with sequence-to-sequence models that use bidirectional encoders, including BART and T5, to see whether they capture factual associations in both directions during training (see the probing sketch after this list).
- Reverse associations can also be inserted manually after training: recent work proposes editing methods that insert a new fact bidirectionally in a single edit.
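The sketch below shows one way to probe a seq2seq model such as T5 for a fact in both directions using its span-infilling objective. The checkpoint name and the example fact are assumptions for illustration, not results from this repository.

```python
# Probe a T5 checkpoint for a fact in the forward and reverse direction.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-base"  # any T5 checkpoint works for this API sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def fill(prompt: str) -> str:
    # T5 fills the <extra_id_0> sentinel with its predicted span.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Forward direction: subject -> attribute.
print(fill("Tom Cruise's mother is <extra_id_0>."))
# Reverse direction: attribute -> subject. The reversal curse predicts that
# decoder-only models trained only on the forward form fail here.
print(fill("<extra_id_0>'s son is Tom Cruise."))
```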