This repository currently contains responses to Artificial Intelligence research papers (mostly in AI Safety), along with suggestions for future work that builds on their foundations. These are all works in progress, and I hope to eventually expand on these ideas in a blog.
- AI Safety via Debate
- Deep Reinforcement Learning from Human Preferences