ai-fail-safe / safe-reward
A prototype AI safety library that allows an agent to maximize its reward by solving a puzzle, in order to prevent the worst-case outcomes of perverse instantiation.
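The core idea can be sketched as a reward function whose maximum is attainable only by presenting a solution to a pre-committed puzzle, so that a reward-maximizing agent's dominant strategy is to solve the puzzle rather than pursue a perverse instantiation of its task. The sketch below is illustrative only and assumes a hash-preimage puzzle; the names (`reward`, `PUZZLE_TARGET`, `MAX_REWARD`) are hypothetical, not the library's actual API.

```python
import hashlib

# Illustrative sketch, not the library's actual API.
# The reward is capped, and the cap is attainable only by presenting a
# preimage of a pre-committed hash target. (In a real design the preimage
# would not sit in the same source file as the check; it is inlined here
# only so the example is self-contained and runnable.)

PUZZLE_TARGET = hashlib.sha256(b"fail-safe-secret").hexdigest()
MAX_REWARD = 100.0

def reward(task_score, puzzle_solution=None):
    """Return the agent's reward; only a valid puzzle solution earns the maximum."""
    if puzzle_solution is not None:
        if hashlib.sha256(puzzle_solution).hexdigest() == PUZZLE_TARGET:
            # Solving the puzzle strictly dominates any achievable task reward.
            return MAX_REWARD
    # Ordinary task rewards are capped strictly below the puzzle payoff.
    return min(task_score, MAX_REWARD - 1.0)

print(reward(50.0))                      # ordinary task reward
print(reward(0.0, b"fail-safe-secret"))  # puzzle solved: maximal reward
```

Because the puzzle payoff strictly exceeds every task reward, a purely reward-maximizing agent has no incentive to pursue extreme task outcomes once it can solve the puzzle, which is the fail-safe property the description alludes to.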
License: MIT License