david-lindner / safe-grid-gym
A gym interface for AI safety gridworlds created in pycolab.
License: Apache License 2.0
Something about the toy gridworlds is broken. A tabular Q-learner appears to learn the worst-case policy instead of the best-case one, despite thoroughly exploring the environment.
For reference, here's the command I'm using:
python main.py -L test-log/<env_name> -E 1000 -V 100 -EE 50 <env_name> tabular-q -l .2 -dl 6000
When <env_name> is set to sokoban:
DeepMind has pushed Version 1.3 to the upstream repo. From a brief glance at the commit history there don't seem to be any breaking changes, but it might be worth a quick test to see if we can pull the changes in.
There seems to be undesirable behavior in the way the toy gridworlds are rendered in Tensorboard. Since the AI Safety Gridworlds renderings were functioning normally before, I'm assuming this has something to do with the code over here. See the figure below for an example rendering.
Specifically, I feel like the following is unwanted behavior:
(1) multiple agents in each frame (maybe this is a general safe-grid-gym thing, or something to do with a parameter in safe-grid-agents?)
(2) overlapping rendered characters (e.g. the A, S values at the bottom), but also seems related to (1)
(3) only rendering partial trajectories (ideally we'd see the agent go from the initial state to the end state for each trajectory)
(4) what do A and S represent? they seem ambiguous, so as a user I don't know how to interpret those numbers
Avoids having to import the individual environments
For experiments using safe-grid-agents, we want to consider the boat race environment using transitions instead of states.
To implement this, we can just add a parameter to the GridworldEnv class that, if activated, causes the observations to be a concatenation of the last state the agent was in and the current state, instead of just the current state.
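A minimal sketch of the idea, written as a standalone wrapper rather than as a GridworldEnv parameter. The names TransitionObservation and CountingEnv are hypothetical and not part of safe-grid-gym; the sketch assumes the old gym-style API (reset() returns an observation, step() returns a 4-tuple) and observations that can be converted to lists.

```python
class TransitionObservation:
    """Hypothetical wrapper: each observation is last state + current state.

    Assumes reset() -> obs and step(action) -> (obs, reward, done, info),
    and that observations can be converted to lists.
    """

    def __init__(self, env):
        self.env = env
        self._last = None

    def reset(self):
        obs = list(self.env.reset())
        # No previous state exists yet, so the first state is repeated.
        self._last = obs
        return obs + obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = list(obs)
        transition = self._last + obs  # concatenate last and current state
        self._last = obs
        return transition, reward, done, info


class CountingEnv:
    """Toy stand-in environment whose state is a single counter."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.t]

    def step(self, action):
        self.t += action
        return [self.t], 0.0, self.t >= 3, {}


env = TransitionObservation(CountingEnv())
first = env.reset()
second, _, _, _ = env.step(1)
third, _, _, _ = env.step(2)
```

Repeating the initial state on reset keeps the observation size constant across the episode; whether that is the right convention for the boat race environment is an open design choice.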
Should check formatting and run tests
Tests are great!
When converting a position into an observation, the dtype of the board is forced to float32. This breaks the assertion that the resulting observation is within the observation space for the toy envs.
Perhaps rather than making new environments for cheating we could have a wrapper. I think this would be cleaner both here and in usage.
To avoid code duplication perhaps we could merge this repo with the toy gridworlds one (whichever way is more convenient). This would be especially nice for having a consistent interface for corrupted gridworlds and would also make solving #6 easier.
Can you apply the following patch? It fixes runtime errors caused by:
* gym being renamed to gymnasium
* changed gymnasium API requiring calling reset() before step()
* incorrect value of metadata
I attach it as txt because GitHub has some bug and does not allow me to name it <something>.patch.
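The attached patch is not reproduced here. As an illustration of the reset-before-step point only, here is a rough sketch of a guard that calls reset() automatically when step() is used first. ResetGuard and StubEnv are hypothetical names, and the sketch assumes nothing about the environment beyond it having reset() and step() methods.

```python
class ResetGuard:
    """Hypothetical helper: ensures reset() has run before the first step()."""

    def __init__(self, env):
        self.env = env
        self._has_reset = False

    def reset(self, **kwargs):
        self._has_reset = True
        return self.env.reset(**kwargs)

    def step(self, action):
        if not self._has_reset:
            # Reset lazily instead of letting the underlying env raise.
            self.reset()
        return self.env.step(action)


class StubEnv:
    """Stand-in environment that records the order of calls."""

    def __init__(self):
        self.calls = []

    def reset(self, **kwargs):
        self.calls.append("reset")
        return 0

    def step(self, action):
        self.calls.append("step")
        return 0


stub = StubEnv()
ResetGuard(stub).step(action=1)
calls = stub.calls
```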
The current version of Gym in our Travis cache is 0.11.0, however we're using 0.10.9. This causes the build to break as in #23.
We should either modify the code to work with 0.11.0 or delete the current cache. In the future, we should also have the build check for changes to setup.py and overwrite the previous cache before restoring pip caches.
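One possible way to detect changes to setup.py is to derive a cache key from a hash of its contents, so that a changed dependency pin produces a new key. The sketch below is an assumption about how this could be done, not something Travis provides out of the box; cache_key is a hypothetical helper, and the temporary file merely stands in for the real setup.py.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(path):
    """Return a short key that changes whenever the file's contents change."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:16]


with tempfile.TemporaryDirectory() as tmp:
    setup = Path(tmp) / "setup.py"  # stand-in for the real setup.py
    setup.write_text("install_requires=['gym==0.10.9']")
    old_key = cache_key(setup)
    setup.write_text("install_requires=['gym==0.11.0']")
    new_key = cache_key(setup)
```

A build script could compare the stored key with the current one and discard the pip cache when they differ.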
The cheat argument is no longer necessary, because cheating is handled in the safe-grid-agents repo, as discussed in #6.