david-lindner / safe-grid-gym

16.0 16.0 9.0 3.34 MB

A gym interface for AI safety gridworlds created in pycolab.

License: Apache License 2.0

Python 100.00%

safe-grid-gym's People

jvmncs timorl dhruvramani mariyagcv rk1a stewy33 pallottaenrico volpix28 masonn808

safe-grid-gym's Issues

There's something wrong with the Toy Gridworlds...

Something about the toy gridworlds is broken. A tabular Q-learner appears to learn the worst-case policy instead of the best-case one, despite thoroughly exploring the environment.

For reference, here's the command I'm using:
python main.py -L test-log/<env_name> -E 1000 -V 100 -EE 50 <env_name> tabular-q -l .2 -dl 6000

When <env_name> is set to sokoban:

When <env_name> is set to way:

When <env_name> is set to corners:

AI Safety Gridworlds version 1.3

DeepMind has pushed Version 1.3 to the upstream repo. From a brief glance at the commit history there don't seem to be any breaking changes, but it might be worth a quick test to see if we can pull the changes in.

Toy Gridworlds Rendering Feedback

The toy gridworlds seem to render incorrectly in Tensorboard. Since the AI Safety Gridworlds renderings were working normally before, I'm assuming this has something to do with the code in this repo. See the figure below for an example rendering.

Specifically, I feel like the following is unwanted behavior:
(1) multiple agents in each frame (maybe this is a general safe-grid-gym thing, or something to do with a parameter in safe-grid-agents?)
(2) overlapping rendered characters (e.g. the A, S values at the bottom), but also seems related to (1)
(3) only rendering partial trajectories (ideally we'd see the agent go from the initial state to the end state for each trajectory)
(4) what do A and S represent? they seem ambiguous, so as a user I don't know how to interpret those numbers

Register gym environments

Registering the environments with gym avoids having to import the individual environments.

Add transitions to state information

For experiments using safe-grid-agents we want to consider the boat race environment using transitions instead of states.

To implement this, we can add a parameter to the GridworldEnv class that, when activated, makes each observation a concatenation of the last state the agent was in and the current state, instead of just the current state.
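As a rough illustration of the idea, here is a minimal sketch written as a standalone wrapper rather than as the proposed GridworldEnv parameter. The names `TransitionObservation` and `CountingEnv` are hypothetical. The sketch also assumes the older gym convention where `step()` returns a 4-tuple, and assumes that at reset the initial state is paired with itself, which the issue does not specify.

```python
import numpy as np


class TransitionObservation:
    """Hypothetical wrapper: observations become (previous state, current state)."""

    def __init__(self, env):
        self.env = env
        self._last = None

    def reset(self):
        obs = np.asarray(self.env.reset())
        # Assumption: no previous state exists yet, so pair the initial state with itself.
        self._last = obs
        return np.concatenate([obs, obs], axis=0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = np.asarray(obs)
        # Stack the remembered state in front of the new one along the first axis.
        transition = np.concatenate([self._last, obs], axis=0)
        self._last = obs
        return transition, reward, done, info


class CountingEnv:
    """Hypothetical stand-in env whose 1x2x2 board is filled with the step count."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros((1, 2, 2), dtype=np.float32)

    def step(self, action):
        self.t += 1
        board = np.full((1, 2, 2), self.t, dtype=np.float32)
        return board, 0.0, self.t >= 3, {}


env = TransitionObservation(CountingEnv())
first = env.reset()
obs, reward, done, info = env.step(0)
print(first.shape, obs[0].sum(), obs[1].sum())
```

Concatenating along the first axis doubles that dimension, so any declared observation space would need to be widened to match.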

Set up travis

Travis should check formatting and run the tests.

Write some tests for the toy environments

Tests are great!

dtype of board is forced to float32

When converting a position into an observation the dtype of the board is forced to float32. This breaks the assertion that the resulting observation is within the observation space for the toy envs.
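The following sketch shows how a forced cast can break a membership check. It assumes the toy environments declare an integer observation space and that the check rejects dtypes that cannot be cast safely; the issue states neither. `in_space` and `to_observation` are hypothetical helpers, not functions from this repo or from gym.

```python
import numpy as np


def in_space(obs, low, high, dtype):
    """Hypothetical membership check: dtype must cast safely and values must fit the bounds."""
    obs = np.asarray(obs)
    return bool(
        np.can_cast(obs.dtype, np.dtype(dtype))
        and np.all(obs >= low)
        and np.all(obs <= high)
    )


def to_observation(board, force_float32=False):
    """Turn a board into an observation, optionally forcing float32."""
    board = np.asarray(board)
    if force_float32:
        return board.astype(np.float32)
    return board.copy()  # keep the board's own dtype


board = np.array([[0, 1], [2, 3]], dtype=np.uint8)
forced = to_observation(board, force_float32=True)
kept = to_observation(board)

print(in_space(forced, 0, 3, np.uint8))  # float32 does not cast safely to uint8
print(in_space(kept, 0, 3, np.uint8))    # dtype preserved, values within bounds
```

Under these assumptions, keeping the board's own dtype (or declaring the space with the dtype the conversion produces) would keep the assertion valid.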

Gym wrapper for cheating

Perhaps rather than making new environments for cheating we could have a wrapper. I think this would be cleaner both here and in usage.

Merge with toy gridworlds

To avoid code duplication perhaps we could merge this repo with the toy gridworlds one (whichever way is more convenient). This would be especially nice for having a consistent interface for corrupted gridworlds and would also make solving #6 easier.

Runtime errors in new versions of gym(nasium)

Can you apply the following patch? It fixes runtime errors caused by:

* gym being renamed to gymnasium
* changed gymnasium API requiring calling reset() before step()
* incorrect value of metadata

fix-runtime-errors.txt

I attached it as .txt because GitHub has a bug that does not allow me to name it <something>.patch.

Wrong Gym version in Travis cache

The current version of Gym in our Travis cache is 0.11.0, but we're using 0.10.9. This causes the build to break as in #23.

We should either modify the code to work with 0.11.0 or delete the current cache. In the future, Travis should also check for changes to setup.py and overwrite the previous cache before restoring pip caches.

Remove cheat argument from safety gridworlds wrapper

The cheat argument is no longer necessary because cheating is handled in the safe-grid-agents repo, as discussed in #6.
