As of 761e211, DQN fails to converge on an environment with a 2x2 linear transform of the observations, while it is fine without the transform. The reason might be either that too few steps were given (currently 100 steps x 256 episodes) or that the environment is non-Markov.
Concrete next step: try a linear Q-network and see whether it converges to different results with and without the transform.
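A minimal sketch of what I mean, assuming the simplest possible setup: linear Q-learning (Q(s, a) = W[a] @ phi(s)) on a toy 2-d problem where the optimal action is just argmax of the raw observation. The environment here (rewards, dynamics, the particular 2x2 transform) is invented for illustration, not taken from the repo; the point is only to compare the learned greedy policy with and without a fixed linear transform of the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_q_learning(transform, n_episodes=256, n_steps=100,
                      alpha=0.05, gamma=0.9, eps=0.1):
    """Q-learning with a linear Q-function: Q(s, a) = W[a] @ (transform @ s)."""
    n_actions = 2
    W = np.zeros((n_actions, 2))
    for _ in range(n_episodes):
        s = rng.normal(size=2)              # toy 2-d observation
        for _ in range(n_steps):
            phi = transform @ s
            q = W @ phi
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q))
            # toy reward: the correct action is argmax of the *raw* observation
            r = 1.0 if a == int(np.argmax(s)) else 0.0
            s2 = rng.normal(size=2)         # toy transition: fresh random state
            td = r + gamma * np.max(W @ (transform @ s2)) - q[a]
            W[a] += alpha * td * phi
            s = s2
    return W

def greedy(W, transform, s):
    """Greedy action of the learned linear Q-function."""
    return int(np.argmax(W @ (transform @ np.asarray(s))))

# train once without and once with a (made-up, invertible) 2x2 transform
T = np.array([[2.0, 1.0], [0.5, 1.5]])
W_id = linear_q_learning(np.eye(2))
W_tr = linear_q_learning(T)
```

Since the transform is invertible, a linear Q-function can represent the same greedy policies in both cases, so if the linear agent also diverges only with the transform, the problem is more likely the environment than the function class.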
Step after that: try policy gradients, as they do not require learning a value function (which might not exist if the environment is non-Markov), only the policy (which is simple: compare two numbers and choose the corresponding action).
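For the policy-gradient route, a sketch of plain REINFORCE with a linear softmax policy on the same kind of invented toy problem as above (again, all environment details are assumptions for illustration). It optimizes the policy directly, so no value function ever needs to exist:

```python
import numpy as np

rng = np.random.default_rng(1)

def reinforce(n_episodes=2000, n_steps=20, alpha=0.01):
    """REINFORCE with a linear softmax policy: logits(s) = theta @ s."""
    theta = np.zeros((2, 2))
    for _ in range(n_episodes):
        grads, rewards = [], []
        for _ in range(n_steps):
            s = rng.normal(size=2)
            logits = theta @ s
            p = np.exp(logits - logits.max())
            p /= p.sum()
            a = int(rng.choice(2, p=p))
            # toy reward, immediate: correct action is argmax of the observation
            r = 1.0 if a == int(np.argmax(s)) else 0.0
            # grad of log pi(a|s) for a linear softmax policy
            g = -np.outer(p, s)
            g[a] += s
            grads.append(g)
            rewards.append(r)
        # mean-reward baseline to reduce variance; no discounting needed
        # because the toy reward is immediate
        baseline = np.mean(rewards)
        for g, r in zip(grads, rewards):
            theta += alpha * (r - baseline) * g
    return theta

theta = reinforce()
```

If this converges with the transformed observations while DQN does not, that would support the "value function is the problem" hypothesis.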
@jbrea, adding you to the issue so you can follow the progress.