
rlaspa's Introduction

Hello there 👋 I hope you like what you find here 🤓

class Researcher(CuriousHuman):

    def __init__(self):
        self.name = "Tonio Weidler"
        self.role = "PhD Candidate"
        self.affiliation = "Maastricht University"
        self.residence = "The Netherlands"

        self.code = [
            "Python",
            "JavaScript",
            "Java",
            "PHP"
        ]

        self.research_field = "Neuroscience"
        self.research_topics = [
            "Sensorimotor Control",
            "Human Dexterity",
            "Goal-Driven Models",
            "Deep Learning"
        ] 

        self.language_spoken = ["de_DE", "en_US"]

    def greet(self, name):
        print(f"Hello there, {name}! I hope you like what you find here :)")

rlaspa's People

Contributors

adrigrillo, alessandroscoppio, dannigt, hansbambel, weidler

rlaspa's Issues

Exploration strategy

I have changed the exploration strategy to something similar to what we talked about. Now, when you instantiate an agent, the following parameters can be configured:

  • init_eps: Initial epsilon. Default: 1.0.
  • min_eps: Minimal epsilon. Default: 0.01.
  • eps_decay: Number of steps for epsilon to converge to the minimal value, counted from when the memory starts being used. Default: 500.
  • per_init_eps_memory: Percentage of the initial epsilon that remains when the memory starts being used. Default: 0.8.

So, for now, while the agent is not yet using the memory, a linear decay is applied that goes from init_eps down to init_eps * per_init_eps_memory, i.e. from 1.0 to 0.8 in the default case.
Then, once the memory is used, an exponential decay takes over. It goes from init_eps * per_init_eps_memory down to min_eps, with the step at which the memory starts being used as its reference point.
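
For illustration, here is a minimal sketch of this two-phase schedule (the function name and the memory_start_step argument are made up here, not the actual code):

import math

def current_epsilon(step, memory_start_step, init_eps=1.0, min_eps=0.01,
                    eps_decay=500, per_init_eps_memory=0.8):
    """Illustrative two-phase epsilon schedule: linear decay before the
    memory is used, exponential decay afterwards."""
    eps_at_memory = init_eps * per_init_eps_memory
    if step < memory_start_step:
        # Linear decay from init_eps down to init_eps * per_init_eps_memory.
        fraction = step / max(memory_start_step, 1)
        return init_eps - fraction * (init_eps - eps_at_memory)
    # Exponential decay from eps_at_memory towards min_eps, using the step at
    # which the memory starts being used as the reference point.
    steps_since_memory = step - memory_start_step
    return min_eps + (eps_at_memory - min_eps) * math.exp(-steps_since_memory / eps_decay)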

I have also looked at other exploration methods, like Boltzmann exploration, but I do not think it would be an improvement for our simple tasks, and it would imply larger changes. However, it could be done quickly if we want it.

Make tasks implement the gym interface

To achieve 100% compatibility between tasks and agents in the framework, we need to make our own tasks adhere to the requirements of the gym package. E.g., our tasks should be visualized via render().
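
A rough sketch of what adhering to the gym interface could look like (class name, spaces, and shapes are made up here):

import gym
import numpy as np
from gym import spaces

class EvasionTask(gym.Env):
    """Hypothetical task wrapped in the gym interface."""

    def __init__(self, width=10, height=10):
        self.observation_space = spaces.Box(0.0, 1.0, shape=(height, width), dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # left, stay, right
        self.state = np.zeros((height, width), dtype=np.float32)

    def reset(self):
        self.state[:] = 0.0
        return self.state

    def step(self, action):
        # Task-specific transition logic would go here.
        reward, done, info = 0.0, False, {}
        return self.state, reward, done, info

    def render(self, mode="human"):
        # Visualize the current state, as required by the gym API.
        print(self.state)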

Make tensorboard logs get their own directories

Currently it gets messy when there are multiple logs in the log folder, and you can't rename them. It would be nice if each log created its own folder, named according to some settings of the experiment.
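
A minimal sketch, assuming we keep logging with PyTorch's SummaryWriter (the helper and naming scheme are hypothetical):

import datetime
from torch.utils.tensorboard import SummaryWriter

def make_writer(task_name, agent_name, log_root="logs"):
    # One subdirectory per run, named after a few experiment settings plus a
    # timestamp, so runs never overwrite or mix with each other.
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return SummaryWriter(log_dir=f"{log_root}/{task_name}_{agent_name}_{timestamp}")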

Create new Tasks

Create different tasks:

  • Race task (Vertical Scroller)

  • Evasive Task (Horizontal Scroller)

  • Evasion with bigger walls

  • Evasion with a "tunnel"

Double Check the Workings of all Networks

I noticed that in some cases the use of activation functions appears rather arbitrary. For example, in the PixelEncoders we had a sigmoid in the representation but not in the heads, even though the heads should definitely have one.
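
For reference, a sketch of how the activations could be made explicit, with the sigmoid sitting on the head (layer sizes and names are made up, not our actual PixelEncoder):

import torch.nn as nn

class PixelEncoderSketch(nn.Module):
    """Hypothetical encoder illustrating explicit activation placement."""

    def __init__(self, in_channels=1, latent_dim=32):
        super().__init__()
        self.representation = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # The head ends in an explicit sigmoid, rather than leaving the choice
        # of activation implicit or dropping it by accident.
        self.head = nn.Sequential(
            nn.Linear(16, latent_dim),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.head(self.representation(x))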

Check if input is as expected and throw meaningful errors.

Especially when using batches in the networks, it is easy to make mistakes when feeding input. We should therefore add checks at crucial places that test whether the given format is what the following code expects. E.g., when feeding a network one can pass a one-dimensional input; this should be prohibited if the network is meant to deal with batches.
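
A minimal sketch of such a check (the helper name and message are hypothetical):

import torch

def assert_batched(tensor: torch.Tensor, expected_dims: int, name: str = "input"):
    """Fail early with a meaningful error instead of letting a shape mismatch
    surface somewhere deep inside the network."""
    if tensor.dim() != expected_dims:
        raise ValueError(
            f"{name} must have {expected_dims} dimensions (batch first), but got "
            f"shape {tuple(tensor.shape)}. Did you forget the batch dimension, "
            f"e.g. tensor.unsqueeze(0)?"
        )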

Tensorize everything

We are already tensorizing the batches and everything, but there are still functions like cast_float_tensor().
Furthermore, we have to make sure that the methods receive a tensor (see #23). A sketch of what such a signature could look like follows the checklist below.

  • Remove usage of cast_float_tensor()

  • Type hinting for tensors in learners and representation

  • Type hinting in pixelencoders (needed?)

  • Tensorize Pathing tasks
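
As a sketch, a type-hinted learner interface could look like this (class and method names are hypothetical):

import torch

class QLearnerSketch:

    def learn(self, state: torch.Tensor, action: torch.Tensor, reward: torch.Tensor,
              next_state: torch.Tensor, done: torch.Tensor) -> float:
        # With tensor type hints on the interface, callers convert their data up
        # front (e.g. via torch.as_tensor), and helpers like cast_float_tensor()
        # become unnecessary inside the learner.
        raise NotImplementedError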

Document Code, include type hinting!

It's a bit difficult to always see what is returned and what format of input is required. To make things foolproof, we should include docstrings and type hinting.
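
For example, a docstring plus type hints in this style answers both questions at the call site (the class and method here are just placeholders):

import torch

class RepresentationLearnerSketch:

    def encode(self, observation: torch.Tensor) -> torch.Tensor:
        """Encode an observation into its latent representation.

        Args:
            observation: Float tensor of shape (batch_size, channels, height, width).

        Returns:
            Latent tensor of shape (batch_size, latent_dim).
        """
        raise NotImplementedError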

Modular Framework for different RL Approaches

Work on a modular framework for the history and parallel approaches, s.t. we can use and extend different approaches without having to reimplement repetitive parts again and again. Early work for Q-tables is already done. I will work on extending that over the following days.

Modular means having wrapper architectures to which we can pass modules, e.g.:

  • q learner
  • representation learner
  • task environment

And the architecture combines them (history or parallel).
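
A rough sketch of such a wrapper, with hypothetical module interfaces (encode/choose_action/learn are placeholder names):

class ParallelAgentSketch:
    """Hypothetical wrapper: the agent is assembled from interchangeable
    modules and only defines how they interact."""

    def __init__(self, representation_learner, policy_learner, env):
        self.representation_learner = representation_learner
        self.policy_learner = policy_learner
        self.env = env

    def train(self, episodes):
        for _ in range(episodes):
            state = self.env.reset()
            done = False
            while not done:
                latent = self.representation_learner.encode(state)
                action = self.policy_learner.choose_action(latent)
                next_state, reward, done, _ = self.env.step(action)
                # "Parallel" here means representation and policy are updated
                # side by side on the same transition.
                self.representation_learner.learn(state, next_state)
                next_latent = self.representation_learner.encode(next_state)
                self.policy_learner.learn(latent, action, reward, next_latent, done)
                state = next_state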
