awjuliani / Meta-RL
Implementation of Meta-RL A3C algorithm
License: MIT License
Hi awjuliani,
I have enjoyed reading your excellent examples. I tried to modify your A3C-Meta-Grid code to solve the CartPole problem, but it did not work. Even when I remove the RNN layer, it still does not work: it scores roughly 60 points at most. Do you know why?
Thanks!
Hello, I have a question. If I use n-step returns to update the episode buffer, will it work better?
I think off-policy would work better than on-policy. Sorry, I don't yet understand meta-learning; I just wanted to ask your opinion.
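For context, n-step targets over an episode buffer are usually computed by discounting backwards from a bootstrap value. A minimal sketch (not the repo's code; the function name and signature are hypothetical):

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted return targets for an episode buffer.

    rewards: per-step rewards from the buffer.
    bootstrap_value: value estimate of the state after the last step
                     (0.0 if the episode terminated there).
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Walk the buffer backwards, accumulating the discounted sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

r = n_step_returns([1.0, 1.0, 1.0], 0.0, gamma=0.5)
```

With gamma = 0.5 and a terminal episode, the targets are 1.75, 1.5, and 1.0 for the three steps.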
Hi, thanks for sharing this very interesting project !!!
I think I might have found a typo/bug?
Traceback (most recent call last):
File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "<ipython-input-9-be9dc3580d09>", line 33, in <lambda>
worker_work = lambda: worker.work(gamma,sess,coord,saver,train)
File "<ipython-input-4-804d524ea4f0>", line 94, in work
episode_frames.append(set_image_bandit(episode_reward,self.env.bandit,a,t))
File "/home/ajay/PythonProjects/Meta-RL/helper.py", line 65, in set_image_bandit
bandit_image[115:115+values[0]*2.5,20:75,:] = [0,255.0,0]
TypeError: slice indices must be integers or None or have an __index__ method
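For what it's worth, the error comes from `115 + values[0] * 2.5` evaluating to a float, and recent NumPy versions require integer slice bounds. A likely fix is to cast the computed bound to `int` (a sketch of the offending line in `helper.py`, with placeholder array and values):

```python
import numpy as np

# Placeholder stand-ins for the real image buffer and bandit values.
bandit_image = np.zeros((200, 100, 3))
values = [10.0]

# values[0] * 2.5 produces a float, which NumPy rejects as a slice bound;
# wrapping the bound in int() restores the intended behavior.
bandit_image[115:int(115 + values[0] * 2.5), 20:75, :] = [0, 255.0, 0]
```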
On another note, I'd like to apply this code to different problems. The one I'd like to examine has the agent attempt to build small logic circuits at each time-step. Mathematically, I'd like the agent to output a simple transition probability matrix: rather than a distribution over an action set, the agent's policy network outputs a small square matrix, say 8x8, with a single 1 on each row. I guess I could make a_dist a vector of length 64 and then apply 8 softmaxes followed by one_hots or argmaxes, but that seems rather messy. Also, I'm not sure how this affects the "responsible output" in the loss calculation. Have you ever seen something like that before? Cheers, Ajay
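One way to sketch the idea Ajay describes (purely illustrative, in NumPy rather than the repo's TensorFlow graph; all names here are hypothetical): reshape the length-64 output into 8x8, apply a softmax per row, and take the joint log-probability of the chosen column in each row as the "responsible output" in log space.

```python
import numpy as np

def row_policy(logits_flat, chosen_cols):
    """Treat a flat length-64 output as an 8x8 matrix of row-wise logits.

    Returns the per-row softmax matrix and the joint log-probability of
    one chosen column per row (the product of the per-row "responsible
    outputs", computed in log space).
    """
    logits = logits_flat.reshape(8, 8)
    # Row-wise softmax, stabilized by subtracting each row's max logit.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    joint_log_prob = np.log(probs[np.arange(8), chosen_cols]).sum()
    return probs, joint_log_prob

# With all-zero logits each row is uniform, so each chosen prob is 1/8.
probs, lp = row_policy(np.zeros(64), np.zeros(8, dtype=int))
```

Since the rows are independent given the network output, the policy-gradient loss can use this joint log-probability exactly where the usual single-softmax log-probability would go.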
Hi, thank you for the code! Do you know of any similar implementation of meta-RL using TensorFlow 2?
Hi, thank you for your code.
I have an issue with initializing the AC network for the meta bandit:
hidden = tf.concat(1, [self.prev_rewards, self.prev_actions_onehot, self.timestep])
It gave me an error:
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
Thank you very much for your reply.
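That error usually means the code was written for a pre-1.0 TensorFlow: `tf.concat` originally took `(axis, values)`, and the argument order was flipped to `(values, axis)` in TF 1.0. Assuming a TF >= 1.0 install, the line would become `tf.concat([self.prev_rewards, self.prev_actions_onehot, self.timestep], 1)`. The intended result, sketched with NumPy shapes (the feature widths are assumptions for illustration):

```python
import numpy as np

# Stand-ins for the three per-timestep inputs (a batch of 5 steps):
prev_rewards = np.zeros((5, 1))         # scalar reward per step
prev_actions_onehot = np.zeros((5, 4))  # one-hot over 4 actions (assumed size)
timestep = np.zeros((5, 1))             # normalized timestep

# Equivalent of tf.concat([...], 1): join the features along axis 1.
hidden = np.concatenate([prev_rewards, prev_actions_onehot, timestep], axis=1)
```

Here `hidden` has shape (5, 6): each row stacks reward, action one-hot, and timestep side by side before feeding the recurrent layer.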