awjuliani / meta-rl

400.0 400.0 109.0 834 KB

Implementation of Meta-RL A3C algorithm

License: MIT License

Python 15.06% Jupyter Notebook 84.94%

reinforcement-learning tensorflow

meta-rl's People

codeaudit kailuowang geniusgeek amoliu binderwang 4skynet originholic dunovank benjamesbabala tigerneil ml-lab leezqcst synpon achao2013 agistrueai shimmeringvoid allensmile ehrosini nniy gortium hal2001 coocoky skribled chpyang0229 guillermogsjc wmitsuda cpehle collector-m newebug ojkelly lkh-1 ajaytalati sungjinlees josephlau payshangjj johndpope dabana meelement jiths kefault barzinm kashenfelter junchenjin gaoyz0625 andrewliao11 renly wshenx bilio shubhampachori12110095 batermj leorez rohitn yeshwanthv5 zxsted atiroms fanyuzeng lsw5835 manuelmolano landoufulxf shaonannan wanghuimu himjl rikirolly fedorajzf jzinsa che1qian2 wolf-bailang skycreeper2000 pkjli wh-forker shotaromiwa gomerudo pjhool visualtornado seuzmj benlansdell scape1989 gaopeng-bai anirudhk686 psyche-mia agiant obitoquilt zhb-1996 sreejank 1cycok3 cognoscentai maridia sakura-rain zhenchangxia apprenticearnab lhu25 idea-lab-smu josephthinhtran yskim525 jasmeeetkaur asheeshiit anysomebody1 suen049 sailfish009 awesomejf

meta-rl's Issues

Why remove n-steps function?

Hello, I have a question. If I use n-step returns to update the episode buffer, which approach will work better?

I think off-policy will work better than on-policy. Sorry, I don't fully understand meta-learning yet; I just want to ask your opinion.
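For reference, an n-step return bootstraps from the critic after n steps instead of using the full discounted return over the whole episode buffer. Because the rewards still come from the agent's own trajectories, these targets remain on-policy. Below is a minimal NumPy sketch, assuming the buffer holds per-step rewards and critic values; the function name `n_step_returns` is hypothetical and not taken from the repository.

```python
import numpy as np

def n_step_returns(rewards, values, bootstrap_value, gamma, n):
    """n-step bootstrapped return targets for one rollout buffer.

    rewards[t], values[t]: reward and critic estimate V(s_t) at step t.
    bootstrap_value: V(s_T) for the state after the buffer (0.0 if terminal).
    """
    rewards = np.asarray(rewards, dtype=float)
    # Append the bootstrap so values_ext[T] is the value after the last step.
    values_ext = np.append(np.asarray(values, dtype=float), bootstrap_value)
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        end = min(t + n, T)   # truncate the horizon at the end of the buffer
        g = values_ext[end]   # bootstrap from the critic at step `end`
        for k in reversed(range(t, end)):
            g = rewards[k] + gamma * g
        returns[t] = g
    return returns
```

With `n = 1` this gives the one-step TD target r_t + gamma * V(s_{t+1}); with `n` at least as long as the buffer it reduces to the full discounted return plus the bootstrap term.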

Slicing bandit image in helper.py

Hi, thanks for sharing this very interesting project !!!

I think I might have found a typo/bug?

Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-9-be9dc3580d09>", line 33, in <lambda>
    worker_work = lambda: worker.work(gamma,sess,coord,saver,train)
  File "<ipython-input-4-804d524ea4f0>", line 94, in work
    episode_frames.append(set_image_bandit(episode_reward,self.env.bandit,a,t))
  File "/home/ajay/PythonProjects/Meta-RL/helper.py", line 65, in set_image_bandit
    bandit_image[115:115+values[0]*2.5,20:75,:] = [0,255.0,0]
TypeError: slice indices must be integers or None or have an __index__ method
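The failing line computes the end of the row slice as `115+values[0]*2.5`, which is a float, and NumPy requires integer slice bounds. In `helper.py` the one-line fix would be `bandit_image[115:115+int(values[0]*2.5),20:75,:] = [0,255.0,0]`. The sketch below isolates that change, assuming `values[0]` is a non-negative number; the function name `draw_reward_bar` and its parameters are hypothetical.

```python
import numpy as np

def draw_reward_bar(image, value, row_start=115, scale=2.5, cols=(20, 75)):
    # value * scale is a float (e.g. 3 * 2.5 == 7.5); NumPy slice bounds must be
    # integers, so cast before slicing. int() truncates toward zero.
    row_end = row_start + int(value * scale)
    # Paint the bar green across the chosen columns (RGB broadcast over the slice).
    image[row_start:row_end, cols[0]:cols[1], :] = [0, 255.0, 0]
    return image
```

Using `int(round(...))` instead would round to the nearest pixel rather than truncating; either removes the `TypeError`.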

On another note, I'd like to apply this code to different problems. The problem I'd like to examine is one where the agent attempts to build a small logic circuit at each time step. Mathematically, I'd like the agent to output a simple transition probability matrix: rather than a distribution over an action set, the policy network outputs a small square matrix, say 8x8, with a single 1 in each row. I guess I could make a_dist a vector of length 64 and then apply 8 softmaxes followed by one-hots or argmaxes, but that seems rather messy. I'm also not sure how this affects the "responsible output" in the loss calculation. Have you ever seen something like this before? Cheers, Ajay
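One common way to handle this kind of structured action is a factored categorical policy: reshape the 64 logits into an 8x8 grid, apply a softmax to each row, and sample each row independently. Because the rows are independent, the log-probability of the whole matrix is the sum of the per-row log-probabilities of the chosen entries, which would take the place of the log of the single responsible output in the policy-gradient loss. This is a minimal NumPy sketch of that idea, not something the repository implements; the function names are hypothetical.

```python
import numpy as np

def row_softmax(logits):
    # Subtract each row's max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sample_matrix_action(flat_logits, n=8, rng=None):
    """Sample an n x n 0/1 matrix with exactly one 1 per row."""
    rng = np.random.default_rng() if rng is None else rng
    probs = row_softmax(np.asarray(flat_logits, dtype=float).reshape(n, n))
    # Each row is an independent categorical distribution over n columns.
    choices = np.array([rng.choice(n, p=probs[i]) for i in range(n)])
    action = np.eye(n)[choices]  # one-hot row for each chosen column
    # Joint log-probability = sum of each row's log-probability of its choice.
    log_prob = np.log(probs[np.arange(n), choices]).sum()
    return action, choices, log_prob
```

In a TensorFlow graph the same idea would multiply the reshaped row-softmax output by the one-hot action matrix, sum over columns to get one responsible output per row, and sum the logs of those over rows.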

A3C-Meta-Grid did not work for CartPole

Hi awjuliani,

I have enjoyed reading your excellent examples. I tried to modify your A3C-Meta-Grid code to solve the CartPole problem, but it did not work. Even when I removed the RNN layer, it still did not work: at best it scores roughly 60 points. Do you know why?

Thanks!

Init AC Network

Hi, thank you for your code.

I have an issue initializing the AC network for the meta bandit:
hidden = tf.concat(1, [self.prev_rewards, self.prev_actions_onehot, self.timestep])

It gave me an error:
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Thank you very much for your reply.
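Since TensorFlow 1.0, `tf.concat` takes the list of tensors first and the axis second (`tf.concat(values, axis)`). With the older `tf.concat(1, [...])` order, the list of tensors lands in the `axis` argument, which TensorFlow tries to convert to an int32, producing this kind of error. The sketch below shows the reordered call; the tensors are illustrative stand-ins with assumed shapes (batch of 1, a 2-armed bandit), not the notebook's actual placeholders.

```python
import tensorflow as tf

# Illustrative stand-ins for the network inputs (assumed shapes: batch of 1,
# a 2-armed bandit, one scalar reward and one scalar timestep per step).
prev_rewards = tf.zeros([1, 1], dtype=tf.float32)
prev_actions_onehot = tf.one_hot([0], depth=2, dtype=tf.float32)
timestep = tf.zeros([1, 1], dtype=tf.float32)

# TensorFlow >= 1.0 argument order: the list of tensors first, then the axis.
hidden = tf.concat([prev_rewards, prev_actions_onehot, timestep], axis=1)
```

In the notebook the corresponding change would be `tf.concat([self.prev_rewards, self.prev_actions_onehot, self.timestep], 1)`.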

'NoneType' object has no attribute 'model_checkpoint_path'

Hello and happy 2020! I am trying to run A3C-Meta-Bandit and ran into the error in the title. All the cells run fine except the final cell of the Python notebook, which produces this error. Is there a folder missing? Thank you!
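This error typically appears when `tf.train.get_checkpoint_state` returns `None`, which it does when the directory contains no checkpoint, for example when a cell tries to restore a saved model before any training run has written one. Whether the final notebook cell does exactly this is an assumption here. A small guard makes the failure explicit; the helper name `restore_path_or_none` and the `model_path` parameter are hypothetical.

```python
import tensorflow as tf

def restore_path_or_none(model_path):
    """Return the latest checkpoint path in model_path, or None if there is none."""
    ckpt = tf.train.get_checkpoint_state(model_path)
    # get_checkpoint_state returns None when the directory has no checkpoint
    # state file; using ckpt.model_checkpoint_path unchecked then raises
    # "'NoneType' object has no attribute 'model_checkpoint_path'".
    if ckpt is None or not ckpt.model_checkpoint_path:
        return None
    return ckpt.model_checkpoint_path
```

If it returns `None`, train first (or point `model_path` at a directory that already holds a saved model) before calling a TF1-style `saver.restore(sess, path)`.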

tensorflow 2

Hi, thank you for the code! Do you know of any similar implementation of meta-RL using TensorFlow 2?

rnn_state that feed in
