awjuliani / Meta-RL
Implementation of Meta-RL A3C algorithm
License: MIT License
Hi awjuliani,
I have enjoyed reading your excellent examples. I tried to modify your A3C-Meta-Grid code to solve the CartPole problem, but it did not work. Even when I remove the RNN layer, it still does not work: it scores roughly 60 points at most. Do you know why?
Thanks!
Hello, I have a question. If I use n-step returns to update the episode buffer, will it work better?
I think off-policy would work better than on-policy. Sorry, I don't yet understand meta-learning; I just wanted to ask your opinion.
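For context, n-step targets over an episode buffer are usually computed by discounting backwards from a bootstrap value. A minimal sketch (not the repo's code; the function name and signature are hypothetical):

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted return targets for an episode buffer.

    rewards: per-step rewards from the buffer.
    bootstrap_value: value estimate of the state after the last step
                     (0.0 if the episode terminated there).
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Walk the buffer backwards, accumulating the discounted sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

r = n_step_returns([1.0, 1.0, 1.0], 0.0, gamma=0.5)
```

With gamma = 0.5 and a terminal episode, the targets are 1.75, 1.5, and 1.0 for the three steps.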
Hi, thanks for sharing this very interesting project !!!
I think I might have found a typo/bug?
Traceback (most recent call last):
File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "<ipython-input-9-be9dc3580d09>", line 33, in <lambda>
worker_work = lambda: worker.work(gamma,sess,coord,saver,train)
File "<ipython-input-4-804d524ea4f0>", line 94, in work
episode_frames.append(set_image_bandit(episode_reward,self.env.bandit,a,t))
File "/home/ajay/PythonProjects/Meta-RL/helper.py", line 65, in set_image_bandit
bandit_image[115:115+values[0]*2.5,20:75,:] = [0,255.0,0]
TypeError: slice indices must be integers or None or have an __index__ method
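For what it's worth, the error comes from `115 + values[0] * 2.5` evaluating to a float, and recent NumPy versions require integer slice bounds. A likely fix is to cast the computed bound to `int` (a sketch of the offending line in `helper.py`, with placeholder array and values):

```python
import numpy as np

# Placeholder stand-ins for the real image buffer and bandit values.
bandit_image = np.zeros((200, 100, 3))
values = [10.0]

# values[0] * 2.5 produces a float, which NumPy rejects as a slice bound;
# wrapping the bound in int() restores the intended behavior.
bandit_image[115:int(115 + values[0] * 2.5), 20:75, :] = [0, 255.0, 0]
```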
On another note, I'd like to apply this code to different problems. The one I'd like to examine has the agent attempt to build small logic circuits at each time-step. Mathematically, I'd like the agent to output a simple transition probability matrix: rather than a distribution over an action set, the agent's policy network outputs a small square matrix, say 8x8, with a single 1 on each row. I guess I could make a_dist a vector of length 64 and then apply 8 softmaxes followed by one_hots or argmaxes, but that seems rather messy. Also, I'm not sure how this affects the "responsible output" in the loss calculation. Have you ever seen something like that before? Cheers, Ajay
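One way to sketch the idea Ajay describes (purely illustrative, in NumPy rather than the repo's TensorFlow graph; all names here are hypothetical): reshape the length-64 output into 8x8, apply a softmax per row, and take the joint log-probability of the chosen column in each row as the "responsible output" in log space.

```python
import numpy as np

def row_policy(logits_flat, chosen_cols):
    """Treat a flat length-64 output as an 8x8 matrix of row-wise logits.

    Returns the per-row softmax matrix and the joint log-probability of
    one chosen column per row (the product of the per-row "responsible
    outputs", computed in log space).
    """
    logits = logits_flat.reshape(8, 8)
    # Row-wise softmax, stabilized by subtracting each row's max logit.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    joint_log_prob = np.log(probs[np.arange(8), chosen_cols]).sum()
    return probs, joint_log_prob

# With all-zero logits each row is uniform, so each chosen prob is 1/8.
probs, lp = row_policy(np.zeros(64), np.zeros(8, dtype=int))
```

Since the rows are independent given the network output, the policy-gradient loss can use this joint log-probability exactly where the usual single-softmax log-probability would go.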
Hi, thank you for the code! Do you know of any similar implementation of meta-RL using TensorFlow 2?
Hi, thank you for your code.
I have an issue with initializing the AC network for the meta bandit:
hidden = tf.concat(1, [self.prev_rewards, self.prev_actions_onehot, self.timestep])
It gave me an error:
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
Thank you very much for your reply.
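That error usually means the code was written for a pre-1.0 TensorFlow: `tf.concat` originally took `(axis, values)`, and the argument order was flipped to `(values, axis)` in TF 1.0. Assuming a TF >= 1.0 install, the line would become `tf.concat([self.prev_rewards, self.prev_actions_onehot, self.timestep], 1)`. The intended result, sketched with NumPy shapes (the feature widths are assumptions for illustration):

```python
import numpy as np

# Stand-ins for the three per-timestep inputs (a batch of 5 steps):
prev_rewards = np.zeros((5, 1))         # scalar reward per step
prev_actions_onehot = np.zeros((5, 4))  # one-hot over 4 actions (assumed size)
timestep = np.zeros((5, 1))             # normalized timestep

# Equivalent of tf.concat([...], 1): join the features along axis 1.
hidden = np.concatenate([prev_rewards, prev_actions_onehot, timestep], axis=1)
```

Here `hidden` has shape (5, 6): each row stacks reward, action one-hot, and timestep side by side before feeding the recurrent layer.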