jimkon / deep-reinforcement-learning-in-large-discrete-action-spaces
Implementation of the algorithm in Python 3, TensorFlow and OpenAI Gym
License: MIT License
Really cool that you've been working on implementing that algorithm in Python. I've been thinking of doing this as well. As far as I can tell, you're the only one who has tried this so far, so I'm not sure whether you're looking for contributors or whether I should just work on my own.
Either way, I'm curious: why is this implementation only for continuous action spaces? Isn't the point of the algorithm to bring the benefits of DDPG to large discrete action spaces by mapping the continuous output to a set of discrete actions via the k-nearest-neighbor approach? It looks like you've started working on that, but I'm still trying to figure out your code, so sorry if I'm misunderstanding.
Also, it looks like you're building your own nearest-neighbor function here. Have you looked at using the one in sklearn?
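To make the mapping described above concrete, here is a minimal sketch, not the repository's code. It assumes actions are small tuples and uses a brute-force search from the standard library; the names `k_nearest_actions`, `select_action` and `toy_q` are hypothetical. A library index such as sklearn's `NearestNeighbors` could stand in for the brute-force step.

```python
import heapq
import math

def k_nearest_actions(proto_action, actions, k):
    """Return the k discrete actions closest (Euclidean) to proto_action."""
    return heapq.nsmallest(k, actions, key=lambda a: math.dist(a, proto_action))

def select_action(proto_action, actions, q_value, k):
    """Among the k nearest discrete actions, pick the one the critic scores highest."""
    candidates = k_nearest_actions(proto_action, actions, k)
    return max(candidates, key=q_value)

# Hypothetical example: 11 one-dimensional actions on a grid in [-1, 1].
actions = [(i / 5.0,) for i in range(-5, 6)]

# Stand-in critic (an assumption for illustration) that prefers actions near 0.7.
def toy_q(action):
    return -abs(action[0] - 0.7)

# The actor's continuous output ("proto-action") is 0.33.
nearest = k_nearest_actions((0.33,), actions, k=3)
chosen = select_action((0.33,), actions, toy_q, k=3)
```

In this toy setup the closest discrete action to the proto-action is not the one finally chosen, because the critic re-ranks the k candidates.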
Hi, jimkon
In the original paper, the action used for training the critic net comes from the full policy. But in your master branch, the action is given only by the target actor net. I would like to know whether this has any impact on the final performance. Thanks ~
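The difference raised here can be written out as two candidate critic targets. This is only a sketch of the question as stated; the symbols are assumptions: $\mu'$ is the target actor, $Q'$ the target critic, $g_k$ the k-nearest-neighbor lookup, and $\pi'$ the full policy that re-ranks the k candidates with the critic.

```latex
% Target built from the target actor's output alone:
y_i^{\text{actor}} = r_i + \gamma \, Q'\bigl(s_{i+1}, \mu'(s_{i+1})\bigr)

% Target built from the full policy (actor, k-NN lookup, critic re-ranking):
\pi'(s) = \arg\max_{a \in g_k(\mu'(s))} Q'(s, a), \qquad
y_i^{\text{full}} = r_i + \gamma \, Q'\bigl(s_{i+1}, \pi'(s_{i+1})\bigr)
```

With $k = 1$ the two targets differ only by the snapping of $\mu'(s_{i+1})$ to its nearest discrete action; for larger $k$ they can select different actions.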
I could not find how to adjust the exploration rate, the exploration-exploitation policy, the discount rate, the number of warm-up steps, etc. Please help me out!
When I change k_ratio in args to generate multiple actions, an AssertionError is raised:
Traceback (most recent call last):
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 211, in <module>
    train(args.train_iter, agent, env, evaluate,
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 87, in train
    agent.select_action(observation, args=args),
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/wolp.py", line 326, in select_action
    actions = self.action_space.search_point(proto_action, self.k_nearest_neighbors)[0]
  File "/Users/katerina/Downloads/DROO-master/mec/rlmodel/LDAS/action_space.py", line 28, in search_point
    search_res, _ = self._flann.nn_index(p_in, k)
  File "/Users/xx/opt/anaconda3/envs/tensorflow/lib/python3.8/site-packages/pyflann/index.py", line 223, in nn_index
    assert(npts >= num_neighbors)
AssertionError
What's the reason and how can I fix it?
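The failing line, `assert(npts >= num_neighbors)`, appears to compare the number of indexed points with the number of neighbors requested, so one plausible cause is that the requested k is larger than the number of actions in the index. The sketch below assumes (this is not confirmed by the issue) that k is derived from a ratio of the action-space size; `neighbors_to_request` is a hypothetical helper that clamps k before it reaches the nearest-neighbor call.

```python
def neighbors_to_request(num_actions, k_ratio):
    """Convert a ratio into a neighbor count that never exceeds the index size.

    num_actions: number of points stored in the nearest-neighbor index.
    k_ratio: fraction of the action space to retrieve (assumed meaning).
    """
    if num_actions < 1:
        raise ValueError("the action space must contain at least one action")
    k = int(num_actions * k_ratio)
    # At least one neighbor, at most every indexed point.
    return max(1, min(k, num_actions))

k_half = neighbors_to_request(1000, 0.5)   # ordinary case
k_over = neighbors_to_request(1000, 2.0)   # ratio above 1 is clamped
k_zero = neighbors_to_request(1000, 0.0)   # ratio of 0 still returns one neighbor
```

If the assertion persists after clamping, checking how many points were actually passed when the index was built would be the next step.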
I can attach details later, but was just curious if you had intended to use batch norm or just hadn't messed with it (or if batch norm didn't make sense for this architecture, although I wouldn't know why). I changed the wolp_agent interface to allow for batch normalization as it appeared as if it should work, but got some errors that I couldn't fix in the time I had to work on it this morning.
Thanks!
Hi, in Agent.py, line 144 of the ddpg code, you use state to get target_action. I think it should be state_2. In the original ddpg.py of stevenpig's implementation, he also uses state_t_1_batch to get action_t_1_batch.
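For reference, the standard DDPG target bootstraps from the next state. The following is a minimal sketch of that computation, not the repository's Agent.py; `td_targets`, `target_actor` and `target_critic` are hypothetical names, and the two networks are replaced by plain functions for illustration.

```python
def td_targets(batch, target_actor, target_critic, gamma=0.99):
    """Compute y = r + gamma * Q'(s', mu'(s')) for each transition.

    batch: iterable of (state, action, reward, next_state, done) tuples.
    """
    targets = []
    for state, action, reward, next_state, done in batch:
        # The target action is computed from the NEXT state, not from `state`.
        next_action = target_actor(next_state)
        bootstrap = 0.0 if done else gamma * target_critic(next_state, next_action)
        targets.append(reward + bootstrap)
    return targets

# Toy stand-ins for the target networks (assumptions for illustration only).
def toy_actor(s):
    return 2.0 * s

def toy_critic(s, a):
    return s + a

batch = [
    (1.0, 0.0, 1.0, 2.0, False),  # non-terminal transition
    (1.0, 0.0, 1.0, 2.0, True),   # terminal transition: no bootstrap term
]
ys = td_targets(batch, toy_actor, toy_critic, gamma=0.5)
```

Feeding `state` instead of `next_state` to the target actor would evaluate the critic at an action chosen for the wrong state, which is the concern raised in this issue.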
Hello, I'm new to large action spaces, and I'm trying to do some work on large discrete action spaces. Will this work for, or could it be applied to, 2^100 actions, such as 100 switches that can each be open or closed? Also, what were f and g in the authors' paper? How should one choose or train f and g for a specific problem? @M00NSH0T @jimkon