Implementation of the Wolpertinger algorithm for deep reinforcement learning in large discrete action spaces, in Python 3, TensorFlow and OpenAI Gym

License: MIT License

Python 100.00%

ddpg deep-reinforcement-learning discrete-actions wolpertinger

deep-reinforcement-learning-in-large-discrete-action-spaces's Issues

why is this for continuous action spaces?

Really cool that you've been working on implementing that algorithm in Python. I've been thinking of doing this as well. As far as I can tell, you're the only one who has tried this so far, so I'm not sure if you're looking for contributors, or if I should just go work on my own.

Either way, I'm curious: why is this implementation only for continuous action spaces? Isn't the point of the algorithm to bring the benefits of DDPG to large discrete spaces by mapping the continuous output to a set of discrete actions via the k-nearest neighbor approach? It looks like you've started working on that, but I'm still trying to figure out your code, so sorry if I'm misunderstanding.

Also, it looks like you're building your own nearest neighbor function here. Have you looked at using the one in sklearn?
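
For reference, the lookup step discussed in this issue could be sketched with scikit-learn's `NearestNeighbors`. This is only an illustration, not the repository's code: the action embedding `actions`, the stand-in critic `q_value`, and the helper `wolpertinger_select` are hypothetical names introduced here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical discrete action set: 11 evenly spaced points in [-1, 1], embedded in 1-D.
actions = np.linspace(-1.0, 1.0, 11).reshape(-1, 1)
index = NearestNeighbors(n_neighbors=3).fit(actions)


def q_value(state, action):
    """Stand-in critic (assumption): prefers actions close to the state value."""
    return -abs(float(action[0]) - float(state[0]))


def wolpertinger_select(state, proto_action, k=3):
    """Map a continuous proto-action to one of its k nearest discrete actions,
    choosing the candidate with the highest critic value."""
    query = np.asarray(proto_action, dtype=float).reshape(1, -1)
    _, idx = index.kneighbors(query, n_neighbors=k)
    candidates = actions[idx[0]]
    scores = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]


state = np.array([0.4])
chosen = wolpertinger_select(state, proto_action=[0.33])
```

With a real agent, the proto-action would come from the actor network and `q_value` would be the critic network. An exact index like this one may become slow for very large action sets, which is presumably why approximate libraries are sometimes preferred.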

why does the target action a' in Q(s', a') for training critic net directly come from the target actor net

Hi, jimkon
In the original paper, the action used to train the critic net comes from the full policy. But in your master branch, the action is given directly by the target actor net. I would like to know whether this has any impact on the final performance. Thanks ~
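
The two variants being compared can be sketched as follows. This is a toy illustration under stated assumptions: `target_actor`, `target_critic`, the five-element `ACTIONS` set, and the discount value are all hypothetical stand-ins, not the repository's networks or settings.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

# Hypothetical 1-D action embedding with five discrete actions.
ACTIONS = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0]])


def target_actor(next_state):
    """Stand-in target actor (assumption): returns a continuous proto-action."""
    return np.array([0.3 * next_state[0]])


def target_critic(next_state, action):
    """Stand-in target critic (assumption): peaks at action == next_state[0]."""
    return -(action[0] - next_state[0]) ** 2


def k_nearest(proto_action, k):
    """Brute-force k nearest discrete actions to the proto-action."""
    dists = np.linalg.norm(ACTIONS - proto_action, axis=1)
    return ACTIONS[np.argsort(dists)[:k]]


def td_target(reward, next_state, done, k=None):
    """TD target y = r + gamma * Q'(s', a').

    k=None uses the raw target-actor output as a'; an integer k
    refines it through k-NN plus an argmax over the target critic.
    """
    proto = target_actor(next_state)
    if k is None:
        next_action = proto
    else:
        candidates = k_nearest(proto, k)
        values = [target_critic(next_state, a) for a in candidates]
        next_action = candidates[int(np.argmax(values))]
    return reward + (1.0 - done) * GAMMA * target_critic(next_state, next_action)


s2 = np.array([1.0])
y_actor_only = td_target(reward=1.0, next_state=s2, done=0.0)
y_full_policy = td_target(reward=1.0, next_state=s2, done=0.0, k=3)
```

In this toy setup the two targets differ, which shows that the choice can change what the critic is trained toward. Whether that matters for final performance on a real task is an empirical question this sketch does not answer.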

Handle the hyperparameters

I could not find how to adjust the exploration rate, the exploration-exploitation policy, the discount rate, the number of warm-up steps, etc. Please help me out!

assert(npts >= num_neighbors) AssertionError

When I change k_ratio in args to generate multiple actions, an AssertionError is raised:
Traceback (most recent call last):
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 211, in <module>
    train(args.train_iter, agent, env, evaluate,
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 87, in train
    agent.select_action(observation, args=args),
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/wolp.py", line 326, in select_action
    actions = self.action_space.search_point(proto_action, self.k_nearest_neighbors)[0]
  File "/Users/katerina/Downloads/DROO-master/mec/rlmodel/LDAS/action_space.py", line 28, in search_point
    search_res, _ = self._flann.nn_index(p_in, k)
  File "/Users/xx/opt/anaconda3/envs/tensorflow/lib/python3.8/site-packages/pyflann/index.py", line 223, in nn_index
    assert(npts >= num_neighbors)
AssertionError

What's the reason and how can I fix it?
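
The assertion in the traceback compares the number of indexed points (`npts`) with the number of neighbors requested, so it fails whenever more neighbors are asked for than there are actions in the index. Assuming the neighbor count is derived as the action count times the ratio (not verified against this code base), one possible guard is to clamp it. The helper name `neighbors_to_query` is hypothetical.

```python
def neighbors_to_query(num_actions, k_ratio):
    """Turn a ratio into a neighbor count that always lies in [1, num_actions]."""
    if num_actions < 1:
        raise ValueError("the action space must contain at least one action")
    k = int(round(num_actions * k_ratio))
    # Never request more neighbors than there are points in the index.
    return max(1, min(k, num_actions))
```

For example, `neighbors_to_query(10, 2.0)` returns 10 rather than 20, which would keep the requested count within what the index can serve.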

Batch Norm not working

I can attach details later, but was just curious if you had intended to use batch norm or just hadn't messed with it (or if batch norm didn't make sense for this architecture, although I wouldn't know why). I changed the wolp_agent interface to allow for batch normalization as it appeared as if it should work, but got some errors that I couldn't fix in the time I had to work on it this morning.

Thanks!

target_action in Agent.py

Hi, at line 144 of Agent.py in ddpg, you use state to get target_action. I think it should be state_2. In the original ddpg.py of stevenpig's implementation, he also uses state_t_1_batch to get action_t_1_batch.
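
A minimal sketch of the standard DDPG critic target, which evaluates the target networks on the next state, may help make the point concrete. The linear `target_actor` and `target_critic` below are toy stand-ins (assumptions), and `state_2` simply mirrors the name used in this issue.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)


def target_actor(states):
    """Toy target actor (assumption): a fixed linear map."""
    return 0.5 * states


def target_critic(states, actions):
    """Toy target critic (assumption): sums state and action features."""
    return (states + actions).sum(axis=1)


def critic_targets(rewards, next_states, dones):
    """y = r + gamma * (1 - done) * Q'(s', mu'(s')), computed on the next states."""
    next_actions = target_actor(next_states)
    next_q = target_critic(next_states, next_actions)
    return rewards + GAMMA * (1.0 - dones) * next_q


state_2 = np.array([[2.0], [4.0]])  # batch of next states
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])        # the second transition is terminal
y = critic_targets(rewards, state_2, dones)
```

Passing the current states instead of `state_2` would bootstrap from the wrong time step, which is the concern raised here.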

How to deal with exponential action space?

Hello, I'm new to large action spaces, and I'm trying to do some work on large discrete action spaces. Would this work, or could it be applied, for 2^100 actions, such as 100 switches that can each be open or closed? Also, what were f and g in the authors' paper? How should f and g be chosen or trained for a specific problem? @M00NSH0T @jimkon
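
One possible angle, offered only as a sketch and not as something the repository or the paper provides: with 100 binary switches, an index over all 2^100 actions cannot be built, but if each action is embedded as a 0/1 vector, the nearest action to a proto-action in [0, 1]^n is found by element-wise rounding. The helper `binary_candidates` is hypothetical, and the extra candidates it returns are a heuristic, not an exact k-NN result.

```python
import numpy as np


def binary_candidates(proto_action, k):
    """Return up to k binary vectors near a proto-action in [0, 1]^n.

    The first candidate is the exact nearest corner of the hypercube
    (element-wise rounding). The rest flip one bit each, cheapest flips
    first. This is a heuristic candidate set, not an exact k-NN search.
    """
    p = np.clip(np.asarray(proto_action, dtype=float), 0.0, 1.0)
    nearest = (p >= 0.5).astype(int)
    # Flipping bit i raises the squared distance by |1 - 2 * p_i|.
    flip_cost = np.abs(1.0 - 2.0 * p)
    candidates = [nearest]
    for i in np.argsort(flip_cost)[: max(k - 1, 0)]:
        flipped = nearest.copy()
        flipped[i] ^= 1
        candidates.append(flipped)
    return np.stack(candidates)


proto = np.array([0.9, 0.45, 0.1, 0.6])
cands = binary_candidates(proto, k=3)
```

The candidates could then be scored by a critic in the usual way. Whether an actor can learn useful proto-actions in such a space is a separate question that this sketch does not address.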

jimkon / deep-reinforcement-learning-in-large-discrete-action-spaces
