
jimkon / deep-reinforcement-learning-in-large-discrete-action-spaces


Implementation of the Wolpertinger algorithm from "Deep Reinforcement Learning in Large Discrete Action Spaces" in Python 3, using TensorFlow and OpenAI Gym.

License: MIT License

Language: Python 100.00%
Topics: ddpg, deep-reinforcement-learning, discrete-actions, wolpertinger


deep-reinforcement-learning-in-large-discrete-action-spaces's Issues

Why is this for continuous action spaces?

Really cool that you've been working on implementing that algorithm in Python. I've been thinking of doing this as well. As far as I can tell, you're the only one that's tried doing this yet, so I'm not sure if you're looking for contributors, or if I should just go work on my own.

Either way, I'm curious: why is this implementation only for continuous action spaces? Isn't the point of the algorithm to bring the benefits of DDPG to a large discrete space by mapping the continuous output to a set of discrete actions via the k-nearest-neighbor approach? It looks like you've started working on that, but I'm still trying to figure out your code, so sorry if I'm misunderstanding.

Also, it looks like you're building your own nearest neighbor function here. Have you looked at using the one in sklearn?
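For illustration, here is a minimal sketch of that idea using sklearn's NearestNeighbors, assuming the discrete actions are embedded as rows of an array; the names (discrete_actions, nearest_discrete_actions) are illustrative, not the repository's:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 1-D discrete action set, embedded as an (n_actions, action_dim) array.
discrete_actions = np.linspace(-1.0, 1.0, 1000).reshape(-1, 1)
knn = NearestNeighbors(n_neighbors=10).fit(discrete_actions)

def nearest_discrete_actions(proto_action, k=10):
    """Return the k discrete actions closest to the actor's continuous output."""
    idx = knn.kneighbors(np.atleast_2d(proto_action),
                         n_neighbors=k, return_distance=False)
    return discrete_actions[idx[0]]

candidates = nearest_discrete_actions([0.137], k=5)
```

The Wolpertinger agent would then score the returned candidates with the critic and execute the highest-valued one.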

Handle the hyperparameters

I could not find how to adjust the exploration rate, the exploration-exploitation policy, the discount rate, the number of warm-up steps, etc. Please help me out!
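For reference, a Wolpertinger/DDPG setup usually exposes these as explicit settings; the block below is a hypothetical illustration of the usual knobs and typical defaults, not the repository's actual interface:

```python
# Hypothetical hyperparameter block for a Wolpertinger/DDPG agent; the names
# are illustrative only and do not correspond to this repository's code.
HYPERPARAMS = {
    "gamma": 0.99,           # discount rate
    "epsilon_start": 1.0,    # initial exploration rate (epsilon-greedy policy)
    "epsilon_min": 0.05,     # floor for the exploration rate
    "epsilon_decay": 0.995,  # multiplicative decay applied per episode
    "warmup_steps": 1000,    # random steps collected before learning starts
    "k_ratio": 0.1,          # fraction of the action set returned by the kNN search
}
```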

assert(npts >= num_neighbors) AssertionError

When I change the k_ratio in args to generate multiple actions, an AssertionError is raised:
Traceback (most recent call last):
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 211, in <module>
    train(args.train_iter, agent, env, evaluate,
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/main.py", line 87, in train
    agent.select_action(observation, args=args),
  File "/Users/xx/Downloads/DROO-master/mec/rlmodel/LDAS/wolp.py", line 326, in select_action
    actions = self.action_space.search_point(proto_action, self.k_nearest_neighbors)[0]
  File "/Users/katerina/Downloads/DROO-master/mec/rlmodel/LDAS/action_space.py", line 28, in search_point
    search_res, _ = self._flann.nn_index(p_in, k)
  File "/Users/xx/opt/anaconda3/envs/tensorflow/lib/python3.8/site-packages/pyflann/index.py", line 223, in nn_index
    assert(npts >= num_neighbors)
AssertionError

What's the reason and how can I fix it?
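For context: in pyflann, that assertion compares the requested num_neighbors against the number of points in the built index, so it typically fires when k_ratio pushes k_nearest_neighbors above the size of the discretized action set. One fix is to lower k_ratio; another is to clamp k before querying, as in this sketch (not the repository's exact code; self._actions is an assumed array holding the discretized actions):

```python
import numpy as np

def search_point(self, point, k):
    """Return the k discrete actions nearest to a continuous proto-action."""
    p_in = np.atleast_2d(point)
    # pyflann's nn_index asserts num_neighbors <= number of indexed points,
    # so clamp k to the size of the discrete action set before querying.
    k = min(k, len(self._actions))
    search_res, _ = self._flann.nn_index(p_in, k)
    return self._actions[search_res], search_res
```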

Batch Norm not working

I can attach details later, but I was just curious whether you had intended to use batch norm or just hadn't gotten to it (or whether batch norm doesn't make sense for this architecture, although I wouldn't know why). I changed the wolp_agent interface to allow for batch normalization, since it appeared that it should work, but I got some errors that I couldn't fix in the time I had to work on it this morning.

Thanks!
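For reference, a minimal sketch of where batch normalization usually sits in a DDPG actor, written with tf.keras (the repository's agent uses its own network code, so treat the structure and layer sizes as illustrative). A common source of errors is forgetting to pass training=False when selecting single actions, which makes the layers normalize with the statistics of a one-sample batch:

```python
import tensorflow as tf

def build_actor(state_dim, action_dim):
    """Actor MLP with batch norm on the state input and after each hidden layer,
    as in the original DDPG paper; output is a tanh proto-action in [-1, 1]."""
    states = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.BatchNormalization()(states)
    for units in (400, 300):
        x = tf.keras.layers.Dense(units)(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation("relu")(x)
    actions = tf.keras.layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(states, actions)

actor = build_actor(state_dim=3, action_dim=1)
proto_action = actor(tf.zeros((1, 3)), training=False)  # inference: use running statistics
```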

target_action in Agent.py

Hi, on line 144 of the DDPG code in Agent.py, you use state to compute target_action. I think it should be state_2. In the original ddpg.py from stevenpig's implementation, state_t_1_batch is likewise used to compute action_t_1_batch.
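For clarity, a minimal sketch of the critic target this refers to: the target action must be computed from the next state (state_2), not the current one. The callables target_actor and target_critic are generic placeholders, not the repository's names:

```python
def critic_target(reward, done, state_2, target_actor, target_critic, gamma=0.99):
    """Bellman target y = r + gamma * Q'(s', mu'(s')) used to train the critic."""
    target_action = target_actor(state_2)  # action taken from the NEXT state
    return reward + (1.0 - done) * gamma * target_critic(state_2, target_action)
```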

How to deal with an exponential action space?

Hello, I'm new to "large action spaces" and I'm trying to do some work on large discrete action spaces. Would this work, or could it be applied, for 2^100 actions, such as 100 switches that can each be open or closed? Also, what are f and g in the authors' paper, and how do you choose or train f and g for a specific problem? @M00NSH0T @jimkon
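For what it's worth: in the paper, f is the actor network that maps a state to a continuous proto-action, and g is the k-nearest-neighbor lookup that maps the proto-action onto the discrete action set. For 100 binary switches you cannot enumerate all 2^100 actions, but if each action is embedded as its 0/1 vector, the single nearest valid action is just elementwise rounding, so g never needs an explicit index over the whole set. A toy sketch (illustrative, not the repository's code):

```python
import numpy as np

def g_nearest_switch_setting(proto_action):
    """g: map a proto-action in [0, 1]^100 onto the discrete set {0, 1}^100.
    With the binary-vector embedding, the nearest valid action is elementwise
    rounding, so no index over 2**100 actions is ever built."""
    return (np.asarray(proto_action) >= 0.5).astype(int)

proto = np.random.rand(100)               # stand-in for the actor's (f's) output
action = g_nearest_switch_setting(proto)  # one of the 2**100 switch settings
```

For k > 1 candidates, one option is to flip the coordinates whose proto-action values are closest to 0.5 and let the critic pick among the resulting settings.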
