iffix / machin
Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
License: MIT License
machin/frame/algorithms/impala.py, lines 363 and 373:
vs[idx] = (value[idx] + delta_v[idx] + self.discount * c[idx] * (vs[idx + 1] - value[idx + 1]))
This should be corrected to the following code:
vs[idx] = (value[idx].to('cpu') + delta_v[idx] + self.discount * c[idx] * (vs[idx + 1] - value[idx + 1].to('cpu')))
Do the same for line 373.
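For readers unfamiliar with the line being patched, here is a minimal sketch of the backward recursion it belongs to, written with plain Python floats so that only the arithmetic is shown. The function name `vtrace_targets` and the assumption that `value` carries one extra bootstrap entry are illustrative and not taken from machin; in the real code every operand in the expression has to live on the same device, which is what the `.to('cpu')` calls above address.

```python
def vtrace_targets(value, delta_v, c, discount):
    """Backward recursion for the corrected value targets of one trajectory.

    Assumption: value has length T + 1 (the last entry is the bootstrap
    value), while delta_v and c have length T.
    """
    steps = len(delta_v)
    vs = list(value)  # vs[steps] starts out equal to the bootstrap value
    for idx in reversed(range(steps)):
        vs[idx] = (value[idx] + delta_v[idx]
                   + discount * c[idx] * (vs[idx + 1] - value[idx + 1]))
    return vs


targets = vtrace_targets(
    value=[1.0, 2.0, 3.0], delta_v=[0.5, 0.25], c=[1.0, 1.0], discount=0.5
)
print(targets)
```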
It seems that your code produces an error if the length of the trajectory is less than 2 (len(tmp_observations) < 2). I tested this with PPO; I don't know whether it happens with all algorithms.
The error:
ValueError: The parameter probs has invalid values
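Until the cause is confirmed, one possible workaround is to drop episodes that are too short before they reach the buffer. The sketch below is only that: `filter_short_episodes` and `min_length` are hypothetical names, and the threshold of 2 comes from the report above rather than from the library.

```python
def filter_short_episodes(episodes, min_length=2):
    """Keep only the episodes that contain at least min_length transitions."""
    return [episode for episode in episodes if len(episode) >= min_length]


# Each episode is a list of transition dicts; the contents are placeholders.
collected = [
    [{"reward": 1.0}],                    # one step only: dropped
    [{"reward": 0.0}, {"reward": 1.0}],   # two steps: kept
    [],                                   # empty: dropped
]
usable = filter_short_episodes(collected)
print(len(usable))
```

Each surviving episode could then be passed to store_episode() as usual.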
Hi,
I was using machin until today, but I think there was a very recent update and now my code has stopped working (only for PPO; DQN still works).
With version 0.4.0 I get this error when importing ("from machin.frame.algorithms import PPO"):
ModuleNotFoundError: No module named 'machin.frame.helpers'
And if I install version 0.3.4, I get this error when updating PPO:
AttributeError: 'PPO' object has no attribute 'grad_max'
Joao
Hey, I'm trying to implement a hybrid action space with an A2C agent; maybe you have some advice.
My expected output is two actions: one discrete, one continuous. The network predicts 3 things:
The net outputs the sum of the log probabilities of the actions from both distributions (and likewise for the entropy). The network successfully learns the mean and std, but the weights of the logits layer are not updated at all. What could be the reason?
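As a reference point for the summed log-probability described above, here is a small framework-free sketch. It assumes the discrete and continuous actions are sampled independently, and all function names are hypothetical.

```python
import math


def categorical_log_prob(logits, index):
    """Log-probability of `index` under softmax(logits), computed stably."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return logits[index] - log_z


def normal_log_prob(x, mean, std):
    """Log-density of x under a normal distribution with the given mean/std."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def hybrid_log_prob(logits, discrete_action, mean, std, continuous_action):
    """Joint log-probability, assuming the two actions are independent."""
    return (categorical_log_prob(logits, discrete_action)
            + normal_log_prob(continuous_action, mean, std))


print(hybrid_log_prob([0.0, 0.0], 0, 0.0, 1.0, 0.0))
```

In an autograd framework the logits only receive a gradient if the discrete term is computed from them inside the graph, so one thing worth checking is whether the categorical log-probability is built from detached values or from an argmax instead of the logits themselves.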
Hi! Thanks for your excellent work!
I tried several RL frameworks based on PyTorch, and machin is one of the few libraries that discusses hybrid action spaces.
I'm diving into a complex environment which is a hierarchical action space problem. I hope you could give me some advice!
To explain the meaning of a hierarchical action space more clearly, here is an example from the paper Generalising Discrete Action Spaces with Conditional Action Trees. Figure 2 in the paper shows the actions decomposed into an action tree: one first selects a first-level action, then a second-level action. The first level has 3 actions, and the action space of the second level depends on the first-level choice.
Here is one possible solution:
transition = {"state": {"some_state": old_state}, ...}  # old
transition = {"state": {"some_state": old_state, "valid_actions": valid_action_set}, ...}  # new
Here, "valid_actions" contains all of the possible second-level actions given the first-level action.
# state has the keys "some_state" and "valid_actions". Initially, valid_actions is
# pre-defined manually: for example, if we choose the first-level action 'use',
# valid_actions will be 'food'.
state = env.reset()
while not done:
    # choose action2 from the valid action set; here, action2 is 'food'
    action2 = agent.act2(state)
    # choose action1 from the first-level action space; suppose action1 is 'move'
    action1 = agent.act1(state)
    # tell env that the valid action set in the next step is
    # 'up, down, right, left' under the 'move' branch
    env.first_level(action1)
    # next_state has the keys "some_state" and "valid_actions";
    # valid_actions now holds 'up, down, right, left'
    next_state, reward, done = env.step(action2)
    state = next_state
class act2(nn.Module):
    def forward(self, state):
        # state = {"some_state": <...>, "valid_actions": <...>}
        some_state, second_level_actions = state["some_state"], state["valid_actions"]
        ...  # calculate the similarity between state and valid actions, and output logits
        return (...), None
Do you think this solution is reasonable? Is there any better way to support such a conditional hierarchical action space?
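A common alternative to scoring each valid action separately is to keep one fixed second-level output and mask out the entries that the chosen first-level action does not allow. The sketch below shows the idea without any framework; `ACTION_TREE` and `ALL_SECOND_LEVEL` are hypothetical and contain only the two branches named in the example above.

```python
import math

# Hypothetical tree: first-level action -> allowed second-level actions.
ACTION_TREE = {"move": ["up", "down", "right", "left"], "use": ["food"]}
ALL_SECOND_LEVEL = ["up", "down", "right", "left", "food"]


def valid_mask(first_level_action):
    """Boolean mask over ALL_SECOND_LEVEL for the given first-level action."""
    allowed = set(ACTION_TREE[first_level_action])
    return [name in allowed for name in ALL_SECOND_LEVEL]


def masked_softmax(logits, valid):
    """Softmax over logits where invalid positions get probability zero."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    top = max(x for x, ok in zip(logits, valid) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([0.0] * 5, valid_mask("move"))
print(dict(zip(ALL_SECOND_LEVEL, probs)))
```

With this layout the mask can travel in the state dict next to "some_state", much like the "valid_actions" entry proposed above.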
Hello,
Does machin support Multi Discrete Action Spaces? (two different actions in the same time step)
I've looked through the documentation but cannot find anything related to that
João
APEX-DDPG cannot use the GPU. The tests for APEX-DDPG all seem to use the CPU; is this feature currently not supported? I tried changing the actor and critic networks in ddpg_apex.py to use cuda:0, but I get the following error. I tried the default example and also world size = 2 with 1 worker and 1 sampler, to rule out any self-deadlock caused by the multiple processes.
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Hey, I'm using DQN, and my Q-values are variable-length sequences because each state has a different number of actions (my states also have different shapes). When sampling batches, the default Buffer concatenates them, which leads to a tensor error. Using update() with concatenate_samples=False still doesn't solve the problem, because the samples are then just lists and all torch operations fail. Of course, I can pad the sequences, but then argmax() can return one of the padded indexes, since it's not possible to pass the original length of each sample in the batch to the update() function. Is there any way to solve this right now, or is it yet to be implemented?
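One way around the padded-index problem, sketched here without any framework, is to pad with negative infinity instead of zero, so that neither argmax nor max can ever select a padded slot. The helper names are hypothetical, and this assumes every state has at least one real action.

```python
NEG_INF = float("-inf")


def pad_q_values(q_values, max_actions):
    """Right-pad a list of Q-values with -inf up to max_actions entries."""
    return list(q_values) + [NEG_INF] * (max_actions - len(q_values))


def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


batch = [[-1.0, -3.0], [2.0, 5.0, 1.0]]
width = max(len(q) for q in batch)
padded = [pad_q_values(q, width) for q in batch]
print([argmax(q) for q in padded])
```

Zero padding would pick index 2 for the first row, since 0 is larger than both real values; with -inf padding the choice stays inside the real actions. Applying the same padding inside the network output would keep the original lengths out of update() altogether, although that placement is an assumption and not something the library prescribes.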
I've tried to train your_first_program on a GPU by uncommenting the static_module_wrapper lines like so:
q_net = static_module_wrapper(q_net, "cuda", "cuda")
q_net_t = static_module_wrapper(q_net_t, "cuda", "cuda")
and got an error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
The combination
q_net = static_module_wrapper(q_net, "cuda", "cpu")
q_net_t = static_module_wrapper(q_net_t, "cuda", "cpu")
fails too. The following setting runs, but doesn't use the GPU:
q_net = static_module_wrapper(q_net, "cpu", "cuda")
q_net_t = static_module_wrapper(q_net_t, "cpu", "cuda")
What should I do to train your_first_program on a GPU?
Here is requirements.txt:
absl-py==0.10.0
astor==0.8.1
astunparse==1.6.3
backcall==0.2.0
brotlipy==0.7.0
cachetools==4.1.1
certifi==2020.6.20
cffi==1.13.2
chardet==3.0.4
cloudpickle==1.6.0
colorlog==4.4.0
cryptography @ file:///tmp/build/80754af9/cryptography_1601046817403/work
cycler==0.10.0
decorator==4.4.2
dill==0.3.2
dm-reverb==0.1.0
dm-tree==0.1.5
EasyProcess==0.3
future==0.18.2
gast==0.3.3
gin-config==0.3.0
google-auth==1.22.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
GPUtil==1.4.0
graphviz==0.14.2
grpcio==1.33.1
gym==0.17.3
h5py @ file:///tmp/build/80754af9/h5py_1593454119955/work
idna==2.10
imageio==2.9.0
imageio-ffmpeg==0.4.2
importlib-metadata==2.0.0
install==1.3.4
ipython @ file:///tmp/build/80754af9/ipython_1598883837425/work
ipython-genutils==0.2.0
jedi @ file:///tmp/build/80754af9/jedi_1596490743326/work
Keras-Applications @ file:///tmp/build/80754af9/keras-applications_1594366238411/work
Keras-Preprocessing==1.1.2
kiwisolver==1.2.0
machin==0.3.4
Markdown==3.3.2
matplotlib==3.3.2
mkl-fft==1.2.0
mkl-random==1.1.1
mkl-service==2.3.0
moviepy==1.0.3
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.3.0
pandas @ file:///tmp/build/80754af9/pandas_1602088128026/work
parso==0.7.0
pexpect @ file:///tmp/build/80754af9/pexpect_1594383317248/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1594384075987/work
Pillow==8.0.1
portpicker==1.3.1
proglog==0.1.9
progressbar==2.5
prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1602688806899/work
protobuf==3.13.0
psutil==5.7.2
ptyprocess==0.6.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.19
pyglet==1.5.0
Pygments @ file:///tmp/build/80754af9/pygments_1600458456400/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1594392929924/work
pyparsing==2.4.7
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
python-dateutil==2.8.1
pytz==2020.1
PyVirtualDisplay @ file:///home/conda/feedstock_root/build_artifacts/pyvirtualdisplay_1602367622068/work
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
scipy==1.5.3
six==1.15.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorboardX==2.1
tensorflow==2.3.1
tensorflow-estimator==2.3.0
tensorflow-probability==0.11.1
termcolor==1.1.0
torch==1.6.0
torchvision==0.5.0
torchviz==0.0.1
tqdm==4.50.2
traitlets @ file:///tmp/build/80754af9/traitlets_1602787416690/work
urllib3==1.25.11
wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work
Werkzeug==1.0.1
wrapt==1.12.1
zipp==3.3.1
Hi,
this project is really awesome and the code is well structured!
Is your feature request related to a problem? Please describe.
I have run some of the code in examples/tutorials, but cannot find anything about MARL algorithms such as MADDPG.
Describe the solution you'd like
Since you have already implemented maddpg.py and test_maddpg.py, I am wondering whether you could write a tutorial for maddpg.py too. (It would be even better to implement more MARL algorithms, such as COMA, QMIX, and VDN.)
When I try to train the DQN Rainbow agent, I get an "index_add_(): self and source must have the same scalar type" error when the weights are updated.
I'm using a simple model to understand the problem:
import torch as t
import torch.nn as nn


class QNet(nn.Module):
    # this test setup lacks the noisy linear layer and dueling structure.
    def __init__(self, action_num, atom_num=10):
        super(QNet, self).__init__()
        self.hidden_in = nn.Conv2d(4, 64, kernel_size=4, stride=2)
        self.fc1 = nn.Linear(105280, 64)
        self.fc2 = nn.Linear(64, 16)
        self.fc3 = nn.Linear(16, action_num * atom_num)
        self.action_num = action_num
        self.atom_num = atom_num
        self.flat = nn.Flatten()

    def forward(self, state):
        a = t.relu(self.hidden_in(state))
        a = self.flat(a)
        a = t.relu(self.fc1(a))
        a = t.relu(self.fc2(a))
        # one probability distribution over atoms per action
        return t.softmax(
            self.fc3(a).view(-1, self.action_num, self.atom_num), dim=-1
        )
I'm unable to understand where the tensor changes its type.
The input has shape (4, 50, 50) and dtype float32.
Hello, when I try to run a tutorial script, e.g. the your_first_program example, I always encounter this AttributeError during the imports:
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
However, I fulfill the listed requirements. Is there anything I am missing, or how can I solve this?
First of all, I just want to show my gratitude regarding your efforts in writing such a cool library and also such a nice documentation. I just discovered your work a few days ago and I am trying to use it.
SYSTEM:
Getting the repo:
[MADDPG] Describe the bug
I am trying to run this code from your examples folder, but I get the following error message:
[optimizer(acc.parameters(), lr=actor_learning_rate) for acc in ac]
TypeError: 'list' object is not callable
It seems that the optimizer is not a callable; instead, it holds the following data:
[IMPALA] Describe the bug
Also, I am trying to run this code but I am facing another problem. The application runs without any kind of feedback, and at some point I receive a connection error due to a timeout:
store = TCPStore(result.hostname, result.port, world_size, start_daemon, timeout)
RuntimeError: connect() timed out.
As far as I can tell from debugging, it's because of World(world_size=4, rank=rank, name=str(rank), rpc_timeout=20)
and, to be more precise, this part of world.py:
dist.init_process_group(
    backend=dist_backend,
    init_method=dist_init_method,
    timeout=timedelta(seconds=dist_timeout),
    rank=rank,
    world_size=world_size,
)
[TEST FAILS] Describe the bug
One more thing that I've tried is to run run_linux_test.sh from the main branch this time (the rest were run from version v0.4.1). Here is the output, in case it matters for the 2 bugs that I've described above or for any other.
================================== test session starts ===================================
platform linux -- Python 3.9.5, pytest-6.0.1, py-1.10.0, pluggy-0.13.1
rootdir: /home/vlad/Documents/TradingBotRL/machin, configfile: pytest.ini
plugins: metadata-1.11.0, repeat-0.8.0, html-1.22.1
collected 949 items / 3 errors / 31 deselected / 915 selected
========================================= ERRORS =========================================
_______________________ ERROR collecting test/auto/test_dataset.py _______________________
test/auto/test_dataset.py:1: in <module>
from machin.auto.dataset import determine_precision, DatasetResult
machin/auto/__init__.py:2: in <module>
from . import envs
machin/auto/envs/__init__.py:1: in <module>
from . import openai_gym
machin/auto/envs/openai_gym.py:5: in <module>
from ..dataset import DatasetResult, RLDataset, log_video, determine_precision
machin/auto/dataset.py:94: in <module>
class RLDataset(IterableDataset):
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:273: in __new__
return super().__new__(cls, name, bases, namespace, **kwargs) # type: ignore[call-overload]
/usr/lib/python3.9/abc.py:85: in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:370: in _dp_init_subclass
raise TypeError("Expected 'Iterator' as the return annotation for `__iter__` of {}"
E TypeError: Expected 'Iterator' as the return annotation for `__iter__` of RLDataset, but found typing.Iterable
______________________ ERROR collecting test/auto/test_launcher.py _______________________
test/auto/test_launcher.py:2: in <module>
from machin.auto.launcher import Launcher
machin/auto/__init__.py:2: in <module>
from . import envs
machin/auto/envs/__init__.py:1: in <module>
from . import openai_gym
machin/auto/envs/openai_gym.py:5: in <module>
from ..dataset import DatasetResult, RLDataset, log_video, determine_precision
machin/auto/dataset.py:94: in <module>
class RLDataset(IterableDataset):
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:273: in __new__
return super().__new__(cls, name, bases, namespace, **kwargs) # type: ignore[call-overload]
/usr/lib/python3.9/abc.py:85: in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:370: in _dp_init_subclass
raise TypeError("Expected 'Iterator' as the return annotation for `__iter__` of {}"
E TypeError: Expected 'Iterator' as the return annotation for `__iter__` of RLDataset, but found typing.Iterable
___________________ ERROR collecting test/auto/env/test_openai_gym.py ____________________
test/auto/env/test_openai_gym.py:11: in <module>
from machin.auto.envs.openai_gym import (
machin/auto/__init__.py:2: in <module>
from . import envs
machin/auto/envs/__init__.py:1: in <module>
from . import openai_gym
machin/auto/envs/openai_gym.py:5: in <module>
from ..dataset import DatasetResult, RLDataset, log_video, determine_precision
machin/auto/dataset.py:94: in <module>
class RLDataset(IterableDataset):
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:273: in __new__
return super().__new__(cls, name, bases, namespace, **kwargs) # type: ignore[call-overload]
/usr/lib/python3.9/abc.py:85: in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
venv/lib/python3.9/site-packages/torch/utils/data/_typing.py:370: in _dp_init_subclass
raise TypeError("Expected 'Iterator' as the return annotation for `__iter__` of {}"
E TypeError: Expected 'Iterator' as the return annotation for `__iter__` of RLDataset, but found typing.Iterable
--- generated html file: file:///home/vlad/Documents/TradingBotRL/machin/test_api.html ---
================================ short test summary info =================================
ERROR test/auto/test_dataset.py - TypeError: Expected 'Iterator' as the return annotati...
ERROR test/auto/test_launcher.py - TypeError: Expected 'Iterator' as the return annotat...
ERROR test/auto/env/test_openai_gym.py - TypeError: Expected 'Iterator' as the return a...
!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!
============================ 31 deselected, 3 errors in 2.87s ============================
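The traceback shows torch's dataset typing check rejecting `typing.Iterable` as the return annotation of `RLDataset.__iter__`. A likely fix, assuming nothing else relies on the old annotation, is to annotate the method with `Iterator` instead. The class below is a hypothetical stand-in that does not subclass `IterableDataset`; it only illustrates the annotation.

```python
from typing import Iterator


class EpisodeStream:
    """Hypothetical stand-in for an iterable dataset class."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self) -> Iterator[int]:
        # A generator-based __iter__ returns an iterator, so Iterator is the
        # accurate return annotation here; Iterable is the looser type.
        yield from self._items


print(list(EpisodeStream([1, 2, 3])))
```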
Is your feature request related to a problem? Please describe.
I read the tutorial about RL in Spinning Up. I found that for on-policy RL, the pseudocode has a step that collects a set of trajectories. However, in your documentation, Data flow in Machin, you point out that
Currently, the constructor of the default transition implementation Transition requires batch size to be 1
and
Buffer.store_episode(): If you pass in a dict type transition object, it will be automatically converted to Transition
In your PPO example, examples/ppo.py, it seems that you only save one trajectory per iteration and then update on it. What should I do if I want to save a set of trajectories? Will such a change affect the update() part?
In general scenarios, one trajectory (episode) has one total reward. However, I encountered a case where multiple trajectories share only one total reward. For example:
(trajectory1: [s,a,0,s,a,0,...,s,a], trajectory2: [s,a,0,s,a,0,...,s,a], trajectory3: [s,a,0,s,a,0,...,s,a]) ---> final reward
Imagine that many football players are playing the same game; they receive the same reward only when a goal is scored.
Or imagine generating a batch of noise sequences to attack a neural network, which yields only one reward indicating the network's degraded performance.
I give two solutions to this problem in the next section, but I am not sure which one is better. Could you give me some advice?
Describe the solution you'd like
For feature 1, it may be realized as below (I am not sure whether this affects the update() part):
while episode < max_episodes:
    for i in range(sub_episode_size):  # add a loop here
        episode += 1
        tmp_observations = []
        while not terminal and step <= max_steps:
            # ....
            tmp_observations.append(...)  # store transition
        ppo.store_episode(tmp_observations)
    ppo.update()
    # clean buffer
For feature 2, there may be two solutions. One is to assign the final reward to every trajectory:
from collections import defaultdict

while episode < max_episodes:
    batch_trajectory = defaultdict(list)
    episode += batch_size
    while not terminal and step <= max_steps:
        # using batch state to generate batch action
        reward = env.step(batch_action)
        for i in range(batch_size):
            # assign the same reward to each trajectory
            batch_trajectory[i].append((batch_state[i], batch_action[i], reward))
    for i in range(batch_size):
        ppo.store_episode(batch_trajectory[i])
    ppo.update()
    # clean buffer
Describe alternatives you've considered
For feature 2, another solution may be to resort to multi-agent RL: each agent manages one trajectory, and they all receive the same reward from the environment. I found that Machin has a multi-agent algorithm implementation called MADDPG. From Spinning Up I found that this algorithm is only for continuous action spaces. Is there any plan to implement other multi-agent RL algorithms, such as multi-agent PPO for discrete action spaces?
Additional context
File "D:\Anaconda3\envs\universe\lib\site-packages\torch\distributed\rendezvous.py", line 9, in <module>
from . import FileStore, TCPStore
ImportError: cannot import name 'FileStore'
After installing machin, I ran ppo.py and it reported an error; the other examples report the same error, as follows:
D:\Anaconda3\envs\universe\python.exe F:/machin/machin-master/examples/framework_examples/dqn.py
Traceback (most recent call last):
  File "F:/machin/machin-master/examples/framework_examples/dqn.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\env\__init__.py", line 1, in <module>
    from . import utils, wrappers
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\env\wrappers\__init__.py", line 1, in <module>
    from . import base, openai_gym
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\env\wrappers\openai_gym.py", line 8, in <module>
    from machin.parallel.exception import ExceptionWithTraceback
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\parallel\__init__.py", line 2, in <module>
    from . import distributed, server, assigner, exception, pickle, thread, pool, queue
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "D:\Anaconda3\envs\universe\lib\site-packages\machin\parallel\distributed\world.py", line 14, in <module>
    import torch.distributed.distributed_c10d as dist_c10d
  File "D:\Anaconda3\envs\universe\lib\site-packages\torch\distributed\distributed_c10d.py", line 10, in <module>
    from .rendezvous import rendezvous, register_rendezvous_handler # noqa: F401
  File "D:\Anaconda3\envs\universe\lib\site-packages\torch\distributed\rendezvous.py", line 9, in <module>
    from . import FileStore, TCPStore
ImportError: cannot import name 'FileStore'
Process finished with exit code 1
Sparse rewards are a problem for everyone in RL dealing with robotics and such.
Just as it's possible to create your own networks (which is awesome), I think it would be useful to be able to create your own replay buffers, to implement things like Hindsight Experience Replay and others.
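To make the request concrete, here is a minimal sketch of the relabeling step a Hindsight Experience Replay buffer would perform, using the "final" strategy (the last achieved state becomes the goal). It is independent of machin's Buffer class; the transition keys and both function names are hypothetical.

```python
def sparse_reward(achieved, goal):
    """Return 0 when the goal is reached and -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode, reward_fn):
    """Return a relabeled copy of episode; the input is left untouched.

    Each transition is a dict with the hypothetical keys "state", "action",
    "next_state", and "goal".
    """
    new_goal = episode[-1]["next_state"]
    relabeled = []
    for transition in episode:
        new_transition = dict(transition, goal=new_goal)
        new_transition["reward"] = reward_fn(transition["next_state"], new_goal)
        relabeled.append(new_transition)
    return relabeled


episode = [
    {"state": 0, "action": 1, "next_state": 1, "goal": 5, "reward": -1.0},
    {"state": 1, "action": 1, "next_state": 2, "goal": 5, "reward": -1.0},
]
hindsight = relabel_with_final_goal(episode, sparse_reward)
print([t["reward"] for t in hindsight])
```

A custom buffer could store both the original and the relabeled episode, so that a failed attempt still contributes at least one rewarded transition.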
Where to alter
It's best to alter throughout the entire library, but my specific pain started with ppo.py.
Why to alter
How to alter
Example: originally, this is how code looks that I needed to debug:
batch_size, (state, action, advantage) = \
    self.replay_buffer.sample_batch(self.batch_size,
                                    sample_method="random_unique",
                                    concatenate=concatenate_samples,
                                    sample_attrs=[
                                        "state", "action", "gae"],
                                    additional_concat_attrs=[
                                        "gae"
                                    ])
Here is how it looks after black reformatting:
batch_size, (state, target_value) = self.replay_buffer.sample_batch(
    self.batch_size,
    sample_method="random_unique",
    concatenate=concatenate_samples,
    sample_attrs=["state", "value"],
    additional_concat_attrs=["value"],
)
Hi,
I think the sign of the entropy term in A2C is wrong:
if new_action_entropy is not None:
    act_policy_loss += self.entropy_weight * new_action_entropy.mean()
Instead, it should be:
if new_action_entropy is not None:
    act_policy_loss -= self.entropy_weight * new_action_entropy.mean()
Best,
Lorenzo
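Which sign is right depends on the convention for entropy_weight, which this report does not state. The framework-free sketch below only shows the arithmetic: with a positive weight and a loss that is minimized, subtracting the entropy favors more random policies, while adding it would favor more deterministic ones. The function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in nats, of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def loss_with_entropy_bonus(policy_loss, probs, entropy_weight):
    """Subtract the weighted entropy, so minimizing the loss raises entropy."""
    return policy_loss - entropy_weight * entropy(probs)


uniform = [0.5, 0.5]
peaked = [0.9, 0.1]
print(loss_with_entropy_bonus(1.0, uniform, 0.01)
      < loss_with_entropy_bonus(1.0, peaked, 0.01))
```

If the library instead expects a negative entropy_weight to encourage exploration, the original `+=` would already behave this way.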